Thursday, September 13, 2012

calculate percentile in SAS with weighted data

How do you calculate the 12.5th percentile? The answer is to use PROC UNIVARIATE. Doesn't PROC UNIVARIATE only report the 0th (minimum), 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, 99th and 100th (maximum) percentiles? Yes, if you read only the default output. The OUTPUT statement lets the UNIVARIATE procedure write customized percentiles to an output data set.

Let's first create a sample data set, eb:

data eb;
input n1 n2;
cards;
2   20
4   35
5   40
8   55
10  60
13  75
;;
run;

Example 1: calculate percentiles that are not part of the default output.


proc univariate data=eb noprint;
 var n1;
 output out=eb1 pctlpre=p pctlpts=10 to 20 by 3;
run;
proc print data=eb1;
run;


    Obs    p10    p13    p16    p19
    1      2      2      2      4  
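To see where these numbers come from, here is a minimal sketch that redoes the calculation by hand in a DATA step. It assumes the procedure is using its default percentile definition (PCTLDEF=5, the empirical distribution function with averaging): with n sorted values and percentile p, write n*p = j + g, where j is the integer part; the percentile is the (j+1)th value if g > 0, and the average of the jth and (j+1)th values if g = 0. The data set name pctl_check and the variables np, j, g and value are made up for this illustration, and the sketch only covers percentiles strictly between 0 and 100.

```sas
/* Sketch only: reproduce p10, p13, p16 and p19 of n1 by hand,    */
/* assuming the default PCTLDEF=5 rule. The sorted n1 values      */
/* from data set eb are typed directly into a temporary array.    */
data pctl_check;
  array x[6] _temporary_ (2 4 5 8 10 13);
  n = dim(x);
  do pct = 10 to 20 by 3;
    np = n * pct / 100;      /* n*p                               */
    j  = floor(np);          /* integer part                      */
    g  = np - j;             /* fractional part                   */
    if g > 0 then value = x[j + 1];
    else value = (x[j] + x[j + 1]) / 2;   /* exact tie: average   */
    output;
  end;
  keep pct np j g value;
run;

proc print data=pctl_check;
run;
```

For pct = 10, 13 and 16, n*p is 0.6, 0.78 and 0.96, so j = 0 and the result is the first value, 2. For pct = 19, n*p is 1.14, so j = 1 and the result is the second value, 4. The test g = 0 relies on exact floating-point arithmetic, so treat this as an illustration rather than a replacement for the procedure.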


Example 2: calculate custom percentiles for multiple variables. This time we need to specify a prefix for each variable: pctlpre=pa_ pb_.

proc univariate data=eb noprint;
 var n1 n2;
 output out=eb2 pctlpre=pa_ pb_ pctlpts=12.5 20 50;
run;
proc print data=eb2;
run;


 Obs    pa_12_5    pa_20    pa_50    pb_12_5    pb_20    pb_50
 1        2         4       6.5        20        35      47.5 



If you forget the PCTLPRE= option, you will get an error message: "ERROR: The PCTLPRE= option must be specified to generate additional percentiles." If you specify only pa_ in the option and omit pb_, you will not get any error message, but the output data set will include the percentiles for the first variable only. In this example those are 2, 4 and 6.5.
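The single-prefix case can be tried with a short sketch like the one below. The output data set name eb3 is made up for this illustration; everything else is the same as in Example 2.

```sas
/* Sketch only: two analysis variables but a single prefix.       */
/* Per the behavior described above, eb3 is expected to hold      */
/* pa_12_5, pa_20 and pa_50 for n1 only, with nothing for n2.     */
proc univariate data=eb noprint;
  var n1 n2;
  output out=eb3 pctlpre=pa_ pctlpts=12.5 20 50;
run;

proc print data=eb3;
run;
```

Because the prefixes are matched to the VAR statement in order, it is worth checking the log and the printed columns whenever the number of prefixes and the number of variables differ.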


Example 3: what if the data are weighted? Use the WEIGHT statement in the procedure. The weight does not need to be an integer; decimal values are allowed. We add a decimal weight, wt, to the data.



data eb;
input n1 n2 wt;
cards;
2   20  5.4  
4   35  8.6  
5   40  10.9 
8   55  15.5 
10  60  20.3 
13  75  25.8 
;;
run;
/* Unweighted percentiles */
proc univariate data=eb noprint;
 var n1 n2;
 output out=eb1 pctlpre=pa_ pb_ pctlpts=10 20 30;
run;
proc print data=eb1;
run;

/* Weighted percentiles: same request, with wt as the weight */
proc univariate data=eb noprint;
 weight wt;
 var n1 n2;
 output out=eb1 pctlpre=pa_ pb_ pctlpts=10 20 30;
run;
proc print data=eb1;
run;

The result below compares the percentiles computed with the weight wt (2nd row) to those computed without a weight (1st row).

  Obs    pa_10    pa_20    pa_30    pb_10    pb_20    pb_30
  1       2        4        4        20       35       35  
  1       4        5        8        35       40       55  
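The weighted row can be checked by hand with a minimal sketch like the one below. It assumes the weighted percentile rule as I understand it from the SAS documentation: with values sorted in ascending order, W the sum of the weights and p the percentile, the result is the first value whose cumulative weight exceeds p*W, or the average of that value and the next one when the cumulative weight equals p*W exactly. The data set name wpctl_check and the variables total_w, target, cum_w and value are made up for this illustration.

```sas
/* Sketch only: reproduce the weighted p10, p20 and p30 of n1.    */
/* Values and weights from data set eb are typed into temporary   */
/* arrays, already sorted by n1.                                  */
data wpctl_check;
  array x[6] _temporary_ (2 4 5 8 10 13);
  array w[6] _temporary_ (5.4 8.6 10.9 15.5 20.3 25.8);
  total_w = sum(of w[*]);            /* W = 86.5                  */
  do pct = 10 to 30 by 10;
    target = total_w * pct / 100;    /* p*W                       */
    cum_w  = 0;
    value  = .;
    /* walk up the sorted values until p*W is reached */
    do i = 1 to dim(x) while (missing(value));
      cum_w = cum_w + w[i];
      if cum_w > target then value = x[i];
      else if cum_w = target then do;
        /* exact tie: average with the next value, if there is one */
        if i < dim(x) then value = (x[i] + x[i + 1]) / 2;
        else value = x[i];
      end;
    end;
    output;
  end;
  keep pct target value;
run;

proc print data=wpctl_check;
run;
```

The cumulative weights are 5.4, 14.0, 24.9, 40.4, 60.7 and 86.5, and the targets are 8.65, 17.3 and 25.95. The first cumulative weight above each target belongs to the 2nd, 3rd and 4th values, which gives 4, 5 and 8, the same as pa_10, pa_20 and pa_30 in the weighted row. The exact-tie branch depends on floating-point equality, so it is only illustrative.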





