Thursday, October 11, 2012

convert character to numeric (date time) and vice versa


To convert a character value to a number, we use the input function. To convert a number to a character value, we use the put function. At first it is easy to confuse the two, so here is a way to remember which is which:
  When we write something on paper, everything is character. So when we convert something to character, we "put" the information on paper. When we want to get a number from paper (character), we "input" (read) the characters into our brain (number).

char to num, and num to char example: 

data eb;
 chra="321.32";
 chrb="5.2E6";
 num=235;
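 chra2num=input(chra,best32.);            /* char to num: 321.32 */
 chrb2num=input(chrb, best32.);           /* E notation is read too: 5200000 */
 num2chr=compress(put(num,best32.),' ');  /* num to char; compress removes the leading blanks of the right-aligned result */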
run;
The best32. in the above code is just an example; you can specify other informats (in input) or formats (in put).  
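For example, the COMMAw.d informat reads numbers written with thousands separators, while the Zw. and DOLLARw.d formats control how a number is written out. A minimal sketch; the data set eb_fmt and all variable names in it are made up for illustration:

```sas
/* Sketch: other informats and formats.                       */
/* eb_fmt, chrc, num_c, id, id_chr, amt_chr are made-up names */
data eb_fmt;
 chrc="1,234.56";
 num_c=input(chrc, comma10.);            /* COMMAw. informat ignores the comma: 1234.56 */
 id=7;
 id_chr=put(id, z3.);                    /* Zw. format pads with leading zeros: "007" */
 amt_chr=strip(put(num_c, dollar10.2));  /* DOLLARw.d format; strip() removes the leading blank */
run;
```

Here strip() does the same job as compress(..., ' ') above for the leading blanks that put() leaves, but it would keep any blanks inside the value.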

char to date, and date to char example:

data eb2;
input date_char $10. mon day year;
date1=input(compress(date_char), anydtdte.); /* convert character to date */
date2=mdy(mon,day,year);                     /* build a date from numeric month, day and year */
format date1 mmddyy10. date2 yymmdd10.;
cards;
11/ 1/1999 11 01 1999
 1Sep1999  1 11 1999
12/24/2003 12 24 2003 
 7/ 1/2001  7  1 2001
;;
run;

If you check the resulting data set eb2, you will find that date1 is missing in the third line (the line with 12/24/2003).
The reason is that the anydtdte. informat has a default width of 9, so it is equivalent to anydtdte9. and reads only 9 characters. After compress(), "12/24/2003" is 10 characters long, while the other three values are 9 characters or fewer. To correct this we use anydtdte10., and the third line now converts to the right date.
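Here is a sketch of the corrected step. It also adds the "date to char" direction from the heading, using put() with the DATE9. format; the data set eb2_fixed and the variable date1_char are names made up for this example:

```sas
/* Sketch: eb2 with anydtdte10. plus a date-to-character column */
/* eb2_fixed and date1_char are made-up names                   */
data eb2_fixed;
input date_char $10. mon day year;
date1=input(compress(date_char), anydtdte10.); /* char to date, reading up to 10 characters */
date2=mdy(mon,day,year);                       /* numeric month, day, year to date */
date1_char=put(date1, date9.);                 /* date to char, e.g. 24DEC2003 */
format date1 mmddyy10. date2 yymmdd10.;
cards;
11/ 1/1999 11 01 1999
 1Sep1999  1 11 1999
12/24/2003 12 24 2003
 7/ 1/2001  7  1 2001
;;
run;
```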


Thursday, September 13, 2012

calculate percentile in SAS with weighted data

How do you calculate the 12.5th percentile? The answer is to use PROC UNIVARIATE. Doesn't PROC UNIVARIATE only report the 0th (minimum), 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, 99th and 100th (maximum) percentiles? Yes, if you only read the default output. The OUTPUT statement, with its pctlpts= and pctlpre= options, lets the procedure write any percentiles you ask for to a data set.

Let's first create a sample data set eb:

data eb;
input n1 n2;
cards;
2   20
4   35
5   40
8   55
10  60
13  75
;;
run;

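Example 1: calculate percentiles that the procedure does not report by default. Here pctlpts=10 to 20 by 3 requests the 10th, 13th, 16th and 19th percentiles, and pctlpre=p gives the output variables the prefix p.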


proc univariate data=eb noprint;
 var n1;
 output out=eb1 pctlpre=p pctlpts=10 to 20 by 3;
 run;
 proc print data=eb1;
 run;


    Obs    p10    p13    p16    p19
    1      2      2      2      4  


Example 2: calculate custom percentiles for multiple variables. This time we need to specify one prefix for each variable in the VAR statement, in the same order: pctlpre=pa_ pb_ (pa_ for n1, pb_ for n2).

proc univariate data=eb noprint;
 var n1 n2;
 output out=eb2 pctlpre=pa_ pb_ pctlpts=12.5 20 50;
 run;
proc print data=eb2;
run;


 Obs    pa_12_5    pa_20    pa_50    pb_12_5    pb_20    pb_50
 1        2         4       6.5        20        35      47.5 



If you leave out the pctlpre= option, you will get the error message "ERROR: The PCTLPRE= option must be specified to generate additional percentiles." If you specify only pa_ and no pb_, you will not get any error message, but the output data set will only include the percentiles for the first variable, n1; in this example, those are 2, 4 and 6.5.
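Where do 2, 4 and 6.5 come from? By default PROC UNIVARIATE uses percentile definition 5 (PCTLDEF=5): write n*p = j + g, with j the integer part, then take the (j+1)th smallest value if g > 0, or the average of the jth and (j+1)th values if g = 0. For the 50th percentile of n1, n*p = 6*0.5 = 3, so g = 0 and the result is (5+8)/2 = 6.5. The DATA step below is a minimal sketch that reproduces the n1 values by hand; it assumes the 6-observation data set eb from the start of this post with no missing values, and the names eb_sorted and pctl_check are made up:

```sas
/* Sketch: default percentile definition (PCTLDEF=5) for n1, by hand */
/* Assumes eb has 6 observations and no missing n1                   */
proc sort data=eb out=eb_sorted;
 by n1;
run;

data pctl_check;
 array x{6} _temporary_;        /* the 6 sorted n1 values */
 do i=1 to 6;
  set eb_sorted point=i;
  x{i}=n1;
 end;
 do p=12.5, 20, 50;
  np=6*p/100;                   /* n*p, split into integer part j and fraction g */
  j=floor(np);
  g=np-j;
  if g>0 then pctl=x{j+1};      /* g>0: take the (j+1)th smallest value */
  else pctl=(x{j}+x{j+1})/2;    /* g=0: average the jth and (j+1)th values */
  output;
 end;
 keep p pctl;
 stop;                          /* needed because SET uses POINT= */
run;
```

It should give 2, 4 and 6.5, the pa_ values above. The exact test g>0 is a simplification for this small example; for real work, let PROC UNIVARIATE do the computation.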


Example 3: what if the data are weighted? We use the WEIGHT statement in the procedure. The weights do not need to be integers; decimal values are allowed. We add a decimal weight variable wt to the data.



data eb;
input n1 n2 wt;
cards;
2   20  5.4  
4   35  8.6  
5   40  10.9 
8   55  15.5 
10  60  20.3 
13  75  25.8 
;;
run;
/* unweighted percentiles */
proc univariate data=eb noprint;
 var n1 n2;
 output out=eb1 pctlpre=pa_ pb_ pctlpts=10 20 30;
run;
proc print data=eb1;
run;
/* the same percentiles, weighted by wt */
proc univariate data=eb noprint;
 weight wt;
 var n1 n2;
 output out=eb1 pctlpre=pa_ pb_ pctlpts=10 20 30;
run;
proc print data=eb1;
run;

The results below show the percentiles without the weight (1st row, from the first PROC PRINT) compared with the percentiles weighted by wt (2nd row, from the second PROC PRINT):

  Obs    pa_10    pa_20    pa_30    pb_10    pb_20    pb_30
  1       2        4        4        20       35       35  
  1       4        5        8        35       40       55
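To see why the weights move the percentiles up, here is a sketch of the weighted rule as we read it in the PROC UNIVARIATE documentation: with total weight W, sort the values, accumulate the weights, and return the first value whose cumulative weight exceeds pW (or the average of that value and the next one if the cumulative weight equals pW exactly). Here W = 86.5, so the 10th percentile target is 8.65; the cumulative weight is 5.4 after n1=2 and 14.0 after n1=4, which gives 4. The sketch assumes the weighted data set eb from Example 3 with positive weights; eb_w and wpctl_check are made-up names:

```sas
/* Sketch: weighted percentiles of n1, by hand                 */
/* Assumes eb (n1 n2 wt) from Example 3, 6 obs, positive wt    */
proc sort data=eb out=eb_w;
 by n1;
run;

data wpctl_check;
 array x{6} _temporary_;        /* sorted n1 values */
 array w{6} _temporary_;        /* weights in the same order */
 total=0;
 do i=1 to 6;
  set eb_w point=i;
  x{i}=n1;
  w{i}=wt;
  total=total+wt;               /* W, the total weight */
 end;
 do p=10, 20, 30;
  target=total*p/100;           /* pW */
  cum=0;
  pctl=.;
  do k=1 to 6 while (missing(pctl));
   cum=cum+w{k};                /* cumulative weight up to the kth value */
   if cum>target then pctl=x{k};                          /* pW falls inside x{k}'s weight */
   else if cum=target and k<6 then pctl=(x{k}+x{k+1})/2;  /* pW lands exactly on a boundary */
  end;
  output;
 end;
 keep p pctl;
 stop;                          /* needed because SET uses POINT= */
run;
```

It should give 4, 5 and 8, the pa_ values in the 2nd row above.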