Friday, April 12, 2013

How to get duplicate records using PROC SORT?

I always read carefully "What's New" when SAS release any new product. It is fun when I find great tip to improve my SAS knowledge. In SAS 9.3, I think we should pay attention to PROC SORT.

Before 9.3, if we want to get all duplicate records in one step, we have to use PROC SQL. I admit that PROC SQL is a very great tool. However, It is not born for SAS dataset and may not have best performance when processing "HUGE" SAS dataset. With the two new NOUNIQUEKEY and UNIQUEOUT= options in SAS 9.3, PROC SORT has great improvement on handing duplicate records.

Below are the samples:
* Sample data;
data a;
 do i=1 to 3;
  output;
  output;
  output;
 end;

 i=5;
 output;
run;

* Note: UNI1 has all non-duplicate records while
 DUP1 has all duplicate records with same keys;
proc sort data=a out=dup1 nouniquekey uniout=uni1;
 by i;
run;

* Note: UNI2 will have 1st duplicate record while 
 DUP2 has all Other duplicate records with same keys;
proc sort data=a out=uni2 nodupkey dupout=dup2;
 by i;
run;

* Before 9.3;
proc sql noprint;
 create table dup3 as
  select *
  from a
  where i in 
  (
   select i from a
    group by i
   having count(i) > 1
  );
quit;

Thursday, April 11, 2013

Calculating leap year

SAS support give out one way to identify leap year:
Sample 44233: Calculating leap year using PROC FCMP and user-defined function

Below is another simple method:
/* check if Febuary has 28 or 29 days in the year */
data _null_;
 do year = 1997 to 2005;
  days = datdif(mdy(2,1,year), mdy(3,1,year), 'act/act');

  if days=29 then leap='Yes';
  else leap='No';

  put year= leap=;
 end;
run;