Friday, October 4, 2013

Tip on Hash declaration

If you are familiar with hash programming, you must be familiar with the standard hash code template at below:
length c1 c2 $ 20;
if _n_=1 then do;
    declare hash h (dataset:"dsn");
    rc = h.definekey('n1', 'c1');
    rc = h.definedata('c2');
    rc = definedone();
    call missing(n1, c1, c2);
end;

Below is the SAS explanation: The hash object does not assign values to key variables (for example, h.find(key:'abc')), and the SAS compiler cannot detect the data variable assignments that are performed by the hash object and the hash iterator. Therefore, if no assignment to a key or data variable appears in the program, SAS issues a note stating that the variable is uninitialized.
So, we can have simpler and robust codes:
if _n_=1 then do;
    if 0 then set dsn (keep=n1 c1 c2); /* copy variable metadata from original dataset */
    declare hash h (dataset:"dsn");
    rc = h.definekey('n1', 'c1');
    rc = h.definedata('c2');
    rc = definedone();
end;

Friday, April 12, 2013

How to get duplicate records using PROC SORT?

I always read carefully "What's New" when SAS release any new product. It is fun when I find great tip to improve my SAS knowledge. In SAS 9.3, I think we should pay attention to PROC SORT.

Before 9.3, if we want to get all duplicate records in one step, we have to use PROC SQL. I admit that PROC SQL is a very great tool. However, It is not born for SAS dataset and may not have best performance when processing "HUGE" SAS dataset. With the two new NOUNIQUEKEY and UNIQUEOUT= options in SAS 9.3, PROC SORT has great improvement on handing duplicate records.

Below are the samples:
* Sample data;
data a;
 do i=1 to 3;
  output;
  output;
  output;
 end;

 i=5;
 output;
run;

* Note: UNI1 has all non-duplicate records while
 DUP1 has all duplicate records with same keys;
proc sort data=a out=dup1 nouniquekey uniout=uni1;
 by i;
run;

* Note: UNI2 will have 1st duplicate record while 
 DUP2 has all Other duplicate records with same keys;
proc sort data=a out=uni2 nodupkey dupout=dup2;
 by i;
run;

* Before 9.3;
proc sql noprint;
 create table dup3 as
  select *
  from a
  where i in 
  (
   select i from a
    group by i
   having count(i) > 1
  );
quit;

Thursday, April 11, 2013

Calculating leap year

SAS support give out one way to identify leap year:
Sample 44233: Calculating leap year using PROC FCMP and user-defined function

Below is another simple method:
/* check if Febuary has 28 or 29 days in the year */
data _null_;
 do year = 1997 to 2005;
  days = datdif(mdy(2,1,year), mdy(3,1,year), 'act/act');

  if days=29 then leap='Yes';
  else leap='No';

  put year= leap=;
 end;
run;