Friday, April 12, 2013

How to get duplicate records using PROC SORT?

I always read carefully "What's New" when SAS release any new product. It is fun when I find great tip to improve my SAS knowledge. In SAS 9.3, I think we should pay attention to PROC SORT.

Before 9.3, if we want to get all duplicate records in one step, we have to use PROC SQL. I admit that PROC SQL is a very great tool. However, It is not born for SAS dataset and may not have best performance when processing "HUGE" SAS dataset. With the two new NOUNIQUEKEY and UNIQUEOUT= options in SAS 9.3, PROC SORT has great improvement on handing duplicate records.

Below are the samples:
* Sample data;
data a;
 do i=1 to 3;
  output;
  output;
  output;
 end;

 i=5;
 output;
run;

* Note: UNI1 has all non-duplicate records while
 DUP1 has all duplicate records with same keys;
proc sort data=a out=dup1 nouniquekey uniout=uni1;
 by i;
run;

* Note: UNI2 will have 1st duplicate record while 
 DUP2 has all Other duplicate records with same keys;
proc sort data=a out=uni2 nodupkey dupout=dup2;
 by i;
run;

* Before 9.3;
proc sql noprint;
 create table dup3 as
  select *
  from a
  where i in 
  (
   select i from a
    group by i
   having count(i) > 1
  );
quit;

No comments: