Before 9.3, if we want to get all duplicate records in one step, we have to use PROC SQL. I admit that PROC SQL is a very great tool. However, It is not born for SAS dataset and may not have best performance when processing "HUGE" SAS dataset. With the two new NOUNIQUEKEY and UNIQUEOUT= options in SAS 9.3, PROC SORT has great improvement on handing duplicate records.
Below are the samples:
* Sample data; data a; do i=1 to 3; output; output; output; end; i=5; output; run; * Note: UNI1 has all non-duplicate records while DUP1 has all duplicate records with same keys; proc sort data=a out=dup1 nouniquekey uniout=uni1; by i; run; * Note: UNI2 will have 1st duplicate record while DUP2 has all Other duplicate records with same keys; proc sort data=a out=uni2 nodupkey dupout=dup2; by i; run; * Before 9.3; proc sql noprint; create table dup3 as select * from a where i in ( select i from a group by i having count(i) > 1 ); quit;
No comments:
Post a Comment