Friday, April 12, 2013

How to get duplicate records using PROC SORT?

I always read carefully "What's New" when SAS release any new product. It is fun when I find great tip to improve my SAS knowledge. In SAS 9.3, I think we should pay attention to PROC SORT.

Before 9.3, if we want to get all duplicate records in one step, we have to use PROC SQL. I admit that PROC SQL is a very great tool. However, It is not born for SAS dataset and may not have best performance when processing "HUGE" SAS dataset. With the two new NOUNIQUEKEY and UNIQUEOUT= options in SAS 9.3, PROC SORT has great improvement on handing duplicate records.

Below are the samples:
* Sample data;
data a;
 do i=1 to 3;
  output;
  output;
  output;
 end;

 i=5;
 output;
run;

* Note: UNI1 has all non-duplicate records while
 DUP1 has all duplicate records with same keys;
proc sort data=a out=dup1 nouniquekey uniout=uni1;
 by i;
run;

* Note: UNI2 will have 1st duplicate record while 
 DUP2 has all Other duplicate records with same keys;
proc sort data=a out=uni2 nodupkey dupout=dup2;
 by i;
run;

* Before 9.3;
proc sql noprint;
 create table dup3 as
  select *
  from a
  where i in 
  (
   select i from a
    group by i
   having count(i) > 1
  );
quit;

Thursday, April 11, 2013

Calculating leap year

SAS support give out one way to identify leap year:
Sample 44233: Calculating leap year using PROC FCMP and user-defined function

Below is another simple method:
/* check if Febuary has 28 or 29 days in the year */
data _null_;
 do year = 1997 to 2005;
  days = datdif(mdy(2,1,year), mdy(3,1,year), 'act/act');

  if days=29 then leap='Yes';
  else leap='No';

  put year= leap=;
 end;
run;

Monday, November 26, 2012

to gzip output using FIFO

Generally SAS programmer make an output file and then run "gzip" command using "X", "CALL SYSTEM" or "%SYSEXEC". Actually, we can gzip output on the fly.
See below:
x "mknod pipe&sysjobid p";

filename cmd pipe "gzip < pipe&sysjobid 
> sashelp.DAT.gz &";

data _null_;
 infile cmd;
 stop;
run;

data _null_;
 file "pipe&sysjobid";

 retain header;
 if missing(header) then do;
  put "NAME|SEX";
  header = 1;
 end;

 set sashelp.class;
 put name +(-1) '|' sex;
run;
filename cmd clear;

x "rm pipe&sysjobid";

Friday, February 25, 2011

Search at one click

I always search something as follows:
1. go to http://support.sas.com/
2. input keyword in blank search box, e.g. proc format
3. select Documentation in "Search support.sas.com"
4. Click search button

It becomes boring when I always do the same steps.

Here is a tip:
1. install CleverKeys for Windows at http://www.cleverkeys.com/ck.html?p=home&os=windows
2. Right click CleverKeys -> Preference -> Web links -> Add new link at below
Description: SAS Document
URL before: http://support.sas.com/dsearch?Find=Search&ct=5210&qt=
URL after: &col=suppprd&nh=10&qp=&qc=suppsas&ws=1&qm=1&st=1&lk=1&rf=0&oq=&rq=0

Of course, you can add any other useful search engines.
My favorite links are Wikipedia, Google and SAS Document.

Now you can search the SAS at one click. :)

Wednesday, July 14, 2010

Delete observations when all variabls are missing

It is a common question.

With the new function CMISS in 9.2, we can perform the task in ONLY one DATA step.
Please see the sample code at below:

data raw;
output; * obs with missing values;
a=1; b='2'; c=.;
output;
run;

data output;
set raw;
array aa {*} $ _character_;
array nn {*} _numeric_;

if cmiss(of _all_) = (dim(aa)+dim(nn)) then delete;
run;

Monday, July 5, 2010

Quick tip: %EVAL

Essentially, the value of all macro variables is character string.

Sometimes, we do need that macro variable has only numeric value.
We can use %EVAL to judge if the string is numeric value.


* Sample;
%let mv=1+a;

%let a=%eval(%scan(&mv, 1, +));
%let b=%eval(%scan(&mv, 2, +)); * Error occurs;

Thursday, June 3, 2010

How to identify the header of ACROSS columns in PROC Report

When one across item is used with >1 analysis items, the header text of the across item will affect all. To use different header text, we have to specify an alias for the across item.
Below are the sample codes:

* Default;
proc report data=sashelp.class nowd;
column sex,height sex,weight;
define sex / across;
define height / analysis mean;
define weight / analysis mean;
run;

* To use alias for the across item;
proc report data=sashelp.class nowd;
column sex,height (sex=sex_alias),weight;
define sex / across "Height";
define sex_alias / across "Weight";
define height / analysis mean " ";
define weight / analysis mean " ";
run;