Wednesday, December 24, 2008

Base SAS Tip Sheets on support.sas.com

Very useful reference sheet.
Hope SAS provide more tip sheets. :)

For more information, click here

Thursday, December 18, 2008

Replacing a string in a file

Generally, there are two types to replace string in a file.
Classic type: infile / file
C type: functions for external files (fopen, fread, fwrite etc)

Here I want to give two points:
#1:
Classic type can not do in-place update. That means we have to creat a new file.
Note that with SHAREBUFFERS option in infile statement, we can also do in-place update.

#2:
Although C type can do in-place update, string to write can NOT be longer than the string to read.

The two examples at below are helpful.
Classic type:
http://www.sconsig.com/sastips/tip00327.htm
C type:
http://support.sas.com/kb/32/868.html

Friday, December 12, 2008

USER system option for batch mode debug

We run SAS program in two mode: batch mode and interactive mode (DMS/NODMS).

With batch mode, I can automate and schedule the SAS job. This can improve the work efficiency dramatically.
However, when SAS job fail, it will take more time to solve the issue.

Below are the steps:
1. With system option USER, run the file and save WORK datasets in permanent libary.
2. Check the error point in SAS log.
3. Start from error point, debug the code step by step.

Thursday, December 11, 2008

Interval: Key concept to INTNX and INTCX

When handling dates and times, we must know two powerful functions: INTNK and INTCK.
The two functions are complementary.
INTNK is for new date value incremented by a specified number of intervals, while INTCK for the number of intervals between two dates.

It is hard for a fresh SAS programmer to understand the two functions because they are not intuitive.
However, as a qualified SAS programmer, it should be a must.

Actually, the key concept behind the two functions is INTERVAL.
If you get it, anything will be simple.

INTNX Demo:
INTNX("year", "31Dec2000"d, 2);

As to "Year" interval, "31Dec2000"d is in the interval between "01Jan2000"d and "31Dec2000"d.
So the new interval should be calculated as follow:
"01Jan2000"d and "31Dec2000"d
+
2
= "01Jan2002"d and "31Dec2002"d.
In the new interval, INTNX return date ‘01Jan2002’d. (Note: the default alignment is beginning).

INTCK Demo:
INTCK("year","31Dec2000"d, "01Jan2001"d);
As to ‘Year’ interval, '31Dec2000'd is in the interval between "01Jan2000"d and "31Dec2000"d’,
While "01Jan2001"d in the interval between "01Jan2001"d and "31Dec2001"d

The number of interval is:
"01Jan2001"d and "31Dec2001"d
-
"01Jan2000"d and "31Dec2000"d’
= 1

So INTCX return number 1.


One more thing:
SAS always focus on improvement of "The Power to DO". (I have modified the logo. Wow.)
With new introduction of alignment value ‘sameday’ in INTNK in SAS 9, many old solutions will step off the stage of history.

Tuesday, December 9, 2008

DO NOT use IFN in division by Zero

IFN is a compact format for IF/THEN/ELSE. However, it runs with a different manner.

In IFN, all expressions will be evaluated at first. And then logical judgement will be applied and regarding value be returned, while IF/THEN/ELSE need not evaluate all expressions.

data test1;
input a b ;
l=ifn(b>0 , 0 , a/b );
* 'b>0' and 'a/b' will be evaluated, so error occur;
cards;
10 2
6 0
9 3
;

data test2;
input a b;
if b=0 then l=0;
else l=a/b;
cards;
10 2
6 0
9 3
;

How to sort dataset with format?

Every SAS programmer must be familiar with Proc SORT. Every time we merge multiple datasets, do group-by processing, etc. we have to sort the dataset by key variables.

In many procs, Format statement can function partly as group-by processing, e.g. Proc MEANS, Proc REPORT etc.

However, different from the procs above, Format statement dose not do group-by processing in Proc SORT.
The Format statement in Proc SORT ONLY assign the format to the variable in output dataset.

Generally, we create a new variable based on the format and then sort the new variable using Proc SORT.
With Proc SQL, we can do the same operations in one step. :)

Sample code:

proc format;
value $sex
'M' = '1'
'F' = '2';
run;

* Format statement does not work in Proc SORT;
proc sort data=sashelp.class out=sortfail;
format sex $sex.;
by sex;
run;

* With Proc SQL, we can sort the dataset with format
in one step;
proc sql;
create table sortok as
select * from sashelp.class
order by put(sex, $sex.);
quit;

* More steps with "by vars NOTSORTED";

Wednesday, December 3, 2008

Execute code on the fly

Do you want to execute code on the fly?

There are two methods:
1. call execute routine
2. temp file + %include

filename test temp;
data _null_;
filename test;
**** put code here ****;
run;
%include test;

The latter method is more popular and friendly.

Statistical Graphics procedure

In the past, we always plot statistical graph with steps at belows:
1. Run SAS/STAT Procs
2. Save ODS output to dataset
3. Plot the graph by using SAS/Graph and annotate.

The old method is time consuming and error-prone.

SAS/Graph 9.2 has made a great improvement.
With the new Statistical Graphics Procedure, we can do the job easily.

Refer to
http://support.sas.com/documentation/cdl/en/grstatproc/60786/HTML/default/intro.htm for more information.

Tuesday, December 2, 2008

Useful informat modifiers

The world is not perfect. Likewise, raw data is not always clean.
With SAS powerful input facility, we can handle different format data.

There are four input styles: column, list, formatted and named.
In my opinion, the most powerful and interesting style is formatted.

Here are 4 useful modifiers, which can be used in formatted input.
#1: colon format modifier (:)
character data that contains more than eight characters
numeric data that contains special characters.
#2: ampersand format modifier (&)
character data that contains single embedded blanks.
Note: The only restriction is that at least two blanks must divide each value from the next data value in the record.
#3: double question mark modifier (??)
Suppress the error messages resulting from invalid data/argument. This can be used in INPUT function and INPUT statement. The modifier also resets the automatic variable _ERROR_ to 0.
#4: single question mark modifier (?)
Suppress the error messages resulting from invalid data/argument. This can be used in INPUT function and INPUT statement.

Compared with informat modifier, there are format modifier for PUT function at below.
-L left aligns the value.
-C centers the value.
-R right aligns the value.

Obviously, format modifier is not much useful.

orientation swith between pages for ODS RTF

In SAS forum, Cynthia@sas give a trick on how to do orientation switch.
I copied the code at below:

It should be noted that this trick is specific to ODS RTF.

options orientation=portrait;
ods rtf file='twoway.rtf';

** first procedure;

options orientation=landscape;
ods rtf;

** second procedure;

ods rtf close;

From: http://support.sas.com/forums/thread.jspa?messageID=15691&tstart=0#15691

What a pity: there are no COMPLETETYPES, PRELOADFMT and MLF options for Proc FREQ

The most widely used Procs for simple statistics are Proc Freq and Proc Means. One is for categorical variable while the other for numerical variable.

Although Proc Means can do some part work of Proc Freq, it can not replace Proc Freq completely.

When the 3 options (COMPLETETYPES, PRELOADFMT and MLF) are combined with user-defined format, Procs are SO powerful in tabulate output. Definitely, it will simplify many post-processing.

The three options are only partly specific to Proc Means, Proc Report and Proc Tabulate.
And, they are not available to Proc Freq. That means we have to play a trick when counting frequency of categorical variable by using Proc Freq.

What a pity!