Tuesday, May 26, 2009

Dataset File-Handling statements - SET/MERGE/UPDATE/MODIFY

Totally there are 4 dataset file-handling statements.
All can support the fomula as follows:

DATA DS;
SET DS1 DS2; *SET can be replaced with MERGE/UPDATE/MODIFY;
BY VAR;
RUN;

However, they have different meaning.

SET statement: interleave two or more SAS datasets. We can get an ordered dataset without PROC SORT.
Merge statement: match-merging, which is very popular.
Update statement: update Master dataset with Transaction one, which is the only formula. Before using it, please read carefully the requirements in manual.
Modify statement: update Master dataset in place using matching access method. Although it is a powerful tool, many programmer would like to bypass it using very complicated code.

Sunday, May 24, 2009

Truncate issue when importing CSV file

CSV file is an old common format of information distribution. However, when we read CSV into SAS dataset using PROC IMPORT, the string is truncated sometimes.
The reason is that PROC IMPORT scan only 20 records by default to determine variable attributes. SAS Notes has detailed the steps to solve the issue.

For more information, please see http://support.sas.com/kb/1/075.html.

*Sample code to read CSV;
PROC IMPORT OUT= WORK.DATA
DATAFILE= "csv.txt"
DBMS=DLM REPLACE;
DELIMITER='2C'x;
GETNAMES=YES;
DATAROW=2;
RUN;

Sunday, May 17, 2009

How to avoid "retain" observations in Match-Merging

There are two merging techniques: One-to-One Merging and Match-Merging.
Match-Merging is the most popular operation in SAS DATA step programming.
The reason is that Match-Merging always “retain” last observation until new one is read into PDV.
(Note that One-to-One merging is not.)

However, sometimes we need not retain last observation.
Since SAS do not provide option to control it, we have to initialize PDV manually every time.

Here is the code:

OUTPUT; * save PDV to dataset;
ARRAY CHAR _CHARACTER_ ;
ARRAY NUM _NUMERIC_ ;
DO OVER CHAR ; CHAR = '' ; END ;
DO OVER NUM ; NUM = . ; END ;


or

*For SAS 9.1 and later;
OUTPUT;
CALL MISSING(OF _ALL_);

Data Set List - A close friend to Set/Merge statement

Are you boring with writing a long string to concatenate/merge many datasets? With the introduction of Data Set Lists in SAS 9.2, it is “gone with the wind”.

Data set lists can give out the list of existing datasets efficiently and flexibly.
Just like variable list, it also has two formulas:
Name prefix lists: SALE1:
Numbered range lists: sales1-sales4