Monday, June 22, 2009

ID functionality in PROC MEANS

In the real world, there are two kinds of data: numerical and categorical.
Correspondingly, SAS has two basic tools: PROC MEANS and PROC FREQ.

PROC MEANS is commonly used to summarize numerical data.
Additionally, it provides ID functionality in three ways:

1. The ID statement, plus the IDMIN and PRINTIDVARS options in the PROC MEANS statement
2. The MAXID and MINID options in the OUTPUT statement
3. The IDGROUP option in the OUTPUT statement

All of these affect only the output data set, except PRINTIDVARS, which affects the printed output.
As far as the output data set is concerned, the capability ranking is #1 < #2 < #3.
That is, IDGROUP is the most powerful: it supports top-N/bottom-N selection as well as the OBS and LAST options, among others.

Correspondingly, IDGROUP is also the most complex.
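
The three approaches can be sketched side by side. This is a minimal illustration, assuming the SASHELP.CLASS sample data set is available; the output data set names and the output variable names (mean_height, tallest, top_name and so on) are hypothetical and chosen only for this example.

```sas
/* 1. ID statement: carry NAME into the output data set.
      IDMIN keeps the minimum NAME per group instead of the default maximum. */
proc means data=sashelp.class nway noprint idmin;
  class sex;
  var height;
  id name;
  output out=work.by_id mean=mean_height;
run;

/* 2. MAXID / MINID: NAME of the observation holding the extreme HEIGHT. */
proc means data=sashelp.class nway noprint;
  class sex;
  var height;
  output out=work.by_minmax
         max=max_height min=min_height
         maxid(height(name))=tallest
         minid(height(name))=shortest;
run;

/* 3. IDGROUP: NAME and HEIGHT of the two tallest students per group.
      OUT[2] creates the suffixed variables top_name_1, top_name_2,
      top_height_1 and top_height_2. */
proc means data=sashelp.class nway noprint;
  class sex;
  var height;
  output out=work.top2
         idgroup(max(height) out[2] (name height)=top_name top_height);
run;
```

Note that in the first step the ID value is simply the minimum (or maximum) NAME in the group, unrelated to HEIGHT, whereas MAXID, MINID and IDGROUP tie the identifying value to the observation that holds the extreme.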

Sunday, June 21, 2009

How to position the macro problem at runtime

Macro programming is error-prone. Macro compile errors are easy to debug.
Macro runtime errors, however, are not, because the SAS log does not give usable position information for code generated by a macro.

For example:

%macro test;
data a;
a=1;
b="1";
if a=b then put "Here!";
run;
%mend;

%test

SAS log:
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
1:51
Although the SAS log reports the position "1:51", we cannot map it back to a line inside the macro.

Here is one solution. We can save the code generated by the macro as a SAS program, locate the problem in that program, and then trace it back to the macro.
Please see the sample code below:


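/* MFILE writes the MPRINT output to the file assigned to the fileref MPRINT */
filename mprint temp;
options mprint mfile;

%test

/* Run the generated code as a plain SAS program; SOURCE2 echoes it to the log,
   so the reported line and column refer to visible source lines */
%include mprint / source2;
options nomprint nomfile;
filename mprint clear;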

CSV with newline

In CSV, fields with embedded newlines must be enclosed in double-quote characters.
However, PROC IMPORT fails to import this kind of CSV correctly.

To conform to the expected input, we can convert each embedded newline into " \par " (see the RTF specification), import the CSV into a SAS data set using PROC IMPORT, and then convert " \par " back to a newline.
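
The round trip described above might look like the following. This is a rough sketch under several assumptions: the macro variables raw and clean hold hypothetical file paths, the file uses plain double quotes for quoting, and every character variable in the imported data set should have the marker converted back. The byte-by-byte read with RECFM=N is one possible way to find newlines inside quotes, not necessarily the method the post had in mind.

```sas
/* Hypothetical paths; replace with real ones. */
%let raw   = /path/to/raw.csv;
%let clean = /path/to/clean.csv;

/* Step 1: copy the file one byte at a time (RECFM=N = no record boundaries),
   replacing line feeds that fall inside double quotes with ' \par '. */
data _null_;
  infile "&raw" recfm=n;
  file "&clean" recfm=n;
  retain inquote 0;
  input ch $char1.;
  if ch = '"' then inquote = not inquote;    /* toggle on every quote */
  if inquote and ch = '0D'x then return;     /* drop CR of an embedded CRLF */
  else if inquote and ch = '0A'x then put ' \par ';
  else put ch $char1.;
run;

/* Step 2: import the cleaned file, which now has one record per line. */
proc import datafile="&clean" out=work.imported dbms=csv replace;
  getnames=yes;
run;

/* Step 3: turn the marker back into a line feed in all character variables. */
data work.restored;
  set work.imported;
  array chars {*} _character_;
  do _i = 1 to dim(chars);
    chars{_i} = tranwrd(chars{_i}, ' \par ', '0A'x);
  end;
  drop _i;
run;
```

One caveat of this sketch: a marker at the very end of a field loses its trailing blank when the value is stored, so Step 3 would not match it; such values would need separate handling.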