Monday, September 15, 2014

How to read CSV using PROC IMPORT

I believe you have read a lot paper for how to read CSV file. Here I want to add some helpful tips:
#1.
We can control the TERMSTR, ENCODING and LRECL using fileref, which may affect how PROC IMPORT handle the file.
For example, we can read UNIX format file in Win SAS.
filename in "&incsvfile" termstr=lf encoding=utf8 lrecl=32767;
proc cimport in=in out=work.test dbms=csv; 
    getnames = yes;
    datarow = 2;
    guessingrows = 2147483647;
run;

#2.
CSV file may not be read correctly when null value is using two double-quote. To fix this, please set the macro variable EFI_NOQUOTED_DELIMITER as follows:
%let EFI_NOQUOTED_DELIMITER = yes;

#3.
I always add the option GUESSINGROWS with maximum value to make sure all data are read. The option does not exist when I blog first CSV tip on 2009. :)
guessingrows = 2147483647;

Thursday, September 11, 2014

Table lookup using FCMP hashing

There are many way to look up table: SQL join, Data hashing, Format, and etc.
Starting from SAS 9.3, we can use hashing technique in PROC FCMP. In this way, we can lookup table using function. Below is the sample code:
* Scenario: Replace PROD_HIER_SK with PROD_ID.;
proc fcmp outlib=work.funcs.prod;
 function prod_id(in_prod_hier_sk) $ 200;
  length prod_id $ 200;

  /* executable for only once */
  declare hash h(dataset:"di_dm.product_dm");
  rc = h.definekey("prod_hier_sk");
  rc = h.definedata("prod_id");
  rc = h.definedone();

  /* look up first and return if found */
  call missing(prod_id);
  prod_hier_sk = in_prod_hier_sk;
  rc = h.find();

  return(prod_id);
 endsub;
run;

options cmplib=(work.funcs);

data sample_fcmp_hashing (keep=prod_hier_sk prod_id parent_prod_hier_sk parent_prod_id);
 set di_dm.product_hier_assoc_dm;

 prod_id = prod_id(prod_hier_sk);
 parent_prod_id = prod_id(parent_prod_hier_sk);
run;


Reference:
Andrew Henrick, Donald Erdman, and Stacey Christian. 2013. “Hashing in PROC FCMP to Enhance Your Productivity”, Proceedings of the SAS Global Forum 2013 Conference
Sample 47224: Load a SAS data set into a Hash Object using PROC FCMP

BTW, interestingly, you can find more information from SAS paper than from SAS Help doc.
As an experienced SAS programmer, I always keep an eye on SAS Global Forum.

System options to introduce code

I don't want to change any existing code in the complicated SAS environment. Because the codes may be there for years and are working well, you have to explain or communicate a lot to make one little harmless change. It is smart to insert the code using system options when you kick off SAS batch.
#1. -append/-insert the program in autoexec system option
$ sas_batch.sh sas_program.sas -append autoexec sas_program_full_path

#2. -initstmt/-termstmt sytem option
$ sas_batch.sh sas_program.sas -initstmt '*sas codes here;'

Scenario:
All utility macros are stored and reused using macro autocall facility, when you want to test new macro (e.g. macro_new.sas), you can run the same work with new macro as follows:
$ sas_batch.sh sas_program.sas -append autoexec macro_new.sas
$ sas_batch.sh sas_program.sas -initstmt '%inc "macro_new.sas";'
In this way, we can test the new macro without change any existing codes.