Monday, December 5, 2016

many-to-many SQL join using DATA step

As we know, Data step and SQL will get different results as to many-to-many join. SQL join is powerful but may be time-consuming when processing huge data, e.g. billion records. Here is one simple tip to implement SQL join using DATA step. If we take advantage of sorted datasets, it will get very good performance benefit.
data a;
do i=1,1,2,2,3,3;
    a=1;
    output;
end;
run;

data b;
    do i=1,1,1,2,2,2;
        b = 0;
        output;
    end;
run;

/* SQL step */
proc sql;
    create table c_sql as
    select a.i, a.a, b.b
    from a, b
    where a.i = b.i;
quit;

/* Data step - which can match SQL result */
data c_data;
    set a;

    drop i_b;

    do ai=1 to n;
        set b(rename=(i=i_b)) nobs=n point=ai;
        if i=i_b then do; 
            output;
        end;
    end;
run;

Tuesday, August 30, 2016

XLS export

With ODS EXCELXP or ODS EXCEL, you may get some fancy looking as you can add style or format. However, I prefer the native Excel output without any decoration. Please note that we can write only one dataset at one time.

Below is the sample code:
libname xlsout xlsx '.\out.xls';

data xlsout.a;
    set sashelp.class;
run;

libname xlsout clear;

Tuesday, July 26, 2016

Does keywords _NUMERIC_/_CHARACTER_ represent all numeric/character variables?

Array is my favorite tool. With reserved keywords _NUMERIC_/_CHARACTER_, it can handle dynamic variables list. However, they may not represent all numeric/character variables. When it is in different position of DATA step, it will have different variables list. The codes below shows that array with _NUMERIC_/_CHARACTER_ represents only pre-defined variables. The post-defined variables will not be included anyway.
data _null_;
    array num0{*} _numeric_;

    putlog "Initial numeric array:";
    do i=1 to dim(num0);
        putlog i= num0[i]=;
    end;

    a = 1;
    b= 2;

    array num1{*} _numeric_;
    putlog "First numeric array:";
    do i=1 to dim(num1);
        putlog i= num1[i]=;
    end;

    c = 3;

    array num2{*} _numeric_;
    putlog "Second numeric array:";
    do i=1 to dim(num2);
        putlog i= num2[i]=;
    end;
run;

Result:
Initial numeric array:
First numeric array:
i=1 i=1
i=2 a=1
i=3 b=2
Second numeric array:
i=1 i=1
i=2 a=1
i=3 b=2
i=4 c=3

Monday, May 2, 2016

Oracle DDL export from SAS - get_ddl truncation issue

When we grab the complete DDL from Oracle, please increase DBMAX_TEXT value to avoid the truncation issue.
proc sql;
 connect to ORACLE 
        (DBMAX_TEXT=5000 READBUFF=32767 user=xxx password=xxx);

    create table ddl as
    select txt 
    from connection to ORACLE 
    (
        select dbms_metadata.get_ddl('TABLE', table_name) txt
        from user_tables
    );

    disconnect from ORACLE;
quit;

Wednesday, February 10, 2016

$w. vs. $CHARw.

$w. is the default informat for character variable. It will convert single period to a blank automatically. To fix it, we can use $CHARW. alternatively.

Below are the difference in SAS doc:
The $w. informat is almost identical to the $CHARw. informat. However, $CHARw. does not trim leading blanks nor does it convert a single period in an input field to a blank, while $w. does both.
data num;
   infile datalines dlmstr="|" missover;

   length x $10;
   attrib y length=$10 informat=$char10.;
   length z $10;

   input x y z;
   put x= y= z=; * y is period while z is blank;
   datalines;
1|.|.
;