Tuesday, November 4, 2014

Subset and Hash dimension in run time

In data warehousing, the dimension may be very big, e.g. PRODUCT. To hash the big dimension, we have to increase memory or decrease dimension size.
Starting from SAS 9.3, we can subset and hash big dimension at run time.
Of course it will affect performance if the hash load happens too often. However, the process is straightforward and easy to maintain.
Below is the example:
data big_dimension;
    do geo=1,2,3;
        prod="test";
        output;
    end;
run;

data fact;
    do geo=1,2,3,4,5;
    output;
    end;
run;

data fact_new;
    set fact;
    by geo;

    drop string rc;

    if 0 then set big_dimension (keep=geo prod);

    if first.geo then do;
        string = cats("data dim_subset / view=dim_subset; set big_dimension; where geo=",geo,";run;");
        rc =dosubl(string);
        declare hash h (dataset:"dim_subset");
        rc = h.definekey("geo");
        rc = h.definedata("prod");
        rc = h.definedone();
    end;

    call missing(prod);
    rc = h.find();

    if last.geo then do;
    end;
run;

No comments: