Saturday, May 17, 2025

Docker + Node.js + Puppeteer: scrape a TuGo travel insurance quote

I built a tool to scrape travel insurance quotes from various provider websites. For most of them, the process was straightforward: I could analyze the HTTP traffic to identify the relevant endpoints and use PROC HTTP to send POST requests directly to retrieve quotes.
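
For those providers the call boils down to a plain PROC HTTP POST. A minimal sketch of the pattern (the endpoint, field names, and payload below are placeholders, not any provider's real API):
filename req temp;
filename resp temp;

/* hypothetical request body, taken from the browser's network tab */
data _null_;
    file req;
    put '{"dob":"1966-07-06","tripStart":"2025-05-17","tripEnd":"2025-05-31"}';
run;

proc http url="https://quotes.example.com/api/quote"  /* placeholder endpoint */
    method="POST"
    in=req
    out=resp
    ct="application/json";
run;

%put INFO: &=SYS_PROCHTTP_STATUS_CODE;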

However, scraping quotes from the TuGo website proved to be much more challenging.
TuGo’s quote system is highly dynamic and heavily reliant on JavaScript. Initially, I considered using Python with Selenium, but it quickly became clear that this approach wasn’t ideal due to performance limitations, maintenance problems, and compatibility issues with the site’s rendering logic.

After reviewing several popular tools, I decided to go with Node.js and Puppeteer. This combination proved to be significantly more reliable and better suited for interacting with TuGo’s modern front-end framework.

Here is how I run the script inside the official Puppeteer Docker image:
$ docker run -it \
  -v "$PWD:/home/pptruser/app" \
  -w /home/pptruser \
  --rm ghcr.io/puppeteer/puppeteer \
  node ./app/tugo_quote.js 06/07/1966 17/05/2025 31/05/2025
    
Sample output:
********  Inputs ********
traveller_dob: 06/07/1966
trip_start_date: 17/05/2025
trip_end_date: 31/05/2025
********  Tugo travel quote START ********
1. Basic information input
1.a Origin
1.b Destination
1.c Trip Start Date
1.d Trip End Date
1.e Trip Arrival Date
1.f Trip Cost
1.g Traveller info
1.h Click Button - Get a Quote
2. Quote results
2.a Close promotion dialog
2.b Fill in Questions if exist
No Questionnaire button found — skipping.
2.c Enable sliders
2.d Quote loop
Quote for Sum Insured $50K Deductible $0 =  $60.48
Quote for Sum Insured $50K Deductible $500 =  $54.43
Quote for Sum Insured $50K Deductible $1000 =  $48.38
Quote for Sum Insured $100K Deductible $0 =  $85.99
Quote for Sum Insured $100K Deductible $500 =  $77.39
Quote for Sum Insured $100K Deductible $1000 =  $68.80
******** Tugo travel quote END ********

Tuesday, April 29, 2025

How to download a Google Sheet?

Google Drive is my favorite cloud storage. I save almost all my stuff there. However, if a file is not publicly shared, you have to use OAuth 2.0 to access it.

Below are the steps:
1. Create a new project in the Google Cloud Console and set up an OAuth 2.0 Client ID. Here we set the API scope for Google Drive and obtain the client ID and client secret. Note that this information should be kept secure.

2. Get the authorization code manually with the URL below. Open it in a browser.
https://accounts.google.com/o/oauth2/auth?scope=https://www.googleapis.com/auth/drive&redirect_uri=urn:ietf:wg:oauth:2.0:oob&response_type=code&client_id=xxxxxxxx.apps.googleusercontent.com

3. Request an access token
%let auth_url=https://accounts.google.com/o/oauth2/auth;
%let redirect_uri=urn:ietf:wg:oauth:2.0:oob;
%let client_id=xxxxxxxxx.apps.googleusercontent.com; /* from step1 */
%let client_secret=xxxxxxxxxxx; /* from step1 */
%let code = xxxxxxxxxxx; /* from step2 */

filename resp temp;
filename hdrs temp;

proc http url="https://oauth2.googleapis.com/token"
    method="POST"
    out=resp
    headerout=hdrs
    ct="application/x-www-form-urlencoded"
    in=form(
            "code"="&code" 
            "client_id"="&client_id" 
            "client_secret"="&client_secret" 
            "redirect_uri"="&redirect_uri" 
            "grant_type"="authorization_code");
run;

%put INFO: &=SYS_PROCHTTP_STATUS_CODE;
%put INFO: &=SYS_PROCHTTP_STATUS_PHRASE;
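
If you prefer not to copy the token by hand, the access token in resp can be read straight into a macro variable with the JSON libname engine (a sketch; access_token is the field name in Google's token response):
libname token json fileref=resp;

data _null_;
    set token.root;                             /* top-level fields of the JSON reply */
    call symputx('access_token', access_token);
run;

libname token clear;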

4. Export the Google Sheet as an Excel file.
%let access_token=xxxxxxxxxxx; /* from step3 */
filename resp "download_path/result.xlsx";
proc http 
    url="https://docs.google.com/spreadsheets/d/2roVDi0WBqZ5t-gguUJ5eNKWSxJBP4AWiAc3e9sOgdtU/export?format=xlsx" 
    oauth_bearer="&access_token."
    out=resp;
run;

%put INFO: &=SYS_PROCHTTP_STATUS_CODE;
%put INFO: &=SYS_PROCHTTP_STATUS_PHRASE;
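
From there, the downloaded workbook can be read back like any local Excel file, for example (a sketch, assuming the first sheet is the one of interest):
proc import datafile="download_path/result.xlsx"
    out=work.result
    dbms=xlsx
    replace;
run;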

Sunday, January 19, 2025

Docker saspy

To keep my work environment clean, I always use Docker to containerize my favorite tools. I don't worry about disk space, since my work laptop doesn't have any videos on it. :) In my previous post, I created the script run_saspy.py to run SAS programs locally. Below is my Dockerfile to containerize the script.
# Dockerfile
# Command: docker build -t saspy .

FROM python

RUN apt update && apt install -y python3-pip && apt install -y default-jre
RUN pip install --upgrade pip
RUN pip install pandas
RUN pip install saspy

COPY sascfg_personal.py /usr/local/lib/python3.13/site-packages/saspy/sascfg_personal.py
COPY authinfo /root/.authinfo
COPY run_saspy.py /usr/local/bin/run_saspy.py

ENTRYPOINT ["run_saspy.py"]
When the Docker image is built, I set an alias run_saspy and it works perfectly.
$ alias run_saspy='docker run --rm -it -v ${PWD}:${PWD} -w "${PWD}" saspy -i '
$ run_saspy hello.sas
********************************************
Playpen: playpen
Using SAS Config named: oda
SAS Connection established. Subprocess id is 14

NOTE: hello.sas uploaded to ~/playpen/hello.sas successfully!
NOTE: ~/playpen/hello.20250120T003749.log downloaded to hello.20250120T003749.log successfully!
NOTE: Playpen killed successfully!
SAS Connection terminated. Subprocess id was 14
********************************************
********************************************
***        JOB EXECUTION STATUS          ***
********************************************
INFO: Finished successfully!
********************************************

Monday, November 11, 2024

saspy + SAS® OnDemand for Academics = as if SAS is right there beside you

Lately, I’ve had some downtime and I took the opportunity to finish a tool. This tool uploads code and data to SAS® OnDemand for Academics, executes them as if running locally, and then automatically diagnoses certain errors and returns the results once completed.
usage: run_saspy.py [-h] [-i INPUT_PGM] [-o OUTPUT_DIR [OUTPUT_DIR ...]] [-d DATA_UPLOAD [DATA_UPLOAD ...]] [-p PLAYPEN] [-k KILL]
                    [-c CFGNAME]

It executes SAS code using saspy. ONLY ONE LEVEL FOLDER SUPPORTED!

options:
  -h, --help            show this help message and exit
  -i INPUT_PGM, --input_pgm INPUT_PGM
                        (Required) Name of the SAS file to be executed.
  -o OUTPUT_DIR [OUTPUT_DIR ...], --output_dir OUTPUT_DIR [OUTPUT_DIR ...]
                        Remote directories in the current working folder, which will be created in ~// to save outputs.
  -d DATA_UPLOAD [DATA_UPLOAD ...], --data_upload DATA_UPLOAD [DATA_UPLOAD ...]
                        Local files or directories in the current working folder, which will be uploaded to ~//. NO ABSOLUTE
                        PATH ALLOWED!
  -p PLAYPEN, --playpen PLAYPEN
                        Name of playpen. Default is playpen.
  -k KILL, --kill KILL  (Y/N) Kill the playpen after completion.
  -c CFGNAME, --cfgname CFGNAME
                        Name of the Configuration Definition to use for the SASsession. If not specified then just saspy.SASsession()
                        is executed.

In my test, I uploaded the program and data, executed the code, and downloaded the results locally. When all tests complete, the tool checks the log and frees up space by cleaning up all the playpen data. This way, I don't have to worry about space anymore.

Thursday, September 22, 2022

Download SAS work dataset in Viya 4

In SAS Studio on Viya 4, we cannot download directly from SAS libraries; we can only export text files (e.g., comma-delimited or tab-delimited). However, we can upload and download files from Content, so we can leverage the Viya Files service to download SAS datasets. Below is the sample code.
data result;
	set sashelp.class;
run;

filename out filesrvc folderpath='/Users/xxxxx/My Folder' 
	filename='result.sas7bdat' recfm=n lrecl=32767;
filename in "%sysfunc(pathname(work))/result.sas7bdat" recfm=n lrecl=32767;

data _null_;
    rc = fcopy('in', 'out');
    put rc=;
    length msg $1000;
    msg = sysmsg();
    put msg=;
run;

Wednesday, July 13, 2022

Split long string: VARCHAR + data _null_

As we know, the maximum length of a character variable is 32,767. To handle a string longer than 32,767 characters, we can split it, store the pieces in a dataset, and recombine them. Below is the sample code:
filename source "file_with_long_string";
filename target "file_with_new_long_string";

data source;
    infile source recfm=f lrecl=32000 pad;
    input text $char32000.;
run;

data target;
    length x $ 32767;
    set source;
    x = prxchange('s/old/new/', -1, text);
run;

data _null_;
    length y varchar(5000000);

    do until(eof);
        set target end=eof;
        y = cats(y, x);
    end;

    file target lrecl = 5000000;
    put y;
run;

Tuesday, June 7, 2022

Hash way to split SAS Table

The traditional SAS approach leverages macro loops to split a dataset, as in the sketch below. If you are bored with that, you may like the hash way that follows it. Enjoy!
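A rough sketch of that macro route (one dataset per AGE value in sashelp.class):
proc sql noprint;
    select distinct age into :age_list separated by ' '
    from sashelp.class;
quit;

%macro split_by_age;
    %do i = 1 %to %sysfunc(countw(&age_list));
        %let a = %scan(&age_list, &i);
        data age_&a;
            set sashelp.class;
            where age = &a;
        run;
    %end;
%mend split_by_age;
%split_by_age

And the hash way: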
proc sort data=sashelp.class out=class;
    by age;
run;

data _null_ ;
    declare hash h (multidata: "Y",
                    ordered: "a",
                    dataset: "class(obs=0)");
    h.definekey('name') ;
    h.definedata(all:'Y');
    h.definedone() ;

    do i = 1 by 1 until (last.age);
        set class;
        by age;
        h.add();
    end ;

    h.output(dataset: cats("age_", age));
run;