Saturday, May 17, 2025

Docker + Node.js + Puppeteer: scraping TuGo travel insurance quotes

I built a tool to scrape travel insurance quotes from various provider websites. For most of them, the process was straightforward: I could analyze the HTTP traffic to identify the relevant endpoints, then use SAS PROC HTTP to send POST requests directly and retrieve the quotes.

However, scraping quotes from the TuGo website proved to be much more challenging.
TuGo’s quote system is highly dynamic and relies heavily on JavaScript. Initially, I considered using Python with Selenium, but it quickly became clear that this approach wasn’t ideal because of performance limitations, maintenance overhead, and compatibility issues with the site’s rendering logic.

After reviewing several popular tools, I decided to go with Node.js and Puppeteer. This combination proved to be significantly more reliable and better suited for interacting with TuGo’s modern front-end framework.

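The scraper runs inside the official Puppeteer Docker image and takes three dates as arguments: the traveller's date of birth, the trip start date, and the trip end date.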
$ docker run -it \
  -v "$PWD:/home/pptruser/app" \
  -w /home/pptruser \
  --rm ghcr.io/puppeteer/puppeteer \
  node ./app/tugo_quote.js 06/07/1966 17/05/2025 31/05/2025
    
Sample output:
********  Inputs ********
traveller_dob: 06/07/1966
trip_start_date: 17/05/2025
trip_end_date: 31/05/2025
********  Tugo travel quote START ********
1. Basic information input
1.a Origin
1.b Destination
1.c Trip Start Date
1.d Trip End Date
1.e Trip Arrival Date
1.f Trip Cost
1.g Traveller info
1.h Click Button - Get a Quote
2. Quote results
2.a Close promotion dialog
2.b Fill in Questions if exist
No Questionnaire button found — skipping.
2.c Enable sliders
2.d Quote loop
Quote for Sum Insured $50K Deductiable $0 =  $60.48
Quote for Sum Insured $50K Deductiable $500 =  $54.43
Quote for Sum Insured $50K Deductiable $1000 =  $48.38
Quote for Sum Insured $100K Deductiable $0 =  $85.99
Quote for Sum Insured $100K Deductiable $500 =  $77.39
Quote for Sum Insured $100K Deductiable $1000 =  $68.80
******** Tugo travel quote END ********
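
The script itself isn't listed here, so the following is only a minimal Node.js sketch of two pieces the log implies: reading the three date arguments (assuming dd/mm/yyyy, which the sample dates suggest) and the step 2.d loop over every sum insured and deductible combination. The names parseDmy, parseArgs, collectQuotes, and the readQuote callback are hypothetical. In a real run readQuote would drive the page through Puppeteer; here it is left to the caller.

```javascript
// Sketch only: the real tugo_quote.js is not shown in this post.

// Parse a dd/mm/yyyy string into a UTC Date, rejecting impossible dates.
function parseDmy(text) {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(text);
  if (!match) {
    throw new Error(`Expected dd/mm/yyyy, got "${text}"`);
  }
  const [day, month, year] = match.slice(1).map(Number);
  // Date.UTC rolls invalid dates over (31/02 becomes 03/03), so round-trip to detect them.
  const date = new Date(Date.UTC(year, month - 1, day));
  const sameDay =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  if (!sameDay) {
    throw new Error(`Not a real calendar date: "${text}"`);
  }
  return date;
}

// Read the three positional arguments in the order used by the docker command.
function parseArgs(argv) {
  if (argv.length !== 3) {
    throw new Error('Usage: node tugo_quote.js <dob> <trip_start> <trip_end>');
  }
  const [dob, start, end] = argv.map(parseDmy);
  if (end < start) {
    throw new Error('Trip end date is before trip start date');
  }
  return { travellerDob: dob, tripStart: start, tripEnd: end };
}

// Visit every sum insured / deductible combination, one quote at a time.
// readQuote is a hypothetical async callback that returns the premium for one combination.
async function collectQuotes(sumsInsured, deductibles, readQuote) {
  const quotes = [];
  for (const sumInsured of sumsInsured) {
    for (const deductible of deductibles) {
      const premium = await readQuote(sumInsured, deductible);
      quotes.push({ sumInsured, deductible, premium });
    }
  }
  return quotes;
}
```

Called as parseArgs(process.argv.slice(2)), this would accept the three dates from the docker command above. collectQuotes(['$50K', '$100K'], [0, 500, 1000], readQuote) would visit the six combinations in the same order as the log.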

Tuesday, April 29, 2025

How to download a Google Sheet?

Google Drive is my favorite cloud storage, and I keep almost all my files there. However, if a file is not shared publicly, you have to use OAuth 2.0 to access it.

Below are the steps:
1. Create a new project in Google Cloud Console and set up an OAuth 2.0 Client ID. This is where you set the API scope for Google Drive and get the client ID and client secret. Keep both of them private.

2. Get the authorization code manually by opening the URL below in a browser.
https://accounts.google.com/o/oauth2/auth?scope=https://www.googleapis.com/auth/drive&redirect_uri=urn:ietf:wg:oauth:2.0:oob&response_type=code&client_id=xxxxxxxx.apps.googleusercontent.com
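
The URL above is easier to check when it is assembled from its parts. Below is a minimal Node.js sketch, not part of my SAS workflow; buildAuthUrl is a hypothetical helper name, and the client ID is the same placeholder as in the URL.

```javascript
// Build the Google OAuth 2.0 authorization URL from its query parameters.
function buildAuthUrl(clientId) {
  const params = new URLSearchParams({
    scope: 'https://www.googleapis.com/auth/drive',
    redirect_uri: 'urn:ietf:wg:oauth:2.0:oob',
    response_type: 'code',
    client_id: clientId,
  });
  // URLSearchParams percent-encodes the values when it is converted to a string.
  return `https://accounts.google.com/o/oauth2/auth?${params}`;
}

console.log(buildAuthUrl('xxxxxxxx.apps.googleusercontent.com'));
```

The printed URL carries the same four parameters as the hand-written one; the only difference is that the colons and slashes inside the values are percent-encoded.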

3. Request an access token.
%let auth_url=https://accounts.google.com/o/oauth2/auth;
%let redirect_uri=urn:ietf:wg:oauth:2.0:oob;
%let client_id=xxxxxxxxx.apps.googleusercontent.com; /* from step1 */
%let client_secret=xxxxxxxxxxx; /* from step1 */
%let code = xxxxxxxxxxx; /* from step2 */

/* Temporary files for the response body and the response headers */
filename resp temp;
filename hdrs temp;

/* Exchange the authorization code for an access token. */
/* The JSON response written to resp contains access_token. */
proc http url="https://oauth2.googleapis.com/token"
    method="POST"
    out=resp
    headerout=hdrs
    ct="application/x-www-form-urlencoded"
    in=form(
            "code"="&code"
            "client_id"="&client_id"
            "client_secret"="&client_secret"
            "redirect_uri"="&redirect_uri"
            "grant_type"="authorization_code");
run;

%put INFO: &=SYS_PROCHTTP_STATUS_CODE;
%put INFO: &=SYS_PROCHTTP_STATUS_PHRASE;

4. Export the Google Sheet as an Excel file.
%let access_token=xxxxxxxxxxx; /* from step3 */
/* Point resp at the file that will receive the exported workbook */
filename resp "download_path/result.xlsx";
proc http
    url="https://docs.google.com/spreadsheets/d/2roVDi0WBqZ5t-gguUJ5eNKWSxJBP4AWiAc3e9sOgdtU/export?format=xlsx"
    oauth_bearer="&access_token."
    out=resp;
run;

%put INFO: &=SYS_PROCHTTP_STATUS_CODE;
%put INFO: &=SYS_PROCHTTP_STATUS_PHRASE;
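
For comparison, steps 3 and 4 can be sketched outside SAS as well. This is a minimal Node.js version under a few assumptions: Node 18 or later (for the global fetch), and hypothetical helper names buildTokenRequest, buildExportRequest, and downloadSheet. It sends the same five form fields and the same Bearer token as the PROC HTTP calls above; I have not run it against my own sheet.

```javascript
// Step 3: form-encoded POST that exchanges the authorization code for a token.
function buildTokenRequest({ code, clientId, clientSecret }) {
  return {
    url: 'https://oauth2.googleapis.com/token',
    options: {
      method: 'POST',
      // fetch sends a URLSearchParams body as application/x-www-form-urlencoded.
      body: new URLSearchParams({
        code,
        client_id: clientId,
        client_secret: clientSecret,
        redirect_uri: 'urn:ietf:wg:oauth:2.0:oob',
        grant_type: 'authorization_code',
      }),
    },
  };
}

// Step 4: export request; the access token travels in the Authorization header.
function buildExportRequest(spreadsheetId, accessToken) {
  return {
    url: `https://docs.google.com/spreadsheets/d/${spreadsheetId}/export?format=xlsx`,
    options: { headers: { Authorization: `Bearer ${accessToken}` } },
  };
}

// Run both steps and save the workbook to outPath.
async function downloadSheet(credentials, spreadsheetId, outPath) {
  const { writeFile } = await import('node:fs/promises');

  const token = buildTokenRequest(credentials);
  const tokenResponse = await fetch(token.url, token.options);
  if (!tokenResponse.ok) {
    throw new Error(`Token request failed: ${tokenResponse.status}`);
  }
  // The token endpoint answers with JSON that includes access_token.
  const { access_token: accessToken } = await tokenResponse.json();

  const sheet = buildExportRequest(spreadsheetId, accessToken);
  const sheetResponse = await fetch(sheet.url, sheet.options);
  if (!sheetResponse.ok) {
    throw new Error(`Export failed: ${sheetResponse.status}`);
  }
  await writeFile(outPath, Buffer.from(await sheetResponse.arrayBuffer()));
}
```

The two build functions only assemble the requests, so they can be inspected without touching the network. downloadSheet would be called with the values from steps 1 and 2, the spreadsheet ID from the sheet's URL, and a target path such as result.xlsx.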

Sunday, January 19, 2025

Docker saspy

To keep my work environment clean, I always use Docker to containerize my favorite tools. I don't worry about disk space, as my work laptop has no videos on it. :) In my previous post, I created the script run_saspy.py to run SAS programs locally. Below is the Dockerfile I use to containerize the script.
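# Dockerfile
# Command: docker build -t saspy .

# Pin the Python version so it matches the site-packages path used below
FROM python:3.13

# saspy's IOM access method needs a Java runtime; pip already ships with the image
RUN apt-get update && apt-get install -y default-jre
RUN pip install --upgrade pip
RUN pip install pandas
RUN pip install saspy

# saspy configuration, SAS credentials, and the runner script
COPY sascfg_personal.py /usr/local/lib/python3.13/site-packages/saspy/sascfg_personal.py
COPY authinfo /root/.authinfo
COPY run_saspy.py /usr/local/bin/run_saspy.py

# run_saspy.py must be executable and start with a Python shebang line
ENTRYPOINT ["run_saspy.py"]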
Once the Docker image is built, I set an alias, run_saspy, and it works as expected.
$ alias run_saspy='docker run --rm -it -v "${PWD}:${PWD}" -w "${PWD}" saspy -i '
$ run_saspy hello.sas
********************************************
Playpen: playpen
Using SAS Config named: oda
SAS Connection established. Subprocess id is 14

NOTE: hello.sas uploaded to ~/playpen/hello.sas successfully!
NOTE: ~/playpen/hello.20250120T003749.log downloaded to hello.20250120T003749.log successfully!
NOTE: Playpen killed successfully!
SAS Connection terminated. Subprocess id was 14
********************************************
********************************************
***        JOB EXECUTION STATUS          ***
********************************************
INFO: Finished successfully!
********************************************