- Set up Airflow on a Google Cloud VM
- Create GCS buckets: `vnstock`, `grown_stock`
- Set up a Google Cloud connection for Airflow
- Configure Airflow SMTP to send alert emails when a task fails
- Create the `vnstock` topic on Google Pub/Sub
- Retrieve historical stock data for the past year through the TCBS and SSI public APIs using the `vnstock` library and store it as `year_data.csv`: load_year_data.py
- Migrate `year_data.csv` to the `vnstock` bucket using a `bash` command: migrate_year_data.sh
- Daily Airflow data pipeline: daily_pipeline.py
  - Retrieve stock data daily and store each day as an individual CSV file: load_daily_data.py
  - Migrate daily stock data to the `vnstock` bucket using a `bash` command: migrate_data.sh
  - Calculate and select the stocks with the most stable growth over the last 3 months and load them to the `grown_stock` bucket by submitting a job to Dataproc (Spark): load_grown_stock.py
    - (Stock with stable growth: over the past 3 months, the stock price has gone up and the fluctuation range of the MA5 line has not exceeded 5%)
  - Run at 4 PM every weekday (Monday to Friday)
  - Retry 3 times, 5 minutes apart
  - Send an alert email when a task fails
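
The stable-growth rule above can be sketched in plain Python. This is only an illustration, not the Spark job in load_grown_stock.py: the function names are hypothetical, and "fluctuation range of the MA5 line" is read here as the day-to-day relative change of the 5-day moving average, which is an assumption about the intended meaning.

```python
from typing import List, Sequence


def moving_average(prices: Sequence[float], window: int = 5) -> List[float]:
    """Simple moving average, one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def is_stable_growth(
    closes: Sequence[float], window: int = 5, max_step: float = 0.05
) -> bool:
    """Assumed reading of the rule: the last close is above the first one,
    and the moving average never moves more than max_step between two
    consecutive days."""
    ma = moving_average(closes, window)
    if len(ma) < 2:
        return False  # not enough data to judge
    if closes[-1] <= closes[0]:
        return False  # price did not go up over the period
    return all(abs(b - a) / a <= max_step for a, b in zip(ma, ma[1:]))
```

A call such as `is_stable_growth(closing_prices_last_3_months)` would then be applied per ticker; in the actual pipeline the same filter would presumably be expressed with Spark window functions.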
- Hourly Airflow data pipeline: hourly_pipeline.py
  - Choose favorite stocks to subscribe to: SSI, VND, HPG, NKG, VIC, NHA, CEO, LDG, VIX
  - Retrieve the historical data of the subscribed stocks and publish it to the Google Pub/Sub `vnstock` topic hourly: load_subscribe_data.py
  - If any subscribed stock drops more than 10% compared to the expected price, send a warning message to Telegram via the Telegram bot
  - Run hourly from 10 AM to 3 PM every weekday
  - Retry 3 times, 5 minutes apart
  - Send an alert email when a task fails
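
The schedule and retry settings listed for the two pipelines could be written roughly as follows. This is a sketch rather than the contents of daily_pipeline.py or hourly_pipeline.py: the constant names and the e-mail address are placeholders, while `retries`, `retry_delay`, `email` and `email_on_failure` are standard Airflow `default_args` keys.

```python
from datetime import timedelta

# Shared task defaults: retry 3 times, 5 minutes apart, e-mail on failure.
DEFAULT_ARGS = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "email": ["alerts@example.com"],  # placeholder address
    "email_on_failure": True,
}

# Cron fields: minute hour day-of-month month day-of-week.
DAILY_SCHEDULE = "0 16 * * 1-5"      # 4 PM, Monday to Friday
HOURLY_SCHEDULE = "0 10-15 * * 1-5"  # every hour from 10 AM to 3 PM, Monday to Friday
```

These values would be passed to the `DAG` constructor as `default_args` and as the schedule. Airflow evaluates cron expressions in UTC unless a time zone is configured, so the hours may need shifting to match local market time.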
- Load `year_data.csv` to the BigQuery table `stock_data`: data_sample
- Create 3 Cloud Functions
  - load_daily_stock: when new daily stock data is uploaded to the `vnstock` bucket, append it to the `stock_data` table in BigQuery. Data sample: stock_data
  - load_subscribe_stock: when new subscribed stock data is published to the `vnstock` Pub/Sub topic, append it to the `subscribed_stock` table in BigQuery. Data sample: subscribed_stock
  - load_grown_stock: when new stock data is uploaded to the `grown_stock` bucket, write it to the `grown_stock` table in BigQuery, replacing the existing rows (write truncate). Data sample: grown_stock
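
The difference between the three functions can be summarized in a small sketch. It does not reproduce the deployed functions: the mapping name and helper are hypothetical, and the JSON payload format is an assumption. What it relies on is that `WRITE_APPEND` and `WRITE_TRUNCATE` are BigQuery's write-disposition values, and that a Pub/Sub-triggered background function receives the message body base64-encoded under the `data` key of the event.

```python
import base64
import json

# Assumed mapping of each Cloud Function to its BigQuery write disposition.
WRITE_DISPOSITION = {
    "load_daily_stock": "WRITE_APPEND",      # add new daily rows to stock_data
    "load_subscribe_stock": "WRITE_APPEND",  # add new rows to subscribed_stock
    "load_grown_stock": "WRITE_TRUNCATE",    # replace the contents of grown_stock
}


def decode_pubsub_event(event: dict) -> dict:
    """Decode a Pub/Sub event payload, assuming the publisher sent JSON."""
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(raw)
```

In load_subscribe_stock, the decoded record would then be appended to the `subscribed_stock` table; the two bucket-triggered functions would instead read the uploaded file named in the storage event.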