
A robust and scalable data pipeline on Google Cloud Platform (GCP) for monitoring and analyzing stock performance. It leverages GCP's data processing and storage services to efficiently collect, process, and visualize stock data.


quannguyen0103/vnstock-data-pipeline


Architecture

(architecture diagram)

0. Setup

  • Set up Airflow on Google Cloud VM
  • Create GCS buckets: vnstock, grown_stock
  • Set up a Google Cloud connection for Airflow
  • Configure Airflow SMTP to send alert emails when a task fails
  • Create the vnstock topic on Google Pub/Sub
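
The bucket and topic setup above can be sketched with the gcloud CLI; the project ID and region below are placeholders (and in practice GCS bucket names must be globally unique, so the real names may differ from the short names used in this README):

```shell
# Placeholder project ID and region; substitute your own values.
gcloud config set project my-gcp-project

# Create the two GCS buckets used by the pipelines
gsutil mb -l asia-southeast1 gs://vnstock
gsutil mb -l asia-southeast1 gs://grown_stock

# Create the Pub/Sub topic that the hourly pipeline publishes to
gcloud pubsub topics create vnstock
```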

1. Load data to GCS buckets

  • Retrieve historical stock data for the past year through the TCBS and SSI public APIs using the vnstock library and store it as year_data.csv: load_year_data.py
  • Migrate year_data.csv to the vnstock bucket with a bash script: migrate_year_data.sh
  • Daily Airflow data pipeline: daily_pipeline.py
    • Retrieve stock data daily and store each day's data as an individual CSV file: load_daily_data.py
    • Migrate daily stock data to the vnstock bucket with a bash script: migrate_data.sh
    • Calculate and select the stocks with the most stable growth over the last 3 months and load them to the grown_stock bucket by submitting a Spark job to Dataproc: load_grown_stock.py
    • (A stock has stable growth if, over the past 3 months, its price has risen and the fluctuation range of its MA5 line has not exceeded 5%)
    • Runs at 4 PM every weekday (Monday to Friday)
    • Retries 3 times, 5 minutes apart
    • Sends an alert email when a task fails
  • Hourly Airflow data pipeline: hourly_pipeline.py
    • Choose favorite stocks to subscribe to: SSI, VND, HPG, NKG, VIC, NHA, CEO, LDG, VIX
    • Retrieve and publish historical data of the subscribed stocks to the Google Pub/Sub vnstock topic hourly: load_subscribe_data.py
    • If any subscribed stock drops more than 10% below the expected price, send a warning message to Telegram via a Telegram bot
    • Runs hourly from 10 AM to 3 PM every weekday
    • Retries 3 times, 5 minutes apart
    • Sends an alert email when a task fails
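
The stable-growth rule used by the daily pipeline can be sketched as follows. The production job runs as Spark on Dataproc; this is the same check in plain pandas, with the series name and defaults as illustrative assumptions:

```python
import pandas as pd

def is_stable_growth(close: pd.Series,
                     ma_window: int = 5,
                     max_fluctuation: float = 0.05) -> bool:
    """Stable-growth check: over the period covered by `close` (3 months of
    daily closing prices in the pipeline), the price went up AND the moving
    average (MA5) line fluctuated by no more than `max_fluctuation` (5%)."""
    ma = close.rolling(ma_window).mean().dropna()
    if ma.empty:
        return False
    went_up = close.iloc[-1] > close.iloc[0]
    # Fluctuation range of the MA line, relative to its minimum
    fluctuation = (ma.max() - ma.min()) / ma.min()
    return bool(went_up and fluctuation <= max_fluctuation)
```

A steadily rising series passes; a falling series fails on the first condition, and a sharply rising one fails the 5% MA fluctuation bound.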

2. Load data from GCS buckets to BigQuery

  • Load year_data.csv to the BigQuery table stock_data: data_sample
  • Create 3 Cloud Functions
    • load_daily_stock: when new daily stock data is uploaded to the vnstock bucket, append it to the stock_data table in BigQuery. Data sample: stock_data
    • load_subscribe_stock: when new subscribed stock data is published to the vnstock Pub/Sub topic, append it to the subscribed_stock table in BigQuery. Data sample: subscribed_stock
    • load_grown_stock: when new stock data is uploaded to the grown_stock bucket, overwrite (WRITE_TRUNCATE) the grown_stock table in BigQuery. Data sample: grown_stock
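
A minimal sketch of one such Cloud Function, assuming a GCS-triggered background function; the project/dataset/table reference is a placeholder, and the BigQuery client is imported lazily inside the handler so the pure helper can be exercised without GCP credentials:

```python
def write_disposition_for(bucket_name: str) -> str:
    # Per the behaviour above: grown_stock files replace the table,
    # everything else is appended.
    return "WRITE_TRUNCATE" if bucket_name == "grown_stock" else "WRITE_APPEND"

def load_stock_csv(event, context):
    """Background Cloud Function triggered by a GCS object-finalize event."""
    from google.cloud import bigquery  # lazy import: keeps the helper testable

    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"
    table = "my-gcp-project.stocks.stock_data"  # placeholder table reference
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=write_disposition_for(event["bucket"]),
    )
    # Load the CSV straight from GCS and wait for the job to finish
    client.load_table_from_uri(uri, table, job_config=job_config).result()
```

In practice each of the three functions would pin its own destination table and write disposition via environment variables rather than branching on the bucket name; the branch here just makes the append-vs-truncate distinction explicit.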

3. Visualize stock performance

  • Load data from BigQuery into Looker Studio and visualize stock performance (dashboard screenshot)
