Ta10 RRKing #257

Open: wants to merge 6 commits into base: etl-ta8


SSSSShi (Collaborator) commented Nov 19, 2024

TA10 - Data Engineering (ETL)
Team RRKing
Reina Shi, Roxanne Wang

Video demo: https://youtu.be/F5xhCUTp6sQ
Dataset: https://huggingface.co/datasets/neuralwork/arxiver

Project Architecture:
Function - contains four AWS Lambda functions:

  • etl-extract.py: Fetches data from Hugging Face API
  • etl.transform.py: Cleans and standardizes the data
  • etl-load.py: Saves processed data to S3
  • etl-error-handler.py: Manages errors across all stages
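
To illustrate what a "cleans and standardizes" stage might look like, here is a minimal sketch of a transform-style Lambda handler. It is not the code in this PR: the event shape (`{"records": [...]}`), the helper `clean_record`, and the assumption that `title` is a required field are all hypothetical and chosen only for illustration.

```python
import re


def clean_record(record):
    """Normalize one raw record (hypothetical helper).

    Lowercases and trims keys, and collapses runs of whitespace in string values.
    """
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            # Collapse newlines/tabs/multiple spaces into single spaces
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[key.strip().lower()] = value
    return cleaned


def lambda_handler(event, context):
    """Assumed transform stage: takes {"records": [...]} from the extract stage."""
    records = event.get("records", [])
    cleaned = [clean_record(r) for r in records]
    # Assumption: a record without a title is unusable and is dropped
    cleaned = [r for r in cleaned if r.get("title")]
    return {"records": cleaned, "count": len(cleaned)}
```

Returning a plain dictionary keeps the output JSON-serializable, so a load stage could receive it directly as its input event.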

step-functions.json: Defines the Step Functions workflow that orchestrates all four Lambda functions.
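
As a rough sketch of how such an orchestration could be expressed, the snippet below builds an Amazon States Language definition as a Python dictionary: three tasks run in sequence, and each one routes failures to an error-handling task. The state names, the ARN placeholders, and the `build_state_machine` function are assumptions for illustration; the actual step-functions.json in this PR may be structured differently.

```python
import json


def build_state_machine(extract_arn, transform_arn, load_arn, error_arn):
    """Build a sequential ETL workflow definition (hypothetical layout)."""

    def task(resource, next_state=None):
        state = {
            "Type": "Task",
            "Resource": resource,
            # Route any failure to the shared error handler,
            # keeping the error details under $.error
            "Catch": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "ResultPath": "$.error",
                    "Next": "HandleError",
                }
            ],
        }
        if next_state:
            state["Next"] = next_state
        else:
            state["End"] = True
        return state

    return {
        "Comment": "Sketch of an extract -> transform -> load workflow",
        "StartAt": "Extract",
        "States": {
            "Extract": task(extract_arn, "Transform"),
            "Transform": task(transform_arn, "Load"),
            "Load": task(load_arn),
            "HandleError": {"Type": "Task", "Resource": error_arn, "End": True},
        },
    }


if __name__ == "__main__":
    # Placeholder strings stand in for real Lambda function ARNs
    definition = build_state_machine(
        "EXTRACT_ARN", "TRANSFORM_ARN", "LOAD_ARN", "ERROR_HANDLER_ARN"
    )
    print(json.dumps(definition, indent=2))
```

Using a single `Catch` on `States.ALL` per task is one simple way to get "errors across all stages" handled in one place; finer-grained retry or per-error routing would need additional `Retry` or `Catch` entries.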

mhy-666 (Collaborator) commented Dec 6, 2024

The code looks good and the presentation is clear. Nice work! 45/45.
