Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard. The dashboard is then used to support a purchasing decision of which Headphone / IEM to get.

ris-tlp/audiophile-e2e-pipeline

Audiophile End-To-End ELT Pipeline

Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard.

Architecture

Infrastructure is provisioned with Terraform, the pipeline is containerized with Docker and orchestrated with Airflow, and the dashboard is built in Metabase.

DAG Tasks:

  1. Scrape data from Crinacle's website to generate bronze data.
  2. Load bronze data to AWS S3.
  3. Initial data parsing and validation through Pydantic to generate silver data.
  4. Load silver data to AWS S3.
  5. Load silver data to AWS Redshift.
  6. Load silver data to AWS RDS for future projects.
  7. Transform data through dbt in the warehouse.
  8. Test the transformed data through dbt in the warehouse.
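
As a rough illustration of task 3, the sketch below shows how bronze rows could be validated with Pydantic to produce silver rows. The `Headphone` schema, its field names (`model`, `rank`, `price_usd`) and the `to_silver` helper are hypothetical and chosen for illustration only; the schema the pipeline actually uses may differ.

```python
from typing import List, Optional, Tuple

from pydantic import BaseModel, Field, ValidationError


class Headphone(BaseModel):
    """Hypothetical silver-layer schema; the real field names may differ."""

    model: str = Field(min_length=1)  # reject empty model names
    rank: str
    price_usd: Optional[float] = Field(default=None, ge=0)  # no negative prices


def to_silver(bronze_rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split raw (bronze) rows into validated (silver) rows and rejected rows."""
    silver, rejected = [], []
    for row in bronze_rows:
        try:
            # Pydantic coerces compatible values (e.g. "299" -> 299.0)
            # and raises ValidationError for anything that cannot be parsed.
            silver.append(dict(Headphone(**row)))
        except ValidationError:
            rejected.append(row)
    return silver, rejected
```

Keeping the rejected rows instead of dropping them silently makes it easier to see what the scraper produced that the schema did not accept.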

Dashboard

Requirements

  1. Configure AWS account through AWS CLI. [Required for Terraform]
  2. Terraform. [Required to provision AWS services]
  3. Docker / Docker-Compose. [Required to run Airflow DAG / pipeline]

Run Pipeline

  1. make infra: create AWS services. You will be asked to enter a password for your Redshift and RDS clusters.
  2. make config: generate configuration with Terraform outputs and AWS credentials.
  3. make base-build: build the base Airflow image with project requirements.
  4. make build: build the Docker images for Airflow.
  5. make up: run the pipeline.