A robust Extract, Load, Transform (ELT) pipeline built with modern data tools including Airbyte, Airflow, and dbt. This project demonstrates a production-ready ELT workflow using containerized services.
This project implements a modern ELT architecture with the following components:
- Airbyte: Handles data extraction and loading (E & L)
- dbt: Manages data transformations (T)
- Airflow: Orchestrates the entire pipeline
- PostgreSQL: Serves as both source and destination databases
- Docker: Containerizes all services for consistent deployment
- Docker and Docker Compose
- Git
- Python 3.8+
- Clone the repository:

  ```bash
  git clone <your-repository-url>
  cd ELT-project-dbt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env  # Edit .env with your configurations
  ```

- Start the services:

  ```bash
  ./start.sh
  ```

- Stop the services:

  ```bash
  ./stop.sh
  ```
Create or update your dbt profile at `~/.dbt/profiles.yml`:

```yaml
dbt_project:
  outputs:
    dev:
      dbname: destination_db
      host: host.docker.internal
      pass: postgres
      port: 5434
      schema: public
      threads: 1
      type: postgres
      user: postgres
  target: dev
```
```
ELT-project-dbt/
├── airbyte/             # Airbyte configuration and connections
├── airflow/             # Airflow DAGs and configurations
├── dbt_project/         # dbt transformations and models
├── elt_script/          # ELT pipeline scripts
├── source_db_init/      # Initial database setup scripts
├── docker-compose.yaml
├── DockerFile
├── elt.sh               # Main ELT execution script
├── start.sh             # Service startup script
└── stop.sh              # Service shutdown script
```
- Airbyte
  - Manages data source connections
  - Handles data extraction and loading
  - Configurable through the web UI at `localhost:8000`
- Airflow
  - Orchestrates the entire ELT pipeline
  - Manages task dependencies and scheduling
  - Access the UI at `localhost:8080`
- dbt
  - Handles data transformations
  - Maintains data models
  - Ensures data quality through tests
- Source data is extracted from PostgreSQL using Airbyte
- Data is loaded into the destination database
- dbt performs transformations on the loaded data
- Airflow orchestrates the entire process
- Configure new source in Airbyte UI
- Update corresponding connection settings
- Modify dbt models as needed
- Add new models in `dbt_project/models/`
- Update `schema.yml` with model configurations
- Run `dbt run` to test transformations
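For example, a new model is just a SQL file under `dbt_project/models/`. A minimal hedged sketch (the model name and the `film` table in the `public` schema of `destination_db` are assumptions about what Airbyte has loaded):

```sql
-- dbt_project/models/films_by_rating.sql (hypothetical example model)
-- Counts films per rating from a table Airbyte has loaded into the public schema.
-- In practice the raw table would be declared as a dbt source and referenced via source().

select
    rating,
    count(*) as film_count
from public.film
group by rating
```

After adding the file, `dbt run` (or `dbt run --select films_by_rating` for just this model) builds it in the destination database.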
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Common issues and solutions:
- Connection Issues: Ensure all services are running with `docker ps`
- Database Errors: Check PostgreSQL logs in Docker
- Transformation Failures: Verify dbt models and run `dbt debug`
- Set up Sakila sample database in source PostgreSQL
- Import base tables and relationships
- Verify data integrity and completeness
- Configure proper indexing for performance
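For the integrity check, a couple of example queries against the stock Sakila tables can serve as a quick sanity pass (table names assume the standard Sakila schema):

```sql
-- Row counts for the core tables used downstream
SELECT 'rental'   AS table_name, COUNT(*) AS row_count FROM rental
UNION ALL
SELECT 'payment',  COUNT(*) FROM payment
UNION ALL
SELECT 'customer', COUNT(*) FROM customer;

-- Referential completeness: payments pointing at a missing rental
SELECT COUNT(*) AS orphaned_payments
FROM payment p
LEFT JOIN rental r ON r.rental_id = p.rental_id
WHERE p.rental_id IS NOT NULL
  AND r.rental_id IS NULL;
```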
- Star Schema Implementation
  - Design fact tables
    - Rental facts (rental_id, customer_id, staff_id, inventory_id, dates, amounts)
    - Payment facts (payment_id, customer_id, staff_id, rental_id, amount, date)
  - Create dimension tables
    - Customer dimension (demographics, addresses)
    - Film dimension (categories, ratings, length)
    - Store dimension (locations, staff)
    - Time dimension (date hierarchies)
- Snowflake Schema Extension
  - Normalize dimension tables
    - Split address into city, country hierarchies
    - Separate film categories and actors
    - Create language and rating dimensions
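As a rough illustration of how this design could land in dbt, two hedged model sketches follow: a rental fact table and one snowflaked dimension. The `fct_rental` and `dim_city` names, the `stg_*` staging models, and their columns are assumptions for illustration, not existing files in this repo.

```sql
-- models/marts/fct_rental.sql (hypothetical path and name)
-- Grain: one row per rental, with the total payment amount attached.

with rentals as (
    select * from {{ ref('stg_rental') }}
),

payments as (
    select
        rental_id,
        sum(amount) as total_amount
    from {{ ref('stg_payment') }}
    group by rental_id
)

select
    r.rental_id,
    r.customer_id,
    r.staff_id,
    r.inventory_id,
    r.rental_date,
    r.return_date,
    coalesce(p.total_amount, 0) as rental_amount
from rentals r
left join payments p
    on p.rental_id = r.rental_id
```

The address normalization could then become small snowflaked dimensions, for example:

```sql
-- models/marts/dim_city.sql (hypothetical path and name)
-- One level of the snowflaked address hierarchy: city rolls up to country via country_id.

select
    city_id,
    city        as city_name,
    country_id  -- foreign key into dim_country, built analogously from stg_country
from {{ ref('stg_city') }}
```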
- Create base models for initial data load
- Implement staging models
  - Customer staging
  - Film staging
  - Rental staging
  - Payment staging
- Develop mart models
  - Customer analytics mart
  - Film performance mart
  - Revenue analysis mart
  - Store performance mart
- Add data tests and documentation
  - Schema tests
  - Data quality tests
  - Business logic tests
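A hedged sketch of one staging model, assuming Airbyte lands the raw Sakila tables and a `sakila` source is declared in the project (the file name and source definition are assumptions):

```sql
-- models/staging/stg_rental.sql (hypothetical)
-- Light typing and renaming over the raw rental table loaded by Airbyte.

select
    rental_id,
    cast(rental_date as timestamp) as rental_date,
    cast(return_date as timestamp) as return_date,
    inventory_id,
    customer_id,
    staff_id
from {{ source('sakila', 'rental') }}
```

Schema tests such as `unique` and `not_null` on `rental_id` would then be declared in the corresponding `schema.yml`, with business-logic checks (for example, no return date earlier than the rental date) written as singular SQL tests.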
- Set up source connectors for Sakila database
- Configure destination connectors
- Define replication schedules
- Implement incremental sync strategies
- Create DAG for initial load
- Implement incremental load DAGs
- Set up transformation scheduling
- Add monitoring and alerting
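On the transformation side, the incremental loads these DAGs trigger could map onto dbt's incremental materialization; a minimal sketch, with the model name and `sakila` source assumed for illustration:

```sql
-- models/staging/stg_payment_incremental.sql (hypothetical)
-- Incremental materialization: only payments newer than the latest loaded payment_date are processed.

{{ config(materialized='incremental', unique_key='payment_id') }}

select
    payment_id,
    customer_id,
    staff_id,
    rental_id,
    amount,
    payment_date
from {{ source('sakila', 'payment') }}

{% if is_incremental() %}
where payment_date > (select max(payment_date) from {{ this }})
{% endif %}
```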
- Tableau Dashboard Development
  - Revenue Analysis Dashboard
    - Daily/Monthly/Yearly revenue trends
    - Revenue by store location
    - Top performing films
    - Customer segments analysis
  - Inventory Performance Dashboard
    - Film category performance
    - Stock turnover rates
    - Rental duration analysis
    - Late returns tracking
  - Customer Insights Dashboard
    - Customer lifetime value
    - Rental frequency patterns
    - Geographic distribution
    - Payment behavior analysis
- KPI Monitoring
  - Set up core business metrics
    - Revenue metrics
    - Customer metrics
    - Inventory metrics
  - Create KPI tracking dashboards
  - Implement alerts for metric thresholds
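The revenue views and KPIs above would typically read from the same revenue mart; a hedged example of the kind of query a daily-revenue-by-store dashboard or threshold alert could sit on (the `mart_revenue` relation and its columns are assumptions):

```sql
-- Daily revenue per store: the grain a revenue-trend dashboard or KPI alert would read from.
select
    date_trunc('day', payment_date) as revenue_date,
    store_id,
    count(distinct rental_id)       as rentals,
    sum(amount)                     as revenue
from mart_revenue
group by 1, 2
order by 1, 2
```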
- Document data lineage
- Create data dictionary
- Prepare user guides for dashboards
- Document maintenance procedures
- Performance testing
- Query optimization
- Dashboard response time optimization
- User acceptance testing
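For the query optimization pass, a hedged example of the kind of change involved: index the columns a slow query joins or filters on, then compare plans with `EXPLAIN ANALYZE` (the index name and query are illustrative, using the Sakila `payment` table):

```sql
-- Hypothetical optimization: index the customer/date columns used by revenue queries.
CREATE INDEX IF NOT EXISTS idx_payment_customer_date
    ON payment (customer_id, payment_date);

-- Inspect the plan before and after to confirm the index is actually used.
EXPLAIN ANALYZE
SELECT customer_id, SUM(amount) AS revenue
FROM payment
WHERE payment_date >= DATE '2005-06-01'
GROUP BY customer_id;
```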
- A fully functional data warehouse with both star and snowflake schemas
- Automated ELT pipeline with monitoring
- Interactive Tableau dashboards for business insights
- Comprehensive documentation and maintenance guides