Modern ELT Pipeline Project

A robust Extract, Load, Transform (ELT) pipeline built with modern data tools including Airbyte, Airflow, and dbt. This project demonstrates a production-ready ELT workflow using containerized services.

πŸ—οΈ Architecture

This project implements a modern ELT architecture with the following components:

  • Airbyte: Handles data extraction and loading (E & L)
  • dbt: Manages data transformations (T)
  • Airflow: Orchestrates the entire pipeline
  • PostgreSQL: Provides both the source and destination databases
  • Docker: Containerizes all services for consistent deployment

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • Git
  • Python 3.8+

Setup Instructions

  1. Clone the repository:

    git clone <your-repository-url>
    cd ELT-project-dbt
  2. Set up environment variables:

    cp .env.example .env
    # Edit .env with your configurations
  3. Start the services:

    ./start.sh
  4. Stop the services:

    ./stop.sh

dbt Configuration

Create or update your dbt profile at ~/.dbt/profiles.yml:

dbt_project:
  outputs:
    dev:
      dbname: destination_db
      host: host.docker.internal
      pass: postgres
      port: 5434
      schema: public
      threads: 1
      type: postgres
      user: postgres
  target: dev
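
With this profile in place, running dbt debug from the dbt_project/ directory should report a successful connection. Note that host.docker.internal resolves to the host machine from inside a container, so this profile assumes dbt runs in Docker and reaches the destination Postgres through port 5434 published on the host.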

πŸ“ Project Structure

ELT-project-dbt/
├── airbyte/          # Airbyte configuration and connections
├── airflow/          # Airflow DAGs and configurations
├── dbt_project/      # dbt transformations and models
├── elt_script/       # ELT pipeline scripts
├── source_db_init/   # Initial database setup scripts
├── docker-compose.yaml
├── DockerFile
├── elt.sh            # Main ELT execution script
├── start.sh          # Service startup script
└── stop.sh           # Service shutdown script

🔧 Components

Airbyte

  • Manages data source connections
  • Handles data extraction and loading
  • Configurable through the web UI at localhost:8000

Airflow

  • Orchestrates the entire ELT pipeline
  • Manages task dependencies and scheduling
  • Access the UI at localhost:8080

dbt

  • Handles data transformations
  • Defines and maintains the data models
  • Ensures data quality through tests

📊 Data Flow

  1. Source data is extracted from PostgreSQL using Airbyte
  2. Data is loaded into the destination database
  3. dbt performs transformations on the loaded data
  4. Airflow orchestrates the entire process (see the DAG sketch below)
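
The DAG definitions live in airflow/. For reference, a minimal orchestration DAG might look like the sketch below; it assumes the apache-airflow-providers-airbyte package is installed, and the Airflow connection name, Airbyte connection id, and dbt project path are placeholders to adapt:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

with DAG(
    dag_id="elt_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # E & L: trigger the Airbyte connection that copies source_db into destination_db
    extract_load = AirbyteTriggerSyncOperator(
        task_id="airbyte_sync",
        airbyte_conn_id="airbyte_conn",             # Airflow connection pointing at the Airbyte API (placeholder)
        connection_id="<airbyte-connection-uuid>",  # copy from the Airbyte UI (placeholder)
    )

    # T: run the dbt models against the destination database
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run",  # placeholder path to the mounted dbt project
    )

    extract_load >> transform

Keeping the Airbyte sync and the dbt run as separate tasks lets Airflow retry the transformation step without re-extracting the data.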

🛠️ Development

Adding New Data Sources

  1. Configure the new source in the Airbyte UI
  2. Update the corresponding connection settings
  3. Modify the dbt models as needed

Creating New Transformations

  1. Add new models in dbt_project/models/
  2. Update schema.yml with model configurations
  3. Run dbt run to build the models and dbt test to validate them
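
To iterate on a single model, dbt run --select <model_name> and dbt test --select <model_name> limit execution to that model and its tests.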

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Troubleshooting

Common issues and solutions:

  • Connection Issues: Ensure all services are running with docker ps
  • Database Errors: Check PostgreSQL logs in Docker
  • Transformation Failures: Verify dbt models and run dbt debug

📚 Additional Resources

📋 TODO: Sakila Database Implementation

Phase 1: Source Data Setup

  • Set up Sakila sample database in source PostgreSQL
    • Import base tables and relationships
    • Verify data integrity and completeness
    • Configure proper indexing for performance

Phase 2: Data Warehouse Design

  1. Star Schema Implementation

    • Design fact tables
      • Rental facts (rental_id, customer_id, staff_id, inventory_id, dates, amounts)
      • Payment facts (payment_id, customer_id, staff_id, rental_id, amount, date)
    • Create dimension tables
      • Customer dimension (demographics, addresses)
      • Film dimension (categories, ratings, length)
      • Store dimension (locations, staff)
      • Time dimension (date hierarchies; see the pandas sketch after this list)
  2. Snowflake Schema Extension

    • Normalize dimension tables
      • Split address into city, country hierarchies
      • Separate film categories and actors
      • Create language and rating dimensions
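
Before the time dimension is committed to a dbt model, its contents can be prototyped quickly in Python. Below is a minimal sketch with pandas, assuming the Sakila rental dates fall roughly in the 2005–2006 range (adjust after inspecting the source data):

import pandas as pd

def build_date_dimension(start: str = "2005-01-01", end: str = "2006-12-31") -> pd.DataFrame:
    """Build a calendar/date dimension covering the assumed rental date range."""
    dates = pd.date_range(start, end, freq="D")
    return pd.DataFrame({
        "date_key": dates.strftime("%Y%m%d").astype(int),  # surrogate key, e.g. 20050524
        "date": dates,
        "year": dates.year,
        "quarter": dates.quarter,
        "month": dates.month,
        "day_of_month": dates.day,
        "day_of_week": dates.dayofweek,                     # 0 = Monday
        "is_weekend": dates.dayofweek >= 5,
    })

print(build_date_dimension().head())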

Phase 3: DBT Transformations

  • Create base models for initial data load
  • Implement staging models
    • Customer staging
    • Film staging
    • Rental staging
    • Payment staging
  • Develop mart models
    • Customer analytics mart
    • Film performance mart
    • Revenue analysis mart
    • Store performance mart
  • Add data tests and documentation
    • Schema tests
    • Data quality tests
    • Business logic tests

Phase 4: Airbyte Configuration

  • Set up source connectors for Sakila database
  • Configure destination connectors
  • Define replication schedules
  • Implement incremental sync strategies
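
Syncs can also be triggered programmatically, which helps when wiring the connection into Airflow or ad-hoc scripts. Below is a minimal sketch against the Airbyte Config API exposed alongside the UI on localhost:8000; the connection id is a placeholder, and newer Airbyte versions may additionally require the instance's basic-auth credentials:

import requests

AIRBYTE_API = "http://localhost:8000/api/v1"
CONNECTION_ID = "<airbyte-connection-uuid>"  # placeholder: copy from the connection page in the UI

def trigger_sync(connection_id: str) -> dict:
    """Start a sync job for an existing Airbyte connection and return the job info."""
    resp = requests.post(
        f"{AIRBYTE_API}/connections/sync",
        json={"connectionId": connection_id},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["job"]

if __name__ == "__main__":
    job = trigger_sync(CONNECTION_ID)
    print(f"Started Airbyte sync job {job['id']} (status: {job['status']})")

The returned job id can then be polled until the sync reports a terminal status.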

Phase 5: Airflow Orchestration

  • Create DAG for initial load
  • Implement incremental load DAGs
  • Set up transformation scheduling
  • Add monitoring and alerting

Phase 6: Analytics & Dashboards

  1. Tableau Dashboard Development

    • Revenue Analysis Dashboard

      • Daily/Monthly/Yearly revenue trends
      • Revenue by store location
      • Top performing films
      • Customer segments analysis
    • Inventory Performance Dashboard

      • Film category performance
      • Stock turnover rates
      • Rental duration analysis
      • Late returns tracking
    • Customer Insights Dashboard

      • Customer lifetime value
      • Rental frequency patterns
      • Geographic distribution
      • Payment behavior analysis
  2. KPI Monitoring

    • Set up core business metrics
      • Revenue metrics
      • Customer metrics
      • Inventory metrics
    • Create KPI tracking dashboards
    • Implement alerts for metric thresholds

Phase 7: Documentation & Handover

  • Document data lineage
  • Create data dictionary
  • Prepare user guides for dashboards
  • Document maintenance procedures

Phase 8: Testing & Optimization

  • Performance testing
  • Query optimization
  • Dashboard response time optimization
  • User acceptance testing

Expected Outcomes

  1. A fully functional data warehouse with both star and snowflake schemas
  2. Automated ELT pipeline with monitoring
  3. Interactive Tableau dashboards for business insights
  4. Comprehensive documentation and maintenance guides
