Faisaladuko/Data-Engineering

End-to-End Data Engineering Project

The data engineering landscape changes constantly, with new tools and technologies emerging regularly. Building an effective analytics platform can be daunting, especially if you are not familiar with the tools available. How do you turn scattered, complex data into a model that drives insights and decision-making? This project builds a data pipeline for a fictional e-commerce company: it extracts, loads, and transforms data into a unified, analytics-ready format, and it applies data engineering best practices such as data modeling, testing, documentation, and version control.

Technologies Explored

  1. BigQuery
  2. dbt
  3. Docker
  4. Airbyte
  5. Dagster

Prerequisites

Ensure you have Python 3 installed. If not, you can download and install it from Python's official website.
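
A quick way to confirm the prerequisite is to ask the shell which interpreter it finds. This is a minimal sketch for macOS or Linux; the function name `check_python3` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper: report the Python 3 version found on PATH, if any.
check_python3() {
    if command -v python3 >/dev/null 2>&1; then
        # Prints the interpreter version, e.g. "Python 3.x.y"
        python3 --version
    else
        echo "python3 not found on PATH" >&2
        return 1
    fi
}

# Run the check; print a reminder instead of failing if Python 3 is missing.
check_python3 || echo "Install Python 3 before continuing."
```

On Windows the interpreter is usually invoked as `python` rather than `python3`, so `python --version` is the equivalent check there.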

Installing

  1. Fork the Repository:
    • Click the "Fork" button on the top right corner of this repository.
  2. Clone the repository:
    • git clone https://github.com/YOUR_USERNAME/data-engineering.git
    • Note: Replace YOUR_USERNAME with your GitHub username
  3. Navigate to the directory:
    • cd data-engineering
  4. Set Up a Virtual Environment:
    • For macOS/Linux:
      • python3 -m venv venv
      • source venv/bin/activate
    • For Windows:
      • python -m venv venv
      • .\venv\Scripts\activate
  5. Install Dependencies:
    • pip install -e ".[dev]"
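
Steps 4 and 5 can be combined into one shell function for macOS or Linux. This is a sketch under stated assumptions: `setup_env` is a hypothetical name, the optional directory argument is an illustration (the steps above always use `venv`), and the function is assumed to run from the root of the cloned repository.

```shell
# Hypothetical helper: create a virtual environment, activate it, and
# install the project in editable mode with its "dev" extras.
# Assumes the current directory is the root of the cloned repository.
setup_env() {
    venv_dir="${1:-venv}"   # defaults to "venv", as in the steps above

    python3 -m venv "$venv_dir" &&
        . "$venv_dir/bin/activate" &&
        pip install -e ".[dev]"
}
```

Define the function in your current shell (for example by pasting it into the terminal) and then run `setup_env`, so that the activation applies to that session. Each command runs only if the previous one succeeded, so a failed environment creation does not lead to an install into the system Python.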
