Data warehousing movie analysis

Hi! Welcome to my GitHub repository. In this project, I created and orchestrated a data pipeline to analyze IMDb movie data.

The data pipeline was created using the following tools:

  • Data ingestion: Web scraping from IMDb using Python
  • Data storage: Google BigQuery
  • Data analysis: dbt
  • Data visualization: Power BI

Project Overview

This project extracts IMDb movie data, transforms it, and loads it into a data warehouse for analytical queries. The primary goal is to gain insights into movie trends, ratings, and related analytics.

Process Diagram
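The extract, transform, load flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it uses Python's built-in `sqlite3` as a local stand-in for BigQuery, and the table name `movies`, the function names, and the sample rows are all hypothetical placeholders rather than real IMDb values.

```python
import sqlite3

# Hypothetical placeholder rows standing in for scraped data (not real IMDb values).
raw_rows = [
    {"title": "Movie A", "genre": "Drama", "rating": "8.0"},
    {"title": "Movie B", "genre": "Drama", "rating": "7.0"},
    {"title": "Movie C", "genre": "Comedy", "rating": "6.5"},
]


def transform(rows):
    """Strip whitespace from text fields and cast the rating to float."""
    return [
        (row["title"].strip(), row["genre"].strip(), float(row["rating"]))
        for row in rows
    ]


def load(conn, records):
    """Create the target table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies (title TEXT, genre TEXT, rating REAL)"
    )
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", records)
    conn.commit()


def average_rating_by_genre(conn):
    """Example analytical query: mean rating per genre, sorted by genre."""
    cursor = conn.execute(
        "SELECT genre, AVG(rating) FROM movies GROUP BY genre ORDER BY genre"
    )
    return cursor.fetchall()


# sqlite3 in-memory database used here only as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
load(conn, transform(raw_rows))
result = average_rating_by_genre(conn)
print(result)
```

In the real pipeline the load step would target BigQuery and the aggregation would more likely live in a dbt model; the sketch only shows how the three stages hand data to each other.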

Data Source

The data was sourced from IMDb (Internet Movie Database), which provides comprehensive details about movies, including titles, genres, ratings, and more.
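As an illustration of how fields such as titles, genres, and ratings might be represented after scraping, here is a small sketch of a typed record with a cleaning step. The `Movie` class, the `parse_movie` function, and the raw field names (`title`, `genres`, `rating`) are assumptions made for this example; the actual scraper output may be shaped differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Movie:
    """Hypothetical cleaned record for one scraped movie."""
    title: str
    genres: Tuple[str, ...]
    rating: Optional[float]


def parse_movie(raw: dict) -> Movie:
    """Convert raw scraped strings into a typed Movie record."""
    title = raw.get("title", "").strip()
    # Assumes genres arrive as one comma-separated string, e.g. "Action, Drama".
    genres = tuple(
        part.strip() for part in raw.get("genres", "").split(",") if part.strip()
    )
    # A missing or non-numeric rating becomes None instead of raising.
    try:
        rating = float(raw.get("rating", "").strip())
    except ValueError:
        rating = None
    return Movie(title=title, genres=genres, rating=rating)


example = parse_movie(
    {"title": " Example Title ", "genres": "Action, Drama", "rating": "7.5"}
)
print(example)
```

Keeping missing ratings as `None` rather than `0.0` avoids skewing averages later in the warehouse.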

Tools and Technologies

  • Python: For web scraping, scripting, and automation.
  • BigQuery: For data storage and querying.
  • Google Cloud: Cloud platform for hosting the data warehouse.
  • dbt: For data analysis and transformation.
  • Power BI: For data visualization.

Execution Order

  1. Extract_to_Staging
  2. Data_Warehouse_Schema
  3. ETL_Scripts
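The order above matters because each step depends on the output of the previous one. A minimal sketch of a runner that enforces this order is shown below. It assumes each step can be wrapped in a Python callable; the three placeholder functions and `run_pipeline` are hypothetical, since the README does not say whether the steps are scripts, notebooks, or SQL files.

```python
from typing import Callable, List, Tuple


# Placeholder step bodies; the real steps would scrape, create tables, and run ETL.
def extract_to_staging() -> None:
    pass


def data_warehouse_schema() -> None:
    pass


def etl_scripts() -> None:
    pass


# Step names match the execution order listed above.
STEPS: List[Tuple[str, Callable[[], None]]] = [
    ("Extract_to_Staging", extract_to_staging),
    ("Data_Warehouse_Schema", data_warehouse_schema),
    ("ETL_Scripts", etl_scripts),
]


def run_pipeline(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run steps in order and return the names of the steps that completed.

    Any exception raised by a step propagates, so later steps never run
    against a half-built staging area or schema.
    """
    completed = []
    for name, step in steps:
        print(f"Running {name}...")
        step()
        completed.append(name)
    return completed


completed_steps = run_pipeline(STEPS)
print(completed_steps)
```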