Hi! Welcome to my GitHub repository. In this project, I created and orchestrated a data pipeline to analyze IMDb movie data.
The data pipeline was created using the following tools:
- Data ingestion: Web scraping from IMDb using Python
- Data storage: Google BigQuery
- Data analysis: dbt
- Data visualization: Power BI
This project extracts IMDb movie data, transforms it, and loads it into a data warehouse for analytical queries. The primary goal is to gain insights into movie trends, ratings, and related metrics.
The data was sourced from IMDb (Internet Movie Database), which provides comprehensive details about movies, including titles, genres, ratings, and more.
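As a rough illustration of the transform step described above, here is a minimal sketch that cleans scraped records and serializes them as newline-delimited JSON, a format BigQuery can load. The field names (`title`, `genres`, `rating`), the function names, and the sample rows are assumptions made for this example, not the actual schema or scripts used in this repository.

```python
import json
from typing import Iterable, Optional


def parse_rating(value) -> Optional[float]:
    """Convert a scraped rating to float; return None if it is missing or invalid."""
    try:
        return float(str(value).strip())
    except ValueError:
        return None


def transform(raw: dict) -> dict:
    """Normalize one scraped record (field names are hypothetical)."""
    genres_text = raw.get("genres") or ""
    return {
        "title": (raw.get("title") or "").strip(),
        # Split a comma-separated genre string into a clean list.
        "genres": [g.strip() for g in genres_text.split(",") if g.strip()],
        "rating": parse_rating(raw.get("rating")),
    }


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize transformed records as newline-delimited JSON, one row per line."""
    return "\n".join(json.dumps(transform(r)) for r in records)


if __name__ == "__main__":
    # Placeholder rows for illustration only, not real scraped data.
    sample = [
        {"title": " Example Movie A ", "genres": "Drama, Crime", "rating": "8.5"},
        {"title": "Example Movie B", "genres": "", "rating": "N/A"},
    ]
    print(to_ndjson(sample))
```

The resulting text could be written to a file and loaded into a staging table, with dbt models handling further transformations inside the warehouse.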
- Python: For scripting and automation.
- BigQuery: For database management and querying.
- Google Cloud: Cloud platform for hosting the data warehouse.
Extract_to_Staging
Data_Warehouse_Schema
ETL_Scripts