
Crestward/Data-Analytics-Portfolio


Data Analytics Portfolio

This repository contains data analytics portfolio projects that I completed for academic, self-learning, and hobby purposes. The projects are presented as IPython notebooks.

Contents:

ETL pipeline

Project Summary

The objective of this project was to extract data from websites and available APIs. The resulting datasets were then transformed into nine tables by cleaning, joining, and filtering. Finally, the tables were loaded into PostgreSQL, an object-relational database, using pgAdmin, completing a functional ETL pipeline.
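The website half of the extraction step can be sketched with Python's standard-library `html.parser`. This is a minimal illustration, not the project's scraper: the markup in `SAMPLE_HTML` and the `titleColumn` class name are hypothetical placeholders (IMDb's real pages differ and change over time), and in practice the HTML would first be downloaded, for example with `urllib.request` or `requests`.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup for illustration only; real IMDb pages differ.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="/title/tt0111161/">The Shawshank Redemption</a></td></tr>
  <tr><td class="titleColumn">2. <a href="/title/tt0068646/">The Godfather</a></td></tr>
</table>
"""


class ChartParser(HTMLParser):
    """Collect (imdb_id, title) pairs from links inside 'titleColumn' cells."""

    def __init__(self):
        super().__init__()
        self.in_cell = False    # True while inside <td class="titleColumn">
        self.current_id = None  # IMDb id of the <a> currently open, if any
        self.movies = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "td" and "titleColumn" in classes:
            self.in_cell = True
        elif tag == "a" and self.in_cell:
            # Pull the "tt..." id out of links such as /title/tt0111161/
            match = re.search(r"/title/(tt\d+)/", attrs.get("href") or "")
            self.current_id = match.group(1) if match else None

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_id = None
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Only text inside a title link is kept; rank numbers are skipped.
        if self.current_id and data.strip():
            self.movies.append((self.current_id, data.strip()))


def parse_chart(html):
    """Return a list of (imdb_id, title) tuples found in the HTML."""
    parser = ChartParser()
    parser.feed(html)
    parser.close()
    return parser.movies


print(parse_chart(SAMPLE_HTML))
```

Collecting the IMDb id alongside each title makes later API lookups and joins simpler, because the id is a stable key while titles can repeat across remakes.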

Data sources

The following data sources were used:

  • IMDb Website

    • Method: Web scraping
    • Used for: Collecting the Top 250 IMDb-rated movie list
  • OMDb API

    • Method: API extraction
    • Used for: Collecting IMDb IDs and other movie-related details, such as actors and directors
  • Utelly API

    • Method: API extraction
    • Used for: Collecting streaming options for the Top 250 IMDb movies
  • uNoGS API

    • Method: API extraction
    • Used for: Collecting movies available on Netflix in the United States with an IMDb rating between 7 and 10
  • Google Search Engine

    • Method: Web scraping
    • Used for: Collecting streaming service availability and pricing
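An API extraction step such as the OMDb lookup can be sketched as below. It assumes OMDb's documented `i` (IMDb id) and `apikey` query parameters and its JSON field names (`Title`, `imdbID`, `Director`, `Actors`, `imdbRating`, `Response`); the helper names and the selection of output fields are hypothetical, and `fetch_movie` needs network access and a valid API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_URL = "http://www.omdbapi.com/"


def build_omdb_url(imdb_id, api_key):
    """Build an OMDb lookup URL for one IMDb id."""
    return OMDB_URL + "?" + urlencode({"i": imdb_id, "apikey": api_key})


def fetch_movie(imdb_id, api_key):
    """Request one movie from OMDb and return the decoded JSON (needs network)."""
    with urlopen(build_omdb_url(imdb_id, api_key), timeout=10) as response:
        return json.load(response)


def to_record(payload):
    """Flatten the OMDb fields of interest into one row, or None on failure."""
    if payload.get("Response") != "True":
        return None  # OMDb signals failed lookups with Response == "False"
    rating = payload.get("imdbRating", "N/A")
    return {
        "imdb_id": payload["imdbID"],
        "title": payload["Title"],
        "director": payload.get("Director"),
        # Actors arrive as one comma-separated string; split into a list.
        "actors": [a.strip() for a in payload.get("Actors", "").split(",") if a.strip()],
        "imdb_rating": float(rating) if rating != "N/A" else None,
    }
```

Keeping the network call (`fetch_movie`) separate from the parsing (`to_record`) lets the parsing be tested against saved JSON files without making live requests.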

Data Cleanup & Analysis

  • The extracted data were saved as CSV and JSON files
  • These datasets were then transformed into nine tables by cleaning, joining, and filtering
  • The nine tables were loaded into PostgreSQL, an object-relational database, using pgAdmin
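The clean, join, and filter steps can be sketched with pandas. The column names (`imdb_id`, `title`, `imdb_rating`, `service`) and the two input tables are hypothetical stand-ins for the project's datasets; the 7 to 10 rating window mirrors the uNoGS criterion listed under the data sources.

```python
import pandas as pd


def transform(top_movies, streaming):
    """Clean, join, and filter two extracted datasets into one table.

    Column names are hypothetical; the project's notebooks define their own.
    """
    # Clean: trim whitespace, coerce ratings to numbers ("N/A" becomes NaN),
    # then drop incomplete rows and duplicate ids (keeping the first).
    top = top_movies.copy()
    top["imdb_id"] = top["imdb_id"].str.strip()
    top["title"] = top["title"].str.strip()
    top["imdb_rating"] = pd.to_numeric(top["imdb_rating"], errors="coerce")
    top = top.dropna(subset=["imdb_id", "imdb_rating"]).drop_duplicates(subset="imdb_id")

    # Join: keep only movies that have a streaming option (inner join on id).
    merged = top.merge(streaming, on="imdb_id", how="inner")

    # Filter: keep ratings between 7 and 10 inclusive.
    return merged[merged["imdb_rating"].between(7, 10)].reset_index(drop=True)


top = pd.DataFrame({
    "imdb_id": [" tt1 ", "tt2", "tt2", "tt3"],
    "title": ["A", "B ", "B ", "C"],
    "imdb_rating": ["9.1", "6.5", "6.5", "N/A"],
})
streaming = pd.DataFrame({"imdb_id": ["tt1", "tt2"], "service": ["Netflix", "Hulu"]})
result = transform(top, streaming)
```

Coercing ratings with `errors="coerce"` before filtering means malformed values are dropped explicitly rather than raising partway through the notebook.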

Project folders:

  • Extract:

    • Google scraping.ipynb:
      • contains IMDb website and Google Search Engine web scraping
    • netflix_high_imdb_rated(uNoGS api).ipynb:
      • contains IMDb website web scraping, OMDb API extraction, and uNoGS API extraction
    • streaming_options(utelly api).ipynb:
      • contains Utelly API extraction
  • Transform:

    • Transform.ipynb:
      • contains the transformation of all datasets into the nine tables
  • Load:

    • SQL folder:
      • contains the ERD (entity relationship diagram) and schema
    • SQL_Table folder:
      • contains the table-creation steps and all nine tables created in PostgreSQL through pgAdmin
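The load step can be sketched as a schema plus bulk inserts. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the two tables are a hypothetical slice of a schema, not the project's nine-table ERD. Against PostgreSQL the same pattern would typically go through a driver such as `psycopg2`, whose placeholders are `%s` rather than `?`.

```python
import sqlite3

# Hypothetical two-table slice of a schema; the project's ERD defines nine tables.
SCHEMA = """
CREATE TABLE movies (
    imdb_id     TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    imdb_rating REAL
);
CREATE TABLE streaming_options (
    imdb_id TEXT NOT NULL REFERENCES movies (imdb_id),
    service TEXT NOT NULL,
    PRIMARY KEY (imdb_id, service)
);
"""


def load(conn, movies, options):
    """Create the tables, then insert all rows in a single transaction."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", movies)
        conn.executemany("INSERT INTO streaming_options VALUES (?, ?)", options)


conn = sqlite3.connect(":memory:")
load(conn, [("tt0111161", "The Shawshank Redemption", 9.3)], [("tt0111161", "Netflix")])
rows = conn.execute(
    "SELECT m.title, s.service FROM movies m JOIN streaming_options s USING (imdb_id)"
).fetchall()
```

The composite primary key on `streaming_options` lets one movie appear on several services while preventing duplicate rows, and the foreign key ties each option back to `movies`.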
