This project utilizes a dataset which contains trending YouTube video statistics. For this project various AWS tools were used to store, cleanse, process, catalog, and extract the data for use in a target system for data visualization.

claydoers/youtube-analysis-project

Trending YouTube Videos Data Analysis Project

Overview

The goal of this project was to securely manage, streamline, and analyze structured and semi-structured data on trending YouTube videos using the AWS service suite.

Goals

  • Data Ingestion
  • ETL
  • Data Lake
  • Scalability
  • Cloud Processing
  • Data Visualization/Reporting

Tools used

  • Amazon S3 - Data lake/Object storage
  • AWS IAM - Identity and access management for securing access to resources.
  • AWS Glue - ETL/Data integration service.
  • AWS Lambda - Serverless compute service that runs our code in the cloud, so we aren't processing it locally.
  • Amazon Athena - Query service for data stored in S3.
  • Python
  • Amazon QuickSight - Data visualization & reporting.

Architectural Diagram

    [image: architectural diagram]

    Dataset

    This dataset includes several months (and counting) of data on daily trending YouTube videos. Data is included for the US, GB, DE, CA, and FR regions (USA, Great Britain, Germany, Canada, and France, respectively), with up to 200 listed trending videos per day. Each region’s data is in a separate file. Data includes the video title, channel title, publish time, tags, views, likes and dislikes, description, and comment count.
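Because each region's data arrives as its own file, one common way to organize such files in an S3 data lake is one prefix per region. The sketch below is a minimal illustration, not the project's actual layout: the file-name patterns (`USvideos.csv`, `US_category_id.json`), the `raw_statistics` prefix, and the `region=` partition style are all assumptions made for the example.

```python
import re

# The five regions named in the dataset description.
REGIONS = {"US", "GB", "DE", "CA", "FR"}

# Assumed file-name patterns: '<REGION>videos.csv' or '<REGION>_category_id.json'.
_FILENAME_PATTERN = re.compile(r"^([A-Z]{2})(videos\.csv|_category_id\.json)$")


def region_from_filename(filename: str) -> str:
    """Return the two-letter region code encoded in the file name."""
    match = _FILENAME_PATTERN.match(filename)
    if not match or match.group(1) not in REGIONS:
        raise ValueError(f"unrecognized file name: {filename}")
    return match.group(1)


def s3_key_for(filename: str, prefix: str = "raw_statistics") -> str:
    """Build a hypothetical S3 object key with a region partition folder."""
    region = region_from_filename(filename).lower()
    return f"{prefix}/region={region}/{filename}"


print(s3_key_for("USvideos.csv"))
```

Keys in the `region=<code>` style are a Hive-style partition layout, which lets query engines skip regions that a query does not ask for.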

    The data also includes a category_id field, which varies between regions. To retrieve the categories for a specific video, find it in the associated JSON. One such file is included for each of the five regions in the dataset.
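The lookup described above can be sketched in a few lines of Python. This is a minimal example, assuming each category file has a top-level `items` list whose entries carry an `id` and a `snippet.title`; the sample JSON and the video record are made up for illustration and are not taken from the dataset.

```python
import json

# Illustrative stand-in for one region's category file (assumed structure).
SAMPLE_CATEGORY_JSON = """
{
  "items": [
    {"id": "1", "snippet": {"title": "Film & Animation"}},
    {"id": "10", "snippet": {"title": "Music"}}
  ]
}
"""


def build_category_lookup(raw_json: str) -> dict[int, str]:
    """Map each numeric category id to its human-readable title."""
    data = json.loads(raw_json)
    return {int(item["id"]): item["snippet"]["title"] for item in data["items"]}


lookup = build_category_lookup(SAMPLE_CATEGORY_JSON)

# Hypothetical video row; only category_id matters for the lookup.
video = {"title": "Example video", "category_id": 10}
category_name = lookup.get(video["category_id"], "Unknown")
print(category_name)  # prints "Music"
```

Since category ids vary between regions, a separate lookup should be built from each region's own file rather than shared across regions.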

    Dataset on Kaggle

    Dashboard Examples

    1. Dynamic dashboard that lets the user filter by video category and displays the matching results (the Music category is used in the example below).

    [image: dashboard filtered to the Music category]
