A Python-based project for collecting and analyzing YouTube data using the YouTube Data API. This project demonstrates how to fetch YouTube video metadata, preprocess the data, and perform exploratory data analysis (EDA) to gain insights into video trends, performance, and content engagement.
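- Data Collection: Fetch video metadata such as titles, descriptions, view counts, and like counts using the YouTube Data API. Dislike counts are included only where the API exposes them; since December 2021 they are private and returned only to the video's owner.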
- Data Preprocessing: Handle missing values, clean data, and transform it for analysis.
- Exploratory Data Analysis (EDA):
  - Analyze video performance metrics.
  - Visualize trends such as popular categories, top-performing videos, and engagement rates.
  - Detect correlations between features.
- Flexible Framework: Modular code structure for easy extension and maintenance.
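
As an illustration of the preprocessing and EDA steps listed above, here is a minimal sketch using pandas. The column names (`view_count`, `like_count`, `comment_count`), the sample rows, and the `preprocess` helper are assumptions made for this example and are not taken from the project's scripts. The engagement rate used here, (likes + comments) / views, is one common definition, not necessarily the one the project uses.

```python
import pandas as pd

# Hypothetical sample rows; real data would come from the API response.
raw = pd.DataFrame(
    {
        "title": ["Video A", "Video B", "Video C", "Video C"],
        "view_count": ["1000", "2500", None, None],
        "like_count": ["100", "50", "10", "10"],
        "comment_count": ["20", None, "5", "5"],
    }
)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, coerce counts to numbers, and add an engagement rate."""
    out = df.drop_duplicates().copy()
    # The API returns counts as strings; missing or invalid values become NaN.
    for col in ["view_count", "like_count", "comment_count"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Rows without a view count cannot yield an engagement rate.
    out = out.dropna(subset=["view_count"])
    # Assumption: a missing comment count is treated as zero.
    out["comment_count"] = out["comment_count"].fillna(0)
    out["engagement_rate"] = (
        out["like_count"] + out["comment_count"]
    ) / out["view_count"]
    return out.reset_index(drop=True)


clean = preprocess(raw)

# Pairwise Pearson correlations between the numeric metrics.
correlations = clean[["view_count", "like_count", "comment_count"]].corr()
```

The `correlations` frame can be passed to a Seaborn heatmap to visualize how the metrics relate to each other.
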
- Python: Core programming language.
- Pandas: For data manipulation and analysis.
- Matplotlib & Seaborn: For data visualization.
- YouTube Data API: For accessing YouTube video metadata.
- Clone the repository:

      git clone https://github.com/sentryxgith/YouTube-Data-Collection-and-Analysis-with-Python.git
      cd YouTube-Data-Collection-and-Analysis-with-Python
- Obtain your YouTube Data API key:
  - Visit Google Cloud Console.
  - Create a new project and enable the YouTube Data API v3.
  - Generate an API key and replace the placeholder in the script.
- Set up the API key: Open the script where the API key is required and replace the placeholder value with your own key:

      API_KEY = "YOUR_API_KEY_HERE"
- Run the script:
  - Use `main.py` to fetch YouTube data.
  - Perform analysis using the Jupyter Notebooks provided.
- Explore the results:
  - View processed data in CSV format.
  - Open the EDA notebook to visualize insights.
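
For readers who want to see what the fetch step might look like, the following is a minimal sketch and not the contents of `main.py`. It calls the YouTube Data API v3 `videos.list` endpoint using only the standard library. The helper names (`build_request_url`, `flatten_item`, `fetch_popular_videos`), the output field names, and the `YOUTUBE_API_KEY` environment variable are assumptions made for illustration. Reading the key from the environment is shown as an alternative to hardcoding it in the script.

```python
import json
import os
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_request_url(api_key: str, region_code: str = "US", max_results: int = 50) -> str:
    """Build a videos.list URL for the most popular videos in a region."""
    params = {
        "part": "snippet,statistics,contentDetails",
        "chart": "mostPopular",
        "regionCode": region_code,
        "maxResults": max_results,  # the API accepts at most 50 per page
        "key": api_key,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def flatten_item(item: dict) -> dict:
    """Reduce one API item to a flat row; absent fields become None."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    return {
        "video_id": item.get("id"),
        "title": snippet.get("title"),
        "published_at": snippet.get("publishedAt"),
        "category_id": snippet.get("categoryId"),
        "view_count": stats.get("viewCount"),
        "like_count": stats.get("likeCount"),
        "comment_count": stats.get("commentCount"),
    }


def fetch_popular_videos(api_key: str) -> list[dict]:
    """Fetch one page of results and flatten each item."""
    with urllib.request.urlopen(build_request_url(api_key)) as response:
        payload = json.load(response)
    return [flatten_item(item) for item in payload.get("items", [])]


if __name__ == "__main__":
    # Assumption: the key is stored in an environment variable.
    key = os.environ.get("YOUTUBE_API_KEY")
    if key:
        rows = fetch_popular_videos(key)
        print(f"Fetched {len(rows)} videos")
    else:
        print("Set YOUTUBE_API_KEY to run the request")
```

The flattened rows can be written to CSV with `csv.DictWriter` or loaded into a pandas `DataFrame` for the analysis steps.
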
YouTube-Data-Collection-and-Analysis-with-Python/
├── main.py # Script for data fetching
├── data.py # Script for data preprocessing
├── distribution.py # Script for analyzing data by distribution
├── category.py # Script for analyzing data by categories
├── duration.py # Script for analyzing data by duration
├── tags.py # Script for analyzing data by tags
├── publish hour.py # Script for analyzing data by publish hour
└── README.md # Project documentation
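
Duration analysis depends on converting the API's ISO 8601 duration strings (for example `PT4M13S`) into numbers. The sketch below shows one way to do that with the standard library. The `duration_to_seconds` helper is hypothetical and is not taken from `duration.py`. It handles only the day, hour, minute, and second components.

```python
import re

# Matches durations such as PT4M13S, PT1H2M3S, or P1DT2H.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(value: str) -> int:
    """Convert an ISO 8601 duration string to a total number of seconds."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"Unrecognized duration: {value!r}")
    # Components missing from the string count as zero.
    parts = {name: int(num) if num else 0 for name, num in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
```

For example, `duration_to_seconds("PT4M13S")` returns `253`.
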
- Category Popularity: Identify which categories have the most trending videos.
- Engagement Metrics: Compare like and comment counts across videos (and dislike counts, where the data includes them).
- Time Trends: Understand how upload times affect video popularity.
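
The category and time-trend insights above can be sketched with a few pandas aggregations. The sample rows and the column names (`category_id`, `published_at`, `view_count`) are assumptions made for this example; the project's own scripts may structure the data differently.

```python
import pandas as pd

# Hypothetical rows in the shape a fetch step might produce.
videos = pd.DataFrame(
    {
        "category_id": ["10", "10", "24", "24"],
        "published_at": [
            "2024-01-05T14:00:00Z",
            "2024-01-06T14:30:00Z",
            "2024-01-06T09:15:00Z",
            "2024-01-07T20:45:00Z",
        ],
        "view_count": [1000, 3000, 500, 1500],
    }
)

# Number of videos per category, most frequent first.
videos_per_category = videos["category_id"].value_counts()

# Parse the ISO 8601 timestamps as UTC and extract the hour of publication.
videos["publish_hour"] = pd.to_datetime(videos["published_at"], utc=True).dt.hour

# Mean view count for each publish hour.
mean_views_by_hour = videos.groupby("publish_hour")["view_count"].mean()
```

Publish hours are in UTC here. Converting to the audience's local time zone before grouping may give a more meaningful picture.
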
Contributions are welcome! Please fork the repository, create a branch, and submit a pull request.
This project is licensed under the MIT License.
- The YouTube Data API team for providing an excellent API.
- The open-source Python community for their amazing libraries.
Happy analyzing! 🎉