This repository contains the code for my midterm project in CP102 - Computer Programming 2 at Manuel S. Enverga University Foundation. The goal of this project is to create a Python program that performs data wrangling, exploratory data analysis (EDA), and data visualization on a dataset. For this project, I chose to analyze the 25k IMDb Movies Dataset, which is an open-source dataset available on Kaggle.
The data used for the analysis and visualization is stored in the repository as 25k IMDb Movies.csv. It includes columns such as Movie Title, Run Time, Rating, User Rating, Genres, Overview Plot Keyword, Director, Top 5 Casts, Writer, Year, and Path, which hold a mix of data types including strings, integers, floats, and lists. However, the raw dataset contained impurities, such as typographical errors and mixed value types within the same column, which had to be addressed through data wrangling before any analysis.
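As an illustration of the kind of cleaning involved, here is a minimal standard-library sketch that coerces mixed-value text fields into consistent types. It is not the notebook's actual code. The value formats it expects are assumptions made for illustration and have not been checked against the CSV: a run time written like "2h 22min", a numeric rating stored as text, and genres stored as a Python-style list literal. The helpers `to_float`, `run_time_to_minutes`, `parse_list`, and `clean_row` are hypothetical names.

```python
import ast
import re


def to_float(value):
    """Return value as a float, or None if it is not numeric (e.g. a placeholder string)."""
    try:
        return float(str(value).strip())
    except ValueError:
        return None


def run_time_to_minutes(value):
    """Convert an ASSUMED '2h 22min'-style run time string to total minutes, else None."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)min)?\s*", str(value))
    if not match or not any(match.groups()):
        return None  # unrecognised text such as "Not available", or an empty string
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


def parse_list(value):
    """Parse a list stored as text, e.g. "['Drama', 'Crime']", into a list of stripped strings."""
    try:
        parsed = ast.literal_eval(str(value))  # safely evaluates literals only, never arbitrary code
    except (ValueError, SyntaxError, TypeError):
        return []
    return [str(item).strip() for item in parsed] if isinstance(parsed, list) else []


def clean_row(row):
    """Clean one record (a dict keyed by column name); the column formats are assumptions."""
    return {
        "Movie Title": str(row.get("Movie Title", "")).strip(),
        "Run Time": run_time_to_minutes(row.get("Run Time")),
        "Rating": to_float(row.get("Rating")),
        "Genres": parse_list(row.get("Genres")),
    }


# Hypothetical sample record, not taken from the dataset.
sample = {
    "Movie Title": "  Example Movie ",
    "Run Time": "2h 22min",
    "Rating": "7.5",
    "Genres": "['Drama', ' Crime']",
}
print(clean_row(sample))
# {'Movie Title': 'Example Movie', 'Run Time': 142, 'Rating': 7.5, 'Genres': ['Drama', 'Crime']}
```

Returning None or an empty list for unparseable values, rather than raising an error, keeps bad rows visible so they can be counted or dropped later. Rows read with `csv.DictReader` could be passed through `clean_row` in the same way, provided the real column formats match the assumptions above.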
All of the code for this project is in the Jupyter Notebook Echevaria_Movies_Analysis.ipynb, which covers data wrangling, exploratory data analysis, and data visualization. Markdown cells throughout the notebook explain the purpose of each code block.
The results of the analysis are presented in the same notebook, Echevaria_Movies_Analysis.ipynb. They include a summary of the dataset, the exploratory data analysis, and visualizations of various aspects of the data. The insights drawn from these results are discussed in the Midterm Portfolio.