
Predicting the popularity of movies from the IMDb movies dataset with multiple regression algorithms, such as XGBoost, Gradient Boosting, regularized regressors, and a Stacking Regressor. Performed extensive data cleaning and feature engineering, and used transformation techniques such as winsorization and log-transformation.
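As a rough illustration of the two transformations named above, here is a minimal Python sketch. It assumes winsorization means clipping each value to a lower and an upper percentile, and log-transformation means log(1 + x) applied to a non-negative, right-skewed target. The 5th/95th percentile defaults, the helper names, and the sample numbers are hypothetical; the limits and target column the project actually used are not stated here.

```python
import math


def percentile(sorted_values, q):
    """Percentile q (0 <= q <= 1) of a sorted list, using linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile of empty list")
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def winsorize(values, lower_q=0.05, upper_q=0.95):
    """Clip every value into the [lower_q, upper_q] percentile range.

    The 5%/95% defaults are hypothetical, not the project's settings.
    """
    ordered = sorted(values)
    low, high = percentile(ordered, lower_q), percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


def log_transform(values):
    """Apply log(1 + x) so zero counts stay finite; assumes every value >= 0."""
    return [math.log1p(v) for v in values]


if __name__ == "__main__":
    # Hypothetical right-skewed popularity scores, not data from the project.
    popularity = [0, 3, 5, 8, 12, 20, 35, 60, 150, 5000]
    clipped = winsorize(popularity, lower_q=0.1, upper_q=0.9)
    print(clipped)                 # 0 is raised and 5000 is pulled down to the clip limits
    print(log_transform(clipped))  # compresses the remaining right skew
```

Because percentile() uses the same linear interpolation as the pandas default for Series.quantile, s.clip(s.quantile(0.05), s.quantile(0.95)) followed by numpy.log1p should give the same result on a pandas Series s. Note that scipy.stats.mstats.winsorize clips by a count of order statistics instead, so its limits can differ slightly on small samples.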


vrittigandhi/data_mining_project_22


Data Mining Project 2022

Business objective

Our objective is to identify US-produced movies whose copyrights are worth investing in because they are likely to deliver a high ROI, as measured by popularity among moviegoers.

General approach

In this project, we tested multiple supervised predictive models and took a detailed look at the top three: XGBRegressor, GradientBoostingRegressor, and RandomForestRegressor. We measure performance using adjusted R² (to account for the number of features) and RMSE. Based on our analysis, we believe our XGBoost model explains 69% of the variation in the log-transformed target variable, as measured by adjusted R².
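For reference, adjusted R² corrects R² for the number of predictors p fitted on n samples: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), while RMSE is the square root of the mean squared residual. The sketch below is a plain-Python illustration of both metrics, assuming they are computed on the log-transformed target (so RMSE is expressed in log units). The function names and sample values are hypothetical and are not taken from the project's notebooks.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """Penalize R^2 for p predictors: 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the (log-transformed) target."""
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Hypothetical log-scale targets and predictions for five movies.
    y_true = [1, 2, 3, 4, 5]
    y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
    r2 = r_squared(y_true, y_pred)
    print(r2, adjusted_r2(r2, n_samples=5, n_features=2), rmse(y_true, y_pred))
```

With these hypothetical values, R² is about 0.99, adjusted R² with p = 2 is about 0.98, and RMSE is about 0.141. The gap between R² and adjusted R² grows as p approaches n, which is why adjusted R² is the fairer comparison when models use different numbers of features. In practice, sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error compute the unadjusted pieces.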

The 'data' directory contains the small datasets used. The .ipynb and HTML versions of the code are in 'notebooks'.
