Detailed analysis and modeling of the Steam game dataset for CIS 5450 Final Project

baronhsieh2005/Steam-Dataset-EDA-and-Modeling

Steam-Dataset-EDA-and-Modeling

For our final project, we set out to predict the success of games on Steam, the largest digital distribution platform for PC gaming, using only pre-release metrics. We chose this topic because of our shared interest in gaming and its potential value to companies and solo developers who want to build successful games for the Steam platform.

For this project, we used the Steam Games dataset from Kaggle, which contains comprehensive information on every game published on Steam. Relevant columns include Release Date, Price, Genres, Estimated Owners, and Median Playtime Forever, among others. Based on this information, we defined a game as successful if its Median Playtime Forever exceeds one hour. We chose this metric because it measures user engagement effectively, is resistant to outliers, and was consistently available across all games in our dataset.
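The success definition above can be sketched as a simple labeling rule. This is a minimal illustration, not the notebook's actual code: the `median_playtime_forever` key, the sample rows, and the assumption that playtime is recorded in minutes are all hypothetical.

```python
# Assumption: playtime is stored in minutes, so "over one hour" means > 60.
SUCCESS_THRESHOLD_MINUTES = 60


def label_success(median_playtime_minutes):
    """Return 1 if the median playtime exceeds one hour, otherwise 0."""
    return int(median_playtime_minutes > SUCCESS_THRESHOLD_MINUTES)


# Hypothetical rows standing in for records from the dataset.
games = [
    {"name": "Game A", "median_playtime_forever": 0},
    {"name": "Game B", "median_playtime_forever": 60},
    {"name": "Game C", "median_playtime_forever": 245},
]

# Build the binary target: only games strictly above the threshold count as successful.
labels = [label_success(g["median_playtime_forever"]) for g in games]
```

Note that the comparison is strict, so a game with a median playtime of exactly one hour is labeled unsuccessful, matching the "over one hour" wording.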

Our analysis revealed a significant class imbalance, with approximately 88% of games being unsuccessful by our definition. Throughout this notebook, we detail our approach to addressing this challenge, from data preprocessing to model selection and evaluation. Each section includes clear explanations of our methodology, the reasoning behind our decisions, and the insights gained from our analysis.
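To illustrate what an 88/12 split implies for modeling, the sketch below computes inverse-frequency class weights, one common way to compensate for imbalance. The text above does not say which technique the notebook uses, so this is an assumption for illustration only; the `balanced_class_weights` helper and the synthetic labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so each class contributes
    equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Synthetic labels mirroring the reported split: 88 unsuccessful (0), 12 successful (1).
labels = [0] * 88 + [1] * 12
weights = balanced_class_weights(labels)

# Share of the majority class, i.e. the accuracy of always predicting "unsuccessful".
majority_share = max(Counter(labels).values()) / len(labels)
```

With this split, a model that always predicts "unsuccessful" already reaches 88% accuracy, which is why metrics that account for the minority class matter when evaluating models on data like this.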
