Chapman University CPSC-392 Data Science Final Project

kashishpandey/CPSC392-Spotify

CPSC392 Data Science Spotify Final Project

This project uses a Spotify dataset to answer the following questions:

  1. What clusters emerge when songs are compared by loudness and energy?
  2. Which variables are the least informative about a song's popularity?
  3. Is there an association between the duration and popularity of songs?
  4. Using danceability, energy, and loudness, can we accurately predict whether a song will be popular?
  5. Can we accurately predict a song's genre from the selected set of predictors?
  6. Which genres and subgenres appear most often in the data? Which are the most popular?
  7. Can we appropriately cluster songs by genre using album name, track popularity, and artist name?
  8. Is there a hierarchical relationship between the variables?
  9. Would PCA help us reduce dimensionality and produce a model that can accurately predict a song's popularity?
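Question 2 is the kind of question Lasso regression is suited to: its L1 penalty can shrink the coefficients of uninformative predictors exactly to zero. The sketch below assumes scikit-learn and uses synthetic data with hypothetical feature names (`signal_1`, `signal_2`, `noise_1`, `noise_2`), so it only illustrates the technique and says nothing about which Spotify variables the project found least useful. The penalty strength `alpha=0.5` is an arbitrary illustrative value; on real data it would normally be tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Spotify data (NOT the real dataset):
# the target depends on two features; the other two are pure noise.
rng = np.random.default_rng(0)
n = 1000
feature_names = ["signal_1", "signal_2", "noise_1", "noise_2"]  # hypothetical names
X = rng.normal(size=(n, len(feature_names)))
popularity = 5.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Standardize first so the L1 penalty treats every feature on the same scale.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.5))
model.fit(X, popularity)

coefs = model.named_steps["lasso"].coef_
for name, coef in zip(feature_names, coefs):
    print(f"{name:>10}: {coef:+.3f}")

# A coefficient shrunk exactly to zero marks a feature the model dropped.
least_useful = [name for name, coef in zip(feature_names, coefs) if coef == 0.0]
print("Dropped by Lasso:", least_useful)
```

Standardizing before fitting matters here: without it, a feature measured in large units (such as a duration in milliseconds) would be penalized differently from one measured on a 0 to 1 scale.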

Models used: Gaussian Mixture Models fit with Expectation-Maximization (EM), Lasso Regression, Linear Regression, K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Decision Tree, and Principal Component Analysis (PCA)
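The three clustering methods in this list behave differently. K-Means makes hard assignments, a Gaussian mixture fit with EM models each cluster as a Gaussian with soft memberships, and DBSCAN groups points by density and labels sparse points as noise. The sketch below compares them on synthetic 2-D points standing in for a (loudness, energy) pair, in the spirit of Question 1. It assumes scikit-learn; the blob centers, `eps`, and `min_samples` are illustrative values, not settings taken from the project.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic 2-D points standing in for (loudness, energy); NOT the real data.
X, true_labels = make_blobs(
    n_samples=300,
    centers=[[-3.0, -3.0], [0.0, 3.0], [3.0, -3.0]],
    cluster_std=0.6,
    random_state=0,
)
X = StandardScaler().fit_transform(X)  # put both axes on a comparable scale

# K-Means: hard assignments that minimize within-cluster squared distance.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Gaussian mixture fit by Expectation-Maximization: soft memberships,
# reduced here to the most probable component for each point.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
gmm_labels = gmm.predict(X)

# DBSCAN: no cluster count is specified; label -1 marks noise points.
# eps and min_samples are illustrative, not tuned for real data.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
n_dbscan_clusters = len(set(dbscan_labels) - {-1})

for name, labels in [("K-Means", kmeans_labels), ("GMM (EM)", gmm_labels), ("DBSCAN", dbscan_labels)]:
    # Adjusted Rand index against the blobs that generated the data (1.0 = perfect).
    print(f"{name:>9}: ARI = {adjusted_rand_score(true_labels, labels):.3f}")
print("DBSCAN clusters found:", n_dbscan_clusters)
```

On real loudness and energy values there are no generating labels, so the ARI comparison would be replaced by inspecting the clusters or by an internal score such as the silhouette coefficient.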

Tools used for hyperparameter tuning: the Elbow Method for choosing the number of clusters, and a Scree Plot to visualize how many principal components are needed to retain the desired share of the variance (information) in the data
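Both tuning tools come down to a few lines of computation. The elbow method fits K-Means for a range of cluster counts and tracks the inertia (within-cluster sum of squares). A scree plot shows the share of variance explained by each principal component. The sketch below assumes scikit-learn and uses synthetic data, including six hypothetical features driven by two latent factors. It prints the values that would be plotted. The 90% variance target and the range of k are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# --- Elbow method: inertia (within-cluster sum of squares) for each k ---
X_clu, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)
ks = list(range(1, 9))
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_clu).inertia_ for k in ks]
for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
# Plotting ks against inertias, the "elbow" is where the curve stops dropping sharply.

# --- Scree plot: explained variance per principal component ---
# Six hypothetical features driven by two latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = np.array([[1.0, 0.8, 0.6, 0.0, 0.2, 0.5],
                     [0.0, 0.3, 0.5, 1.0, 0.9, 0.4]])
X_pca = latent @ loadings + 0.1 * rng.normal(size=(500, 6))
X_pca = StandardScaler().fit_transform(X_pca)

pca = PCA().fit(X_pca)  # keep all components so the full scree curve is available
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.3f} (cumulative {c:.3f})")

# Smallest number of components whose cumulative share reaches the target.
target = 0.90
n_keep = int(np.searchsorted(cumulative, target) + 1)
print(f"Components needed for {target:.0%} of the variance:", n_keep)
```

Because the synthetic features come from two latent factors, the scree curve drops sharply after the second component. On real data the curve is usually less clear-cut, which is why the cutoff is a judgment call.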