Using a Spotify dataset to answer the following questions:
- What clusters can we find in the relationship between loudness and energy?
- Which variables are the least useful in gathering information about the popularity of a song?
- Is there an association between the duration and popularity of songs?
- Using danceability, energy, and loudness, can we accurately predict whether a song will be popular?
- Can we accurately predict the genre of a song from the chosen set of predictors?
- Which genres and subgenres show up the most in the data? Which are the most popular?
- Can we appropriately cluster songs by genre using album name, track popularity and artist name?
- Is there a hierarchical relationship between the variables?
- Would PCA help us reduce dimensionality and produce a model that can accurately predict a song's popularity?
Models used: Gaussian Mixture Model fit with Expectation-Maximization (EM), Lasso Regression, Linear Regression, K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Decision Tree, and Principal Component Analysis (PCA)
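Lasso is a natural fit for the question of which variables are least useful for predicting popularity, because its L1 penalty can drive the coefficients of uninformative predictors to exactly zero. The sketch below illustrates that effect with plain NumPy coordinate descent on synthetic data (not the Spotify data). It minimizes the same objective scikit-learn's `Lasso` uses, `(1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1`, but the helper names, `alpha=0.1`, and the feature names are hypothetical choices made for illustration, not the project's code.

```python
import numpy as np

def soft_threshold(z, a):
    """Shrink z toward zero by a; values inside [-a, a] become exactly zero."""
    return np.sign(z) * max(abs(z) - a, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n))||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # equals 1 for standardized columns
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: the current fit with feature j left out.
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
    return b

# Synthetic data: only the first two (hypothetical) features drive the target.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=n)
y = y - y.mean()

coef = lasso_coordinate_descent(X, y, alpha=0.1)
names = ["feature_a", "feature_b", "feature_c", "feature_d"]  # hypothetical names
for name, c in zip(names, coef):
    print(f"{name}: {c:+.3f}")
```

Predictors whose coefficients come out exactly zero are the ones the penalty has discarded; on real data, `alpha` would normally be chosen by cross-validation rather than fixed by hand.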
Tools used for hyperparameter tuning: the Elbow Method for choosing the number of clusters, and a Scree Plot to visualize how many principal components are needed to retain the desired share of the variance
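Both tuning tools reduce to a few numbers that are then plotted. The sketch below computes them with NumPy alone on synthetic data; it is an illustration under stated assumptions, not the project's code. It assumes the elbow method is applied to the number of clusters k in K-Means (its most common use), uses a hand-written k-means++ seeding and Lloyd's loop in place of a library implementation, and takes the scree values as the eigenvalues of the correlation matrix of standardized features. The blob centers, loadings, and the 90% variance target are arbitrary, hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# ---- Elbow method: within-cluster sum of squares (inertia) for each k ----
def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: pick each new center with probability proportional to squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans_inertia(X, k, rng, n_init=10, max_iter=100):
    """Best inertia over n_init runs of Lloyd's algorithm."""
    best = np.inf
    for _ in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            # Assign every point to its nearest center.
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            # Move each center to the mean of its points (keep it if the cluster is empty).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum()
        best = min(best, inertia)
    return best

# Three well-separated synthetic blobs in two dimensions.
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_blobs = np.vstack([c + 0.5 * rng.normal(size=(100, 2)) for c in blob_centers])
inertias = {k: kmeans_inertia(X_blobs, k, rng) for k in range(1, 7)}
for k, w in inertias.items():
    print(f"k={k}: inertia={w:.1f}")

# ---- Scree values: share of variance per principal component ----
def explained_variance_ratio(X):
    """Eigenvalues of the covariance of standardized X, largest first, as fractions of the total."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so no feature dominates by scale
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # eigvalsh sorts ascending
    return eigvals / eigvals.sum()

# Five synthetic features generated from two latent factors plus small noise.
latent = rng.normal(size=(500, 2))
loadings = np.array([[1.0, 0.8, 0.6, 0.0, 0.2],
                     [0.0, 0.3, 0.5, 1.0, 0.9]])
X_feats = latent @ loadings + 0.1 * rng.normal(size=(500, 5))

ratios = explained_variance_ratio(X_feats)
cumulative = np.cumsum(ratios)
# Smallest number of components whose cumulative share reaches the 90% target.
n_components = int(np.searchsorted(cumulative, 0.90) + 1)
print("explained variance ratios:", np.round(ratios, 3))
print("components needed for 90%:", n_components)
```

Plotting `inertias` against k should show a sharp bend at k = 3 for these three blobs, and plotting `ratios` against component number gives the scree plot. Standardizing first matters for the Spotify features, since loudness (in dB) and duration are on very different scales from the 0 to 1 audio features.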