Pyspark-Spotify-Analysis

This project is built on two CSV datasets available on the Kaggle platform; the data were obtained from Spotify. The main one, "track.csv", is the largest and most important and contains information on music tracks spanning a period of 100 years. The other, "artist.csv", contains one row per artist. Both files are compressed in data.rar. Based on the suggestions of the dataset's author, we identified three main analyses to apply to the data:

Clustering: group the songs to identify a limited number of genres
Classification/Regression: determine which features matter most in estimating the popularity of a song
Trend Analysis: see how music creation has changed over the years
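
As a rough illustration of the trend analysis idea, the sketch below averages one numeric feature per decade of release. It uses only the Python standard library so the logic is easy to follow; in PySpark the same grouping would typically be expressed with `groupBy` and `avg`. The column names `release_date` and `danceability`, the sample rows, and the helper names `decade_of` and `trend_by_decade` are assumptions made for this example and may not match the actual "track.csv" schema or the project's code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real "track.csv" columns and values may differ.
tracks = [
    {"release_date": "1965-03-01", "danceability": 0.40},
    {"release_date": "1968", "danceability": 0.50},
    {"release_date": "1994-07-12", "danceability": 0.60},
    {"release_date": "1999-11-30", "danceability": 0.70},
    {"release_date": "2015-01-20", "danceability": 0.80},
]


def decade_of(release_date: str) -> int:
    """Return the decade (e.g. 1990) of a date string that starts with a 4-digit year."""
    return int(release_date[:4]) // 10 * 10


def trend_by_decade(rows, feature):
    """Average one numeric feature per decade of release, ordered by decade."""
    groups = defaultdict(list)
    for row in rows:
        groups[decade_of(row["release_date"])].append(row[feature])
    return {decade: mean(values) for decade, values in sorted(groups.items())}


trend = trend_by_decade(tracks, "danceability")
for decade, value in trend.items():
    print(f"{decade}s: average danceability {value:.2f}")
```

Grouping by decade rather than by year is only one possible choice; it smooths out years with few tracks, at the cost of hiding short-lived changes.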
