Pyspark-Spotify-Analysis

The project is built on two CSV datasets available on the Kaggle platform. The data were obtained from Spotify. The main file, "track.csv", is the larger of the two and contains information on music tracks spanning a period of 100 years. The other file, "artist.csv", contains one row per artist. Both files are compressed in data.rar. Based on the suggestions of the dataset's author, we identified three main analyses to apply to the data:

  • Clustering: on the songs, to identify a limited number of genres
  • Classification/Regression: to understand which features matter most when estimating the popularity of a song
  • Trend Analysis: to see how music creation has changed over the years