The project is developed using PySpark on a large Spotify dataset found on Kaggle. The analyses applied are Clustering of the musical genres, Regression/Classification on the popularity, and a Trend Analysis.

carloalbe/Pyspark-Spotify-Analysis

Pyspark-Spotify-Analysis

The project is developed on two CSV datasets available on the Kaggle platform. The data were obtained from Spotify. The main one, "track.csv", is the largest and most important and contains information on music tracks spanning a period of 100 years. The other one, "artist.csv", contains a row for each artist. Both files are compressed in data.rar. Based on the suggestions of the dataset's author, we identified three main analyses to apply to the data:

  • Clustering: on the songs, to identify a limited number of genres
  • Classification/Regression: to understand which features are the most important in estimating the popularity of a song
  • Trend Analysis: to see how musical creation changed over the years

Contributors 4
