Spotify Songs Analysis

This project aims to analyze a dataset containing information about songs streamed on Spotify in 2021. The dataset includes various attributes such as artist, genre, duration, popularity, and more. The analysis involves data cleaning, exploratory data analysis, implementing queries using both SparkSQL and SparkDataframes, and finally, performing classification to predict song genres using SparkML.

Dataset

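The dataset is the file spotify_songs.csv, which is included in this repository. The original analysis read it from the following local path; adjust it to wherever the file is stored on your machine:

"C:\Users\Admin\Downloads\spotify_songs.csv"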

Queries Implemented

1. Genre Analysis

a) Identify the genre with the highest average popularity.
b) Determine the artist who has recorded the most songs with a duration of more than 5 minutes.
c) Count the number of songs included in each genre.
d) Identify the artists who dominated the charts.

2. Song Recommendations

e) Recommend at least 5 fun/not-boring songs that can be played at a party. Features like energy, danceability, etc., will be considered to represent cheerfulness.

Implementation Details

Data Cleaning & Engineering: This phase involves handling missing values, duplicates, and any necessary data transformations to render the data usable for analysis.
Queries Implementation: The queries mentioned above are implemented twice, once using SparkSQL and once using Spark DataFrames. This approach allows for comparing the efficiency and usability of both methods.
Classification: The dataset is split into training and testing sets. Classification tasks are performed using SparkML to predict the genre of each song. Three different classification methods (Logistic Regression, Random Forest, and Decision Tree) are applied, and their accuracies are compared to determine the best classifier.

How to Use

To replicate the analysis or explore the dataset further, follow these steps:

Clone the repository.
Download the dataset and place it in the designated directory.
Run the provided scripts or notebooks for data cleaning, queries implementation, and classification.
Explore the results and analysis provided in the output.

Conclusion

This project offers insights into the streaming patterns of songs on Spotify in 2021. By implementing various queries and classification tasks, we aim to understand the trends in music genres, artist popularity, and provide recommendations for enjoyable party songs. The comparison of SparkSQL and SparkDataframes for query implementation, along with evaluating different classification methods, provides valuable insights for data analysis in Spark environments.
