Spotify trends & COVID-19: Correlations between music listening trends and hardship levels from COVID
EXECUTIVE SUMMARY
Moderate to strong correlations between COVID-19 infection rates and music valence ("happiness") preferences in some countries indicate that Spotify could adapt its song recommendation algorithms to account for how different countries respond to a crisis. In some countries average valence increased (users preferred happier music), while in others it decreased. Spotify can use this data to improve the user experience, optimize exclusive contracts with artists who match the ‘mood’ of each country, and thereby increase premium subscription revenue.
REPOSITORY NAVIGATION
- starter-codes contains the initial base code for the project, which tested Spotify's API and the plotting techniques.
- data-exploration contains the per-country data handling, CSV files, inputs and outputs. Each country or region explored has its own subfolder.
- main-data-analysis contains the integrated final code that produced the analysis plots, drawing on the exploration data for each country and region.
- documents contains the presentation and written report.
OBJECTIVE
Explore how COVID-19 affected music streaming user preferences to answer:
- Are users listening to happy music to cheer up?
- Are users listening to sad music to cope?
- Do results vary by country?
BUSINESS IMPACT
Spotify can use this data to improve its algorithms and offer more personalized suggestions to users in different regions. This could give users a better experience, helping Spotify retain more premium users and increase revenue.
DATA
- Spotifycharts.com for historical data on top 200 songs by country and region.
- Spotify API for metrics on individual tracks.
- Our World in Data for COVID-19 data.
DEPENDENCIES
- import matplotlib.pyplot as plt
- import pandas as pd
- import numpy as np
- import requests
- import json
- import re, glob
- import os, sys
- from scipy import stats
- import seaborn as sn
- import spotipy
DATA EXPLORATION
- Convert input CSVs into DataFrames: COVID data from https://ourworldindata.org [5] and Spotify Top Charts from spotifycharts.com were downloaded as CSV files, then read and transformed into DataFrames using Pandas.
- Obtain track features from the API: Using the Spotify API and the track names from the Spotify DataFrame, we retrieved each track's valence, a metric Spotify uses to characterize whether a song is relatively sad or happy.
- Delete faulty/repeated data: Some tracks were not found, and some information was not available in certain countries. These values were dropped from the DataFrame.
- Plot key indexes over time: Using the dates on which the data was obtained, we merged the DataFrames and plotted valence.
- Obtain correlations between variables: Correlations between COVID severity and listener music preferences, as reflected in the Spotify Top 200 charts, were obtained using scatter plots and linear regression analysis.
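The last step above can be sketched as follows. This is a minimal illustration and not the project's own code: the function name `valence_covid_correlation` and the sample numbers are assumptions made up for the example. It uses `scipy.stats.linregress`, which returns the slope of the fitted line together with the Pearson correlation coefficient and its p-value.

```python
import numpy as np
from scipy import stats


def valence_covid_correlation(cases_per_million, mean_valence):
    """Fit a least-squares line and return slope, Pearson r and p-value.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    x = np.asarray(cases_per_million, dtype=float)
    y = np.asarray(mean_valence, dtype=float)
    result = stats.linregress(x, y)
    return {
        "slope": float(result.slope),
        "r": float(result.rvalue),
        "p_value": float(result.pvalue),
    }


# Made-up illustrative numbers, NOT project data.
cases = [10, 50, 120, 300, 450, 600]
valence = [0.52, 0.51, 0.50, 0.48, 0.47, 0.45]

fit = valence_covid_correlation(cases, valence)
print(f"slope={fit['slope']:.2e}  r={fit['r']:.2f}  p={fit['p_value']:.4f}")
```

The sign of `r` tells whether valence rises or falls as cases increase, and its magnitude is what separates a weak from a strong correlation.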
DATA TRANSFORMATION
From Spotify we obtained CSV files of the weekly top 200 charts for each country, plus the global chart. From Our World in Data we used COVID statistics broken down by date and country. The Spotify API was also used to obtain track information and audio features for each track.
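As a rough sketch of how the two CSV sources can be brought together, the example below reads both into DataFrames and joins them by country and date. The inline CSV text, the placeholder country "Exampleland", the `mean_valence` column and all numbers are assumptions for illustration; the column names on the COVID side are likewise assumed and should be checked against the downloaded file.

```python
import io

import pandas as pd

# Inline stand-ins for the downloaded files (made-up values, NOT project data).
covid_csv = io.StringIO(
    "location,date,total_cases_per_million,total_deaths_per_million\n"
    "Exampleland,2020-04-03,11.7,0.4\n"
    "Exampleland,2020-04-10,27.9,1.8\n"
)
chart_csv = io.StringIO(
    "country,date,mean_valence\n"
    "Exampleland,2020-04-03,0.61\n"
    "Exampleland,2020-04-10,0.59\n"
    "Exampleland,2020-04-17,0.60\n"
)

covid = pd.read_csv(covid_csv, parse_dates=["date"])
chart = pd.read_csv(chart_csv, parse_dates=["date"])

# Align the key names, then keep only weeks present in both sources.
covid = covid.rename(columns={"location": "country"})
merged = chart.merge(covid, on=["country", "date"], how="inner")
print(merged)
```

An inner join drops chart weeks that have no matching COVID record; `how="left"` would keep them with missing COVID values instead.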
From this data, the key metric studied was valence, defined in sound studies as a measure of whether people associate a sound with happiness or sadness. This is important because it is how Spotify describes the musical positiveness conveyed by a track. The numbers of COVID cases and deaths were also analysed in order to compare them with how valence behaved over time.
The CSV files obtained from the data sources were imported and turned into Pandas DataFrames. For each song in the Spotify DataFrame, the Spotify API was used to obtain the associated track ID and then the song's audio features, which include valence.
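A minimal sketch of the audio-features lookup is shown below, assuming a client that behaves like `spotipy.Spotify`. The helper name `fetch_valence` is hypothetical. It relies on the client's `audio_features` method, which accepts up to 100 track IDs per call and returns `None` for IDs it cannot resolve, so the IDs are sent in batches and unresolved tracks are skipped.

```python
def fetch_valence(sp, track_ids, batch_size=100):
    """Return {track_id: valence} for the given Spotify track IDs.

    `sp` is assumed to behave like a spotipy.Spotify client.
    Hypothetical helper, not the project's own code.
    """
    valence_by_id = {}
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        for features in sp.audio_features(batch):
            # Entries are None for tracks the API could not resolve.
            if features is not None:
                valence_by_id[features["id"]] = features["valence"]
    return valence_by_id


# Building a real client (credentials are read from the SPOTIPY_CLIENT_ID
# and SPOTIPY_CLIENT_SECRET environment variables):
#   import spotipy
#   from spotipy.oauth2 import SpotifyClientCredentials
#   sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
#   valence = fetch_valence(sp, list_of_track_ids)
```

Passing the client in as an argument keeps the function easy to exercise with a stub object, without network access or credentials.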
CLEAN-UP DATA
Dropped songs that are not available in certain countries, songs that are no longer on Spotify, and songs with missing values.
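The clean-up can be sketched with two Pandas calls, `dropna` and `drop_duplicates`. The column names (`date`, `track_id`, `streams`, `valence`) and the helper name `clean_chart` are assumptions for illustration, and the rows are made up.

```python
import pandas as pd


def clean_chart(df):
    """Drop rows with no track ID or valence, then repeated (date, track) rows.

    Hypothetical helper; column names are assumed, not taken from the project.
    """
    cleaned = df.dropna(subset=["track_id", "valence"])
    cleaned = cleaned.drop_duplicates(subset=["date", "track_id"])
    return cleaned.reset_index(drop=True)


# Made-up rows: one duplicate, one track without valence, one track not found.
chart = pd.DataFrame({
    "date": ["2020-03-06", "2020-03-06", "2020-03-06", "2020-03-13"],
    "track_id": ["a", "a", "b", None],
    "streams": [100, 100, 80, 60],
    "valence": [0.7, 0.7, None, 0.4],
})

cleaned_chart = clean_chart(chart)
print(cleaned_chart)
```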
PANDAS ANALYSIS
Means and weighted averages were computed to summarize the data by month and year. Matplotlib was used to plot valence over time, comparing the monthly change in 2019 and 2020. Scatter plots and linear regression analysis were used to obtain the correlation between variables.
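One way to compute a monthly weighted average is sketched below. The text does not say what the averages were weighted by, so weighting valence by stream count is an assumption here, as are the column names, the helper name `monthly_weighted_valence` and the sample rows.

```python
import pandas as pd


def monthly_weighted_valence(df):
    """Stream-weighted mean valence per (year, month).

    Assumes columns `date`, `valence` and `streams`; weighting by streams
    is an illustrative choice, not confirmed by the project.
    """
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    df["weighted"] = df["valence"] * df["streams"]
    keys = [df["date"].dt.year.rename("year"), df["date"].dt.month.rename("month")]
    totals = df.groupby(keys)[["weighted", "streams"]].sum()
    return (totals["weighted"] / totals["streams"]).rename("valence")


# Made-up rows, NOT project data.
charts = pd.DataFrame({
    "date": ["2019-03-01", "2019-03-08", "2020-03-06", "2020-03-13"],
    "valence": [0.6, 0.4, 0.5, 0.7],
    "streams": [100, 300, 200, 200],
})

monthly = monthly_weighted_valence(charts)
by_year = monthly.unstack("year")  # rows: month, columns: year
print(by_year)
```

With one column per year, the `by_year` table can be plotted directly to compare the same month in 2019 and 2020.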
RESULTS
Valence preference varies between countries. Hispanic and Latin countries have higher mean valence. Mean valence differed significantly between 2019 and 2020 in some countries: it decreased in México and increased in Spain in 2020. Globally, there was no significant change in mean valence.
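The text does not state which test was used to judge significance. As one possible approach, the sketch below compares two years of valence values with Welch's t-test via `scipy.stats.ttest_ind(..., equal_var=False)`. The helper name `compare_years`, the 0.05 threshold and the sample values are all assumptions for illustration.

```python
from scipy import stats


def compare_years(valence_2019, valence_2020, alpha=0.05):
    """Welch's t-test on two samples of valence values.

    Illustrative only: the choice of test and of `alpha` is an assumption.
    """
    result = stats.ttest_ind(valence_2019, valence_2020, equal_var=False)
    return {
        "t": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }


# Made-up weekly mean valence values, NOT project data.
sample_2019 = [0.60, 0.62, 0.58, 0.61, 0.59, 0.63]
sample_2020 = [0.50, 0.52, 0.49, 0.51, 0.48, 0.53]

comparison = compare_years(sample_2019, sample_2020)
print(comparison)
```

A positive `t` here means the first sample (2019) has the higher mean, so the sign indicates the direction of the change as well as its significance.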
Valence vs. Cases per Million: The global trend shows a strong positive correlation. Germany and the US match the trend. México, Spain and India show a moderate to strong negative correlation.
Valence vs. Deaths per Million: The global trend shows a strong positive correlation. Germany and the US match the trend. México, Brazil, Italy and India show a moderate to strong negative correlation.
CONCLUSIONS
The majority of the countries analyzed showed a correlation between COVID severity and changes in listening preferences.
Therefore, Spotify can use the data obtained to offer more personalized suggestions to users depending on the country, and optimize exclusive contracts with certain artists that match the ‘mood’ of each country.