- I wanted to learn the concepts of a data pipeline. I also chose to use trustworthy data by creating a process to keep track of my Spotify listening history. Thus, I chose to make a pipeline containing my streaming history using the Spotify API. Overall, it authenticates using a token, requests my most recent 50 songs played and inserts them into a Postgresql database.
- Authenticating through the Spotify proved to be challenging without using the third party framework Spotipy. I could not figure out how to authenticate without using the framework.
- I wanted to gather all songs streamed from the beginning of the year
but I ran into issues using the user-read-recently-played scope
endpoint within the Spotify API. Using both before and after
parameters keep returning null results. Hopefully this issue is
addressed in the near future, so I can go back and gather a full year
of listening history.
- I was able to download a year worth of streaming history by requesting personal data through Spotify. This returned four JSON files of music played within a calendar year.
- Cron
- Dotenv
- Matplotlib
- Numpy
- PostgreSQL
- Python
- Requests
- Spotify API
- Spotipy
- SQLAlchemy
- WordCloud
- Fundamentals of data extraction, transformation and loading (ETL)
- How to work with the Spotify API
- More experience with PostgreSQL
- SQL triggers, aggregate functions, and, views.
- Making word clouds with mask images
flowchart TD;
id1[BEGIN] -- 1. Request Peronsal Data -->
id2(((Spotify API)))
id2 --2. Parse Personal Data --> id3[JSON File] -- 3. Load Streams Table -->
id4[(Database)] --4. Query Unique Song IDs --> id5[Unique Song IDs]
-- 5. Request Song Data ----> id2
id2 -- 6. Load Artist, Album, Song Tables --> id4
flowchart TD;
id1[etl.py] -- 1. Request Recently Played Songs -->
id2(((Spotify API)))
id2 --2. Parse Recent Songs Data --> id3[JSON] -- "3. Load Streams [Artists|Albums|Songs] Table" -->
id4[(Database)]
classDiagram
class Album
Album: +String id
Album: +String name
Album: +String release_year
Album: +String artist_id
class Artist
Artist: +String id
Artist: +String name
class Song
Song: +String id
Song: +String name
Song: +String artist_id
Song: +String album_id
class SongStreamed
SongStreamed: +String song_id
SongStreamed: +Datetime played_at
Artist "1" --> "*" Album
Artist "1" --> "*" Song
Album "1" --> "*" Song
SongStreamed --> Song