These scripts take the listening-history files you can download from music services like Amazon Music, Tidal, and SoundCloud, fill in missing track data by scraping webpages, cache that data in a database, and produce a file of streams that you can import into Last.fm.
- Node/NPM
- Docker
Note: this software scrapes data from music services, which probably violates their terms of service. Aggressive rate limits are built in (one request every 5-20 seconds) to make the traffic look less like automated scraping, but I assume zero liability for any repercussions or legal issues that result from your use of this software.
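The throttling mechanism itself isn't shown here, but a randomized delay in that range could look something like this minimal sketch (the helper names are hypothetical):

```js
// Hypothetical helper -- the real scripts' throttling may differ.
// Sleep a random 5-20 seconds before each scrape request.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function rateLimitedScrape(scrapePage, url) {
  const delayMs = 5_000 + Math.random() * 15_000; // 5-20 seconds
  await sleep(delayMs);
  return scrapePage(url);
}
```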
```sh
docker compose up -d
npm i
```
- Download and rename the file corresponding to your music service (note: if you were already scrobbling that service to Last.fm before you requested the download, remove those already-tracked streams from the file with a text editor first, or you will end up with duplicate scrobbles in Last.fm):
- `amazon-streams.csv`
- `soundcloud-streams.csv`
- `tidal-streams.json`
- Run the script corresponding to your music service:
- `node generate-amazon-scrobbles.mjs`
- `node generate-soundcloud-scrobbles.mjs`
- `node generate-tidal-scrobbles.mjs`
- Wait for the script to complete (it may take several minutes or even a couple of hours because of the rate limiting)
- Look for the resulting file:
- `amazon-scrobbles.json`
- `soundcloud-scrobbles.json`
- `tidal-scrobbles.json`
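The exact shape of these files depends on the script, but a Last.fm scrobble fundamentally needs an artist, a track name, and a play timestamp, so each entry presumably looks something like this (field names are hypothetical):

```js
// Hypothetical entry shape -- the actual field names come from the scripts.
const exampleScrobble = {
  artist: 'Some Artist',
  track: 'Some Track',
  album: 'Some Album',   // album is optional for a Last.fm scrobble
  timestamp: 1700000000, // Unix time the stream was played
};
```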
These aren't steps to follow; this is just what is going on behind the scenes:
- The Docker command uses the `docker-compose.yml` to host a Mongo database and a Redis database in Docker.
- The Mongo database stores a cache of the artist and album data already found while scraping, so the same page doesn't have to be scraped multiple times.
- The Redis database hosts a job queue. Each stream from the download file is parsed, and a job is created to either fetch the data from the Mongo cache or, on a cache miss, scrape the music service page.
- The JS scripts will each do the following (a sketch of this pipeline follows the list):
- Parse the input file to an in-memory JS object
- Create a BullMQ job for each stream in the object
- Add the job to the queue
- Run the queue
- Save the results in a file
- Each job in the queue will do the following (also sketched after the list):
- Query the Mongo database for the data
- If the data is found, add an entry to the final in-memory data object
- If the data is not found, use Selenium with the Chrome WebDriver to load a relevant page from the music service (either the song page, using an ID, or a search-results page, using the song name and artist name), extract the missing data, add it to the final data object, and save it in the database for future runs
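A minimal sketch of the enqueue side, assuming BullMQ against the Redis container's default local port (the file, queue, and job names are hypothetical):

```js
// Sketch only: parse the export file and enqueue one BullMQ job per stream.
import { readFile } from 'node:fs/promises';
import { Queue } from 'bullmq';

const connection = { host: 'localhost', port: 6379 }; // Docker Redis defaults

// Parse the input file into an in-memory object (Tidal's export is JSON).
const streams = JSON.parse(await readFile('tidal-streams.json', 'utf8'));

// Create a BullMQ job for each stream and add it to the queue; a Worker
// (sketched below) then runs the queue and fills the final data object.
const queue = new Queue('streams', { connection });
for (const stream of streams) {
  await queue.add('resolve-stream', stream);
}
```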
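And a sketch of the job side, assuming the `mongodb` and `selenium-webdriver` packages and the Docker containers' default ports (the collection, field, URL, and CSS selector names are all hypothetical):

```js
// Sketch only: check the Mongo cache, scrape with Selenium on a miss.
import { Worker } from 'bullmq';
import { MongoClient } from 'mongodb';
import { Builder, By, until } from 'selenium-webdriver';

const connection = { host: 'localhost', port: 6379 };
const mongo = await new MongoClient('mongodb://localhost:27017').connect();
const cache = mongo.db('scrobbles').collection('tracks');
const driver = await new Builder().forBrowser('chrome').build();
const results = [];

const worker = new Worker('streams', async (job) => {
  const { trackId, track, artist, playedAt } = job.data;

  // Cache hit: reuse data scraped on a previous run.
  let data = await cache.findOne({ trackId });

  if (!data) {
    // Cache miss: load the song page and pull out the missing field(s).
    await driver.get(`https://music-service.example/track/${trackId}`);
    const album = await driver
      .wait(until.elementLocated(By.css('.album-title')), 10_000)
      .getText();
    data = { trackId, track, artist, album };
    await cache.insertOne(data); // save to the cache for future runs
  }

  results.push({ ...data, timestamp: playedAt });
}, { connection, concurrency: 1 }); // one job at a time, per the rate limit
```

Once the queue drains, the `results` array would be written out (for example with `writeFile`) to produce the `*-scrobbles.json` file.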
There are some old ideas in the `old` folder. Don't worry about them unless you want to try to make this work in a browser or something. Be warned: they are very unsupported.