Last.fm Music Groomers

These scripts take download files from music services like Amazon Music, Tidal, and SoundCloud, find missing data by scraping webpages, cache it in a database, and produce a file with streams that you can import into Last.fm.

Requirements

Node/NPM
Docker

Note: this software will scrape data from music services. This probably violates ToS. There are extreme rate limits built into this software (one request every 5-20 seconds) to avoid it looking like data is being scraped. However I assume zero liability for any repercussions or legal issues that result from you using this software.

Setup

docker compose up -D
npm i
Download and rename the file corresponding to your music service (note: you may want to use a text editor to remove some of the streams from your download if you have been tracking streams from that service with Last.fm before the download was requested; otherwise you will have duplicate scrobbles in Last.fm):
- amazon-streams.csv
- soundcloud-streams.csv
- tidal-streams.json
Run the script corresponding to your music service:
- node generate-amazon-scrobbles.mjs
- node generate-soundcloud-scrobbles.mjs
- node generate-tidal-scrobbles.mjs
Wait for the script to complete (may take several minutes or even a couple hours because of rate limiting)
Look for the resulting file that is saved:
- amazon-scrobbles.json
- soundcloud-scrobbles.json
- tidal-scrobbles.json

How it works

These aren't steps to follow, this is just what is going on behind the scenes:

The Docker command uses the docker-compose.yml to host a Mongo database and a Redis database in Docker.
- The Mongo database is for storing the cache of artist and album data we already found when scraping so we don't have to scrape the same page multiple times
- The Redis database is for hosting the a queue. Each stream from the download file is parsed and a job is created to either fetch the data from the Mongo cache or scrape the music service page if there is a cache miss.
The JS scripts will each do the following:
1. Parse the input file to an in-memory JS object
2. Create a BullMQ job for each stream in the object
3. Add the job to the queue
4. Run the queue
5. Save the results in a file
Each job in the queue will do the following:
1. Make a request to the Mongo database for the data
2. If the data is found, add an entry to a final data in-memory object
3. If the data is not found, use Selenium and the Chrome webdriver to load a relevant webpage from the music service (either the song page using an ID or a search results page using the song name and artist name) and target the missing data which it will add to the final data object and save in the database for the future

Other

There are some old ideas in the old folder. Don't worry about them unless you want to try to make this work in a browser or something. Be warned they are very unsupported.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
old		old
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml
generate-amazon-scrobbles.mjs		generate-amazon-scrobbles.mjs
generate-soundcloud-scrobbles.mjs		generate-soundcloud-scrobbles.mjs
generate-tidal-scrobbles.mjs		generate-tidal-scrobbles.mjs
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Last.fm Music Groomers

Requirements

Setup

How it works

Other

About

Releases

Packages

Languages

interlucid/music-groomers

Folders and files

Latest commit

History

Repository files navigation

Last.fm Music Groomers

Requirements

Setup

How it works

Other

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages