Music genres dataset

Dataset

- 1494 genres
- 200 songs per genre
- for each song, the following attributes are provided:
  - artist
  - song name
  - position within the genre's list of 200 songs
  - main genre
  - sub-genres (each with a popularity count, which can be interpreted as the weight of the sub-genre)
  - tags (every label that is not an existing genre, usually emotions, "My top 10 favourite tracks", etc.; also with a popularity count)
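To show how the per-song attributes above fit together, here is a minimal Python sketch of a song record. The class and field names (Song, sub_genres, tags, sub_genre_weights) are hypothetical and not taken from the actual data files. It follows the suggestion above that popularity counts can be read as weights, and normalises them so they sum to 1.

```python
from dataclasses import dataclass, field


@dataclass
class Song:
    """One song entry of a genre list (hypothetical field names)."""
    artist: str
    name: str
    position: int  # position within the genre's list of 200 songs
    main_genre: str
    # sub-genre name -> popularity count
    sub_genres: dict[str, int] = field(default_factory=dict)
    # tag name -> popularity count
    tags: dict[str, int] = field(default_factory=dict)

    def sub_genre_weights(self) -> dict[str, float]:
        """Scale sub-genre popularity counts so that they sum to 1.0."""
        total = sum(self.sub_genres.values())
        if total == 0:
            return {}
        return {genre: count / total for genre, count in self.sub_genres.items()}


song = Song("Example Artist", "Example Song", 1, "rock",
            sub_genres={"rock": 3, "indie rock": 1}, tags={"energetic": 2})
print(song.sub_genre_weights())  # {'rock': 0.75, 'indie rock': 0.25}
```

Normalising per song makes weights comparable between songs with very different numbers of listeners; whether that suits your analysis is a design choice, not something the dataset prescribes.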

This dataset is essentially the list of genres and songs available at EveryNoise, extended with data from Spotify and Last.FM.

Scraping scripts

This repository contains scripts that scrape the data from the internet and then transform it into a format that can be easily imported into a database.

Scraping data from the internet

Install Scrapy: pip install scrapy
Register at http://www.last.fm/api to obtain a Last.FM API key, then save it as the file /data/last_fm_api.key.
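As a hedged sketch of how a script might use the saved key, the helpers below (load_api_key and top_tags_url are hypothetical names, not functions from this repository) read the key file and build a request URL for the Last.FM web service. track.getTopTags is a real Last.FM API method that returns a track's tags with counts, but whether tags_spider.py calls exactly this method is an assumption. The default path assumes the data folder is relative to the repository root.

```python
from pathlib import Path
from urllib.parse import urlencode

LASTFM_ROOT = "https://ws.audioscrobbler.com/2.0/"


def load_api_key(path: str = "data/last_fm_api.key") -> str:
    """Read the API key, dropping surrounding whitespace and newlines.

    The default path is relative to the repository root (assumption).
    """
    return Path(path).read_text(encoding="utf-8").strip()


def top_tags_url(api_key: str, artist: str, track: str) -> str:
    """Build a track.getTopTags request URL that asks for a JSON response."""
    params = {
        "method": "track.gettoptags",
        "artist": artist,
        "track": track,
        "api_key": api_key,
        "format": "json",
    }
    # urlencode escapes spaces and special characters in artist/track names
    return LASTFM_ROOT + "?" + urlencode(params)
```

Stripping the key matters because text editors usually add a trailing newline, which would otherwise end up inside the api_key parameter.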

Run the scripts in this order:

scrapy runspider genre_sprider.py -o data/genres.jl \
&& scrapy runspider playlist_spider.py -o data/spotify_playlists.jl \
&& scrapy runspider songs_spider.py -o data/songs.jl \
&& scrapy runspider tags_spider.py -o data/tags.jl
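In the shell command above, && starts each spider only if the previous one exited successfully. Where no POSIX shell is available, a minimal Python equivalent might look like the sketch below; the helper names (STEPS, build_commands, run_all) are hypothetical, and the script and output names are copied from the command above.

```python
import subprocess

# (spider script, output file) pairs, in the order given above
STEPS = [
    ("genre_sprider.py", "data/genres.jl"),
    ("playlist_spider.py", "data/spotify_playlists.jl"),
    ("songs_spider.py", "data/songs.jl"),
    ("tags_spider.py", "data/tags.jl"),
]


def build_commands(steps=STEPS):
    """Return one `scrapy runspider` argument list per step."""
    return [["scrapy", "runspider", script, "-o", output] for script, output in steps]


def run_all(dry_run=False):
    """Run the spiders one after another, stopping at the first failure."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # which stops the loop just like && stops the shell chain
            subprocess.run(cmd, check=True)
```

Calling run_all(dry_run=True) only prints the commands. Note that in recent Scrapy versions -o appends to an existing output file (-O overwrites it), so remove old .jl files before re-running the scrape.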

When the process finishes (this can take a few hours), the following files should be present in the /data folder:

genres.jl
songs.jl
spotify_playlists.jl
tags.jl

Data size: ~100 MB
Scraping time: ~2.5 hours
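With the .jl extension, Scrapy writes JSON Lines: one JSON object per line. The minimal sketch below checks that all four output files exist and reads one of them record by record; the helper names are hypothetical, and no particular record keys are assumed because they depend on the spiders.

```python
import json
from pathlib import Path

EXPECTED = ["genres.jl", "songs.jl", "spotify_playlists.jl", "tags.jl"]


def missing_files(data_dir="data"):
    """Return the expected scraper outputs that are not present in data_dir."""
    return [name for name in EXPECTED if not (Path(data_dir) / name).is_file()]


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Reading lazily with a generator keeps memory use low even for the larger files of the ~100 MB total.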

Transforming data to CSV format

Run the Python scripts in the /csv-scripts folder in any order; they should create output files with corresponding names in the /data/csv folder.
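The csv-scripts themselves are not reproduced here. As an illustration of the same kind of transformation, this minimal sketch flattens a JSON Lines file into a CSV file using the standard csv module. The function name jl_to_csv is hypothetical, and writing nested lists or dicts back out as JSON text is an assumption; the real scripts may split them into separate tables instead.

```python
import csv
import json


def jl_to_csv(jl_path, csv_path):
    """Convert a JSON Lines file into a CSV file with a header row.

    Columns are the union of keys over all records, in first-seen order.
    Nested lists/dicts are written as JSON text (assumption).
    Returns the number of records written.
    """
    with open(jl_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    with open(csv_path, "w", encoding="utf-8", newline="") as out:
        # restval="" (the default) leaves cells empty for missing keys
        writer = csv.DictWriter(out, fieldnames=columns)
        writer.writeheader()
        for record in records:
            writer.writerow({
                key: json.dumps(value) if isinstance(value, (list, dict)) else value
                for key, value in record.items()
            })
    return len(records)
```

Opening the output with newline="" is what the csv module documentation asks for, so that row endings are not doubled on Windows.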

Importing into database

The CSV files generated in the previous step should be easy to import into a database (tested only on PostgreSQL). After the import finishes, run the SQL script /sql-scripts/tag.tag_is_genre.sql to fill in the tags.tag_is_genre column.
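On PostgreSQL, one way to load such a CSV file is the COPY command with the CSV format and a header row. The sketch below only builds the statement from the file's header; the table must already exist, and since the database schema is not described here, the table name you pass in is up to you (copy_statement and quote_ident are hypothetical helpers). The resulting statement can be passed, for example, to psycopg2's cursor.copy_expert together with the open CSV file.

```python
import csv


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_statement(table, csv_path):
    """Build a PostgreSQL COPY ... FROM STDIN statement for a CSV file.

    The column list comes from the CSV header row; HEADER true tells
    PostgreSQL to skip that row when loading the data.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    columns = ", ".join(quote_ident(c) for c in header)
    return (
        f"COPY {quote_ident(table)} ({columns}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)"
    )
```

Quoting keeps the names exactly as written in the CSV header, so they must match the table's column names case-sensitively.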


trebi/music-genres-dataset

Folders and files

Latest commit

History

Repository files navigation

Music genres dataset

Dataset

Scraping scripts

Scraping data from internet

Transforming data to CSV format

Importing into database

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages