
A dataset of Hip Hop samples for Music Information Retrieval research

jvbalen/sample_100

Sample ID Dataset 2.0

Introduction

A dataset for automatic sample identification, created in 2011 for the research described in [1] and [2].

This dataset contains 105 sample relations (IDs starting with S, in samples.csv) between 76 songs that use one or more samples and 68 songs that were sampled (IDs starting with T, in tracks.csv).
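For evaluation, the sample relations can be turned into a ground-truth map from each query track to the set of tracks it samples. The sketch below is illustrative only: the column names (`sample_id`, `sampling_track`, `sampled_track`) and the rows are made up, so check the actual header of samples.csv before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in a hypothetical samples.csv layout; the real column names may differ.
SAMPLES_CSV = """sample_id,sampling_track,sampled_track
S001,T001,T002
S002,T001,T003
S003,T004,T002
"""


def load_ground_truth(csv_text):
    """Map each query (sampling) track ID to the set of track IDs it samples."""
    relevant = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        relevant[row["sampling_track"]].add(row["sampled_track"])
    return dict(relevant)


ground_truth = load_ground_truth(SAMPLES_CSV)
```

A query that samples several songs simply gets a larger relevant set, which is why a set is used rather than a single target ID.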

This dataset contains only metadata: track titles and a few additional annotations. Contact me if you would like to use the audio or specific features.

Usage

The dataset is intended to be used for evaluation following a standard retrieval paradigm with query and candidate files.

The 76 tracks that contain samples are used as queries. The 68 sampled songs are used as candidates, optionally together with 'noise' files.

In [1] and [2], 320 'noise' files similar to the candidates in genre and length were added to challenge the system.
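Under this query/candidate setup, each query yields a ranked list over the candidate pool (sampled songs plus any noise files), which can then be scored with a standard retrieval metric. The sketch below uses mean average precision as one common choice; it is not necessarily the exact evaluation used in [1] and [2]. The track IDs, the `N…` noise-file IDs and the rankings are hypothetical.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked candidate list against a set of relevant IDs."""
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranking, start=1):
        if candidate in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, ground_truth):
    """Mean of the per-query average precision; rankings maps query ID -> ranked candidate IDs."""
    scores = [average_precision(rankings[q], ground_truth[q]) for q in ground_truth]
    return sum(scores) / len(scores)


# Hypothetical ground truth and system output; N001/N002 stand in for noise files.
ground_truth = {"T001": {"T002", "T003"}, "T004": {"T002"}}
rankings = {
    "T001": ["T002", "N001", "T003"],
    "T004": ["N002", "T002", "T003"],
}
map_score = mean_average_precision(rankings, ground_truth)
```

Here query T001 scores (1/1 + 2/3) / 2 = 5/6 and T004 scores 1/2, so the MAP is 2/3. Adding more noise files can only push relevant candidates further down the rankings, which is how they make the task harder.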

Data

Only samples used in hip hop music were considered. Regarding sample origins, there were no genre restrictions.

For representativeness, the ground truth was chosen to include both short and long samples, tonal and percussive samples, and isolated samples (the only layer in the mix) as well as background samples. So-called ‘interpolations’, i.e. samples that have been re-recorded in the studio, were avoided, as were non-musical samples (e.g. film dialogue).

The dataset was compiled using valuable information from WhoSampled and Hip Hop is Read.

Changes since 2011

Removed entries

  • S102 (T177 sampled by T178)

Fixed WAV files

  • T027.wav
  • T078.wav

References

Please cite one of the following when using this dataset.

[1] Van Balen, J. (2011). Automatic Recognition of Samples in Musical Audio. Master's thesis, Universitat Pompeu Fabra, Barcelona, Spain.

[2] Van Balen, J., Serrà, J., & Haro, M. (2012). Automatic Identification of Samples in Hip Hop Music. In Int. Symp. on Computer Music Modeling and Retrieval (CMMR). London, United Kingdom.
