DOREMUS-ANR/music-embeddings

DOREMUS Music Embeddings

Vectors of music entities.

This folder hosts graph embeddings for different concepts belonging to music knowledge, from instruments to works to playlists.

The embeddings have been computed on top of the DOREMUS knowledge base, which contains data about over 148,000 works, 26,000 artists, and 5,000 concerts, as well as an interesting set of controlled vocabularies.

Format

For each different concept, we provide 3 different files:

  • *.emb.u URI file, which contains the URIs of the involved resources;
  • *.emb.l Label file, which contains the label in English (if present) or in any other available language;
  • *.emb Vector file, which contains the embeddings in a Gensim-compatible format.

For more complex concepts like artists and expressions, we also provide:

  • *.emb.h Header file, which contains the ordered list of the sub-features involved, each with its number of dimensions (e.g. in artist.emb.h, the first 2 dimensions refer to the period).
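
As an illustration of how a header could be used, the sketch below slices one vector into its sub-features, given an ordered list of (name, number of dimensions) pairs. The on-disk layout of the header file is not described here, so parsing is left out. The function name `split_by_header` and the feature name `other_features` are hypothetical; only the 2-dimensional `period` block of the 14-dimensional artist vectors comes from this README.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_header(
    vector: Sequence[float], header: Sequence[Tuple[str, int]]
) -> Dict[str, List[float]]:
    """Slice a vector into named blocks, following the header order."""
    expected = sum(size for _, size in header)
    if expected != len(vector):
        raise ValueError(
            f"header describes {expected} dimensions, vector has {len(vector)}"
        )
    blocks: Dict[str, List[float]] = {}
    start = 0
    for name, size in header:
        blocks[name] = list(vector[start:start + size])
        start += size
    return blocks


# Hypothetical header: only the 2-dimensional "period" block is documented;
# "other_features" is a placeholder for the remaining 12 dimensions.
example_header = [("period", 2), ("other_features", 12)]
example_vector = [float(i) for i in range(14)]  # made-up values
example_blocks = split_by_header(example_vector, example_header)
```

Keeping the header as an ordered list rather than a dictionary preserves the order in which the sub-features appear in the vector.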

Each line of these files represents a single entity, the same one as the corresponding line in the other files. Taking musical keys as an example, line 3 in the URI file identifies an entity whose label is at line 3 in the label file and whose embedding is at line 3 in the vector file.
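
The line-by-line alignment can be sketched in plain Python as follows. This is a minimal sketch under assumptions, not an official loader: it assumes every line of the vector file holds only whitespace-separated floats, and that the files for a concept share a prefix such as `key`. If the vector file carries a header line or a leading token, the parsing would need adjusting. The names `Entity`, `align_entities` and `load_concept` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Entity(NamedTuple):
    uri: str
    label: str
    vector: List[float]


def align_entities(
    uri_lines: Iterable[str],
    label_lines: Iterable[str],
    vector_lines: Iterable[str],
) -> List[Entity]:
    """Pair up line i of the URI, label and vector files."""
    uris = [line.rstrip("\r\n") for line in uri_lines]
    labels = [line.rstrip("\r\n") for line in label_lines]
    rows = [line.rstrip("\r\n") for line in vector_lines]
    if not (len(uris) == len(labels) == len(rows)):
        raise ValueError("the three files must have the same number of lines")
    # Assumption: a vector line is a list of whitespace-separated floats.
    return [
        Entity(uri, label, [float(value) for value in row.split()])
        for uri, label, row in zip(uris, labels, rows)
    ]


def load_concept(prefix: str) -> List[Entity]:
    """Read <prefix>.emb.u, <prefix>.emb.l and <prefix>.emb together."""
    with open(prefix + ".emb.u", encoding="utf-8") as uri_file, \
         open(prefix + ".emb.l", encoding="utf-8") as label_file, \
         open(prefix + ".emb", encoding="utf-8") as vector_file:
        return align_entities(uri_file, label_file, vector_file)
```

With the files in the working directory, `load_concept("key")[2]` would return the entity described at line 3 of each file.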

Contents

| name | n. entities | dimensions | description | source |
|---|---|---|---|---|
| key | 30 | 100 | Musical keys (e.g. C major) | Key vocabulary |
| genre | 530 | 80 | Musical genres (e.g. symphony) | 6 vocabularies: IAML, Redomi, Itema3, Musical Doc of Itema3, Diabolo, Rameau |
| mop | 3,278 | 80 | Medium of performance (instruments, voices, ensembles) | 5 vocabularies: MIMO, IAML, Redomi, Itema3, Diabolo |
| function | 96 | 100 | Artist function (e.g. composer, conductor) | Function vocabulary |
| artist | 24,423 | 14 | Composers, performers, conductors, groups | SPARQL query to endpoint |
| expression | 148,177 | 13 | Musical works (e.g. Moonlight Sonata, Bolero, Traviata) | SPARQL query to endpoint |

Embedding strategy

Details will be published soon :)

As a preview:

  • We use node2vec [1], as implemented in entity2vec;
  • We apply an L2 normalisation.
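
For readers unfamiliar with L2 normalisation, the sketch below shows the operation on a single vector: each component is divided by the vector's Euclidean norm, so the result has length 1. One practical consequence is that the cosine similarity of two normalised vectors is simply their dot product. The helper names are hypothetical, and the README does not say at which stage of the pipeline the normalisation is applied.

```python
import math
from typing import List, Sequence


def l2_normalise(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs have unit length."""
    return sum(x * y for x, y in zip(a, b))


# Made-up vectors, for illustration only.
unit_a = l2_normalise([3.0, 4.0])
unit_b = l2_normalise([4.0, 3.0])
similarity = dot(unit_a, unit_b)
```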

[1] node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.
