Adding loader for CIPI dataset #599

Merged
merged 50 commits into from
Nov 2, 2023
Changes from 35 commits
Commits
6a2d3a3
scripts/make, index and track and dataset class. TODO tests
Oct 30, 2023
e59e8e8
fix docstring
Oct 30, 2023
ae5099a
modify the docs
Oct 30, 2023
ba71105
download disclaimer
Oct 30, 2023
a529601
black
Oct 30, 2023
720b0fe
first test
Oct 31, 2023
5655ca5
fix metadata
Oct 31, 2023
356cebc
remove embeddings
Oct 31, 2023
30ed498
add more tests
Oct 31, 2023
be73c13
black
Nov 1, 2023
b68ee68
modify tests
Nov 1, 2023
e3b2b12
modify fix.py for adding music21 (optional)
Nov 1, 2023
8174859
fix bug with load_scores
Nov 1, 2023
3a89ca3
fix bugs
Nov 1, 2023
88cd13b
from smart_open import open
Nov 1, 2023
bfc96d0
from smart_open import open
Nov 1, 2023
de3cf84
same error as francesco
Nov 1, 2023
3848816
test fulldataset
Nov 1, 2023
6c79866
test fulldataset
Nov 1, 2023
3bec03b
test fulldataset
Nov 1, 2023
d96941c
genis suggestion
Nov 1, 2023
79daf9f
replace os.path.exists by try catch
Nov 1, 2023
bb24659
fix problems with try catch
Nov 1, 2023
06553f1
add cipi to CUSTOM_TEST_TRACKS
Nov 1, 2023
7e04ebc
modify all the tests
Nov 1, 2023
3bee2c1
black
Nov 1, 2023
3cace66
smart open test
Nov 1, 2023
e1f0c62
black
Nov 1, 2023
eea4350
check embeddings
Nov 1, 2023
c3c0ed4
check embeddings
Nov 1, 2023
1a0ac05
check embeddings
Nov 1, 2023
a9f3404
improving codecov
Nov 1, 2023
9241599
rollback haydn_op20.py
Nov 1, 2023
d8ba6d0
rollback haydn_op20.py
Nov 1, 2023
1f987a5
Merge branch 'master' into pedro/cipi
genisplaja Nov 1, 2023
bbb1a3f
Merge branch 'master' into pedro/cipi
guillemcortes Nov 2, 2023
60849f7
comment about the embeddings
Nov 2, 2023
8d78f5c
Merge remote-tracking branch 'origin/pedro/cipi' into pedro/cipi
Nov 2, 2023
27feb27
cante100 -> cipi
Nov 2, 2023
a5e5fbe
black
Nov 2, 2023
bacbd2a
expressiveness
Nov 2, 2023
bc4ae49
fix make
Nov 2, 2023
97fb5f7
Done!
PRamoneda Nov 2, 2023
c6cca0b
Update cipi.py
PRamoneda Nov 2, 2023
bb6a7b8
difficulty annotation
Nov 2, 2023
9273097
fix docs table
genisplaja Nov 2, 2023
491ed2a
add dataset details and fix error message
genisplaja Nov 2, 2023
6e07a8a
now doing the fixes right :)
genisplaja Nov 2, 2023
adc5161
address problem in table.rst
genisplaja Nov 2, 2023
0dad936
Merge branch 'master' into pedro/cipi
guillemcortes Nov 2, 2023
8 changes: 8 additions & 0 deletions docs/source/mirdata.rst
@@ -60,6 +60,14 @@ cante100
    :inherited-members:


cipi
^^^^

.. automodule:: mirdata.datasets.cipi
    :members:
    :inherited-members:


compmusic_carnatic_rhythm
^^^^^^^^^^^^^^^^^^^^^^^^^

9 changes: 9 additions & 0 deletions docs/source/table.rst
@@ -61,6 +61,15 @@
     - 100
     - :cante:`\ `

   * - CIPI
     - - musicXML: ❌
       - embeddings: ✅
       - annotations: ✅
     - - difficulty levels
     - 652
     - .. image:: https://licensebuttons.net/l/by-nc-sa/4.0/80x15.png
          :target: https://creativecommons.org/licenses/by-nc-sa/4.0

   * - .. line-block::

         (CompMusic)
3 changes: 1 addition & 2 deletions mirdata/core.py
@@ -529,13 +529,12 @@ def __init__(self, track_id, data_home, dataset_name, index, metadata):
            raise ValueError(
                "{} is not a valid track_id in {}".format(track_id, dataset_name)
            )

        self._metadata = metadata
        self.track_id = track_id
        self._dataset_name = dataset_name

        self._data_home = data_home
        self._track_paths = index["tracks"][track_id]
        self._metadata = metadata

    @cached_property
    def _track_metadata(self):
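The `_metadata` attribute set in `__init__` feeds the cached `_track_metadata` lookup that the cipi `Track` properties rely on. The following is a simplified sketch of that lazy, cached lookup pattern, not mirdata's actual implementation. It assumes `metadata` is a zero-argument callable returning a dict keyed by track ID; `ToyTrack`, `load_metadata`, and the sample data are hypothetical.

```python
from functools import cached_property


class ToyTrack:
    """Hypothetical stand-in for a track object; not mirdata's core.Track."""

    def __init__(self, track_id, metadata):
        # Assumption: metadata is a zero-argument callable, so the metadata
        # file is only read when a property first needs it.
        self._metadata = metadata
        self.track_id = track_id

    @cached_property
    def _track_metadata(self):
        all_metadata = self._metadata()
        # Fall back to an empty dict when the track has no metadata entry.
        return all_metadata.get(self.track_id, {})


calls = []


def load_metadata():
    calls.append(1)  # record each load to show that the result is cached
    return {"c-1": {"work_name": "Sonata", "henle": "5"}}


track = ToyTrack("c-1", load_metadata)
title = track._track_metadata.get("work_name")
level = track._track_metadata.get("henle")
```

Both lookups go through the same cached dict, so `load_metadata` runs once per track object. Using `.get` gives the same `None` fallback as the conditional expressions in the cipi `Track` properties.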
259 changes: 259 additions & 0 deletions mirdata/datasets/cipi.py
@@ -0,0 +1,259 @@
"""Can I play it? (CIPI) Dataset Loader

.. admonition:: Dataset Info
    :class: dropdown

    The "Can I Play It?" (CIPI) dataset is a specialized collection of 652 classical piano scores, provided in a
    machine-readable MusicXML format and accompanied by integer-based difficulty levels ranging from 1 to 9, as
    verified by expert pianists. Developed by the Music Technology Group in Barcelona, this dataset focuses
    exclusively on classical piano music, offering a rich resource for music researchers, educators, and students.

    The CIPI dataset facilitates various applications such as the study of musical complexity, the selection of
    appropriately leveled pieces for students, and general research in music education. The dataset, alongside
    embeddings of multiple dimensions of difficulty, has been made publicly available to encourage ongoing innovation
    and collaboration within the music education and research communities.
"""
import json
import logging
import os
import pickle
from typing import Optional, TextIO, List

import smart_open
from smart_open import open

from deprecated.sphinx import deprecated

from mirdata import core, io, jams_utils, download_utils

try:
    import music21
except ImportError:
    logging.error(
        "In order to use cipi you must have music21 installed. "
        "Please reinstall mirdata using `pip install 'mirdata[cipi]'`"
    )
    raise ImportError

BIBTEX = """
@article{Ramoneda2024,
    author = {Pedro Ramoneda and Dasaem Jeong and Vsevolod Eremenko and Nazif Can Tamer and Marius Miron and Xavier Serra},
    title = {Combining Piano Performance Dimensions for Score Difficulty Classification},
    journal = {Expert Systems with Applications},
    volume = {238},
    pages = {121776},
    year = {2024},
    doi = {10.1016/j.eswa.2023.121776},
    url = {https://doi.org/10.1016/j.eswa.2023.121776}
}"""

INDEXES = {
    "default": "1.0",
    "test": "1.0",
    "1.0": core.Index(filename="cipi_index_1.0.json"),
}

LICENSE_INFO = (
    "Creative Commons Attribution Non Commercial Share Alike 4.0 International."
)

DOWNLOAD_INFO = """
    Unfortunately, the files of the CIPI dataset are only available
    for download upon request. After requesting the dataset, you will receive a
    link to download it. You must download scores.zip, embeddings.zip and index.json,
    and copy the files into the folder:
        > cipi/
            > index.json
            > embeddings.zip
            > scores.zip
    Then unzip embeddings.zip and scores.zip, and copy the cipi folder to {}
"""


class Track(core.Track):
    """Can I play it? (CIPI) track class

    Args:
        track_id (str): track id of the track

    Attributes:
        title (str): title of the track
        book (str): book of the track
        URI (str): URI of the track
        composer (str): name of the author of the track
        track_id (str): track id
        musicxml_paths (list): paths to the MusicXML scores. If the piece contains multiple movements, the list contains multiple paths.
        difficulty_annotation (str): annotated difficulty

    Cached Properties:
        fingering (tuple): paths of the fingering features from the technique dimension, computed with the ArGNN fingering model. Returns two paths: the right-hand embeddings and the left-hand embeddings. Use torch.load(...) to load the embeddings.
        expressiviness (str): path of the expressiveness features from the sound dimension, computed with the virtuosoNet model. Use torch.load(...) to load the embeddings.
        notes (str): path of the note features from the notation dimension. Use torch.load(...) to load the embeddings.
        scores (list[music21.stream.Score]): music21 scores. If the work is split into several movements, the list contains multiple scores.
    """

    def __init__(self, track_id, data_home, dataset_name, index, metadata):
        super().__init__(track_id, data_home, dataset_name, index, metadata)
        self._data_home = data_home

    @property
    def title(self) -> Optional[str]:
        return (
            self._track_metadata["work_name"]
            if "work_name" in self._track_metadata
            else None
        )

    @property
    def book(self) -> Optional[str]:
        return self._track_metadata["book"] if "book" in self._track_metadata else None

    @property
    def URI(self) -> Optional[str]:
        return self._track_metadata["URI"] if "URI" in self._track_metadata else None

    @property
    def composer(self) -> Optional[str]:
        return (
            self._track_metadata["composer"]
            if "composer" in self._track_metadata
            else None
        )

    @property
    def musicxml_paths(self) -> List[str]:
        return (
            list(self._track_metadata["path"].values())
            if "path" in self._track_metadata
            else []
        )

    @property
    def difficulty_annotation(self) -> Optional[str]:
        return (
            self._track_metadata["henle"] if "henle" in self._track_metadata else None
        )

    def _check_embedding(self, fpath, file_type: str) -> str:
        """
        Verifies the existence of an embedding file and returns its path.

        Args:
            fpath (str): The path to the embedding file.
            file_type (str): The type of the embedding file.

        Returns:
            str: The path to the embedding file.

        Raises:
            FileNotFoundError: If the embedding file does not exist.
        """
        try:
            with smart_open.open(fpath):
                return fpath
        except FileNotFoundError:
            raise FileNotFoundError(
                f"{file_type} embedding {fpath} for track {self.track_id} not found. "
                "Did you run .download()?"
            )

    @core.cached_property
    def fingering(self) -> tuple:
        path_rh = self.get_path("rh_fingering")
        path_lh = self.get_path("lh_fingering")
        return self._check_embedding(path_rh, "Fingering"), self._check_embedding(
            path_lh, "Fingering"
        )

    @core.cached_property
    def expressiviness(self) -> str:
        path = self.get_path("expressiviness")
        return self._check_embedding(path, "Expressiviness")

    @core.cached_property
    def notes(self) -> str:
        path = self.get_path("notes")
        # Label the error message with the right embedding type.
        return self._check_embedding(path, "Notes")

    @core.cached_property
    def scores(self) -> list:
        try:
            scores = [load_score(path, self._data_home) for path in self.musicxml_paths]
        except FileNotFoundError:
            raise FileNotFoundError(
                "MusicXML file {} for track {} not found. "
                "Did you run .download()?".format(self.musicxml_paths, self.track_id)
            )
Collaborator

I think this message is not well written. If a MusicXML file is not available, load_score will fail and report which file is missing, but here you print the entire list. Maybe you could just say something like "Missing MusicXML files. Did you run .download()?"

Collaborator

Hey @PRamoneda, sorry we missed that one. Can you take a look again?
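One way to address the point raised above is to name only the file that failed to load instead of formatting the whole path list into the message. The following is a minimal, self-contained sketch of that idea, not the code that was merged. `load_all_scores` and `read_text` are hypothetical names; in the loader itself the `loader` argument would be something like `music21.converter.parse`, which may raise a different exception type for a missing file.

```python
import os
from typing import Callable, List


def read_text(path: str) -> str:
    """Hypothetical loader used only for this illustration."""
    with open(path, "r") as fhandle:
        return fhandle.read()


def load_all_scores(
    paths: List[str], data_home: str, loader: Callable[[str], object]
) -> list:
    """Load every score; on failure, name only the file that is missing."""
    scores = []
    for rel_path in paths:
        full_path = os.path.join(data_home, rel_path)
        try:
            scores.append(loader(full_path))
        except FileNotFoundError as err:
            # Report the single offending path rather than the entire list.
            raise FileNotFoundError(
                f"MusicXML file {rel_path} not found. Did you run .download()?"
            ) from err
    return scores
```

Catching the exception around the load call, rather than testing `os.path.exists` first, matches the try/except approach already used in `_check_embedding`.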

        return scores

    def to_jams(self):
        """Get the track's data in jams format

        Returns:
            jams.JAMS: the track's data in jams format

        """
        return jams_utils.jams_converter(
            metadata={
                "title": self.title,
                "artist": self.composer,
                "duration": 0.0,
                "book": self.book,
                "URI": self.URI,
                "composer": self.composer,
                "track_id": self.track_id,
                "musicxml_paths": self.musicxml_paths,
                "difficulty_annotation": self.difficulty_annotation,
            }
        )


def load_score(
    fhandle: str, data_home: str = "tests/resources/mir_datasets/cipi"
) -> music21.stream.Score:
    """Load cipi score in music21 stream

    Args:
        fhandle (str): path to MusicXML score
        data_home (str): path to cipi dataset

    Returns:
        music21.stream.Score: score in music21 format
    """
    try:
        score = music21.converter.parse(os.path.join(data_home, fhandle))
    except Exception:
        # Avoid a bare except so KeyboardInterrupt and SystemExit still propagate.
        raise FileNotFoundError("File {} not found.".format(fhandle))
    return score


@core.docstring_inherit(core.Dataset)
class Dataset(core.Dataset):
    """
    The Can I play it? (CIPI) dataset
    """

    def __init__(self, data_home=None, version="default"):
        super().__init__(
            data_home,
            version,
            name="cipi",
            track_class=Track,
            bibtex=BIBTEX,
            indexes=INDEXES,
            license_info=LICENSE_INFO,
            download_info=DOWNLOAD_INFO,
        )

    @core.cached_property
    def _metadata(self):
        metadata_path = os.path.join(self.data_home, "index.json")
        try:
            with open(metadata_path, "r") as fhandle:
                metadata_index = json.load(fhandle)
        except FileNotFoundError:
            raise FileNotFoundError(
                f"Metadata {metadata_path} not found. Did you download the files?"
            )
        return dict(metadata_index)
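Because the dataset has to be requested and copied into place by hand, it can help to check the folder before initializing the loader. The following is a minimal sketch, assuming only what the download message states: `index.json` sits at the top of the cipi folder. `missing_cipi_files` and `load_index` are hypothetical helpers, not part of mirdata. The directories produced by unzipping the archives are not checked, because their names are not given here.

```python
import json
import os
from typing import List, Sequence


def missing_cipi_files(
    data_home: str, required: Sequence[str] = ("index.json",)
) -> List[str]:
    """Return the required files that are absent from data_home."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(data_home, name))
    ]


def load_index(data_home: str) -> dict:
    """Read index.json with the same error handling as Dataset._metadata."""
    metadata_path = os.path.join(data_home, "index.json")
    try:
        with open(metadata_path, "r") as fhandle:
            return dict(json.load(fhandle))
    except FileNotFoundError:
        raise FileNotFoundError(
            f"Metadata {metadata_path} not found. Did you download the files?"
        )
```

Once `missing_cipi_files(data_home)` returns an empty list, the loader should be reachable through `mirdata.initialize("cipi", data_home=data_home)`, provided mirdata was installed with the music21 extra.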