
Add loader for candombe_beat_downbeat #553

Closed
wants to merge 19 commits

Conversation

jimearruti
Contributor

@jimearruti jimearruti commented Aug 15, 2022

Candombe dataset

Description

Please include the following information in the top-level docstring of the dataset's module, mydataset.py:

  • Describe the annotations included in the dataset
  • Indicate the size of the dataset (e.g. number of files and total duration in hours)
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Describe how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file.
  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.
  • Create a module in mirdata, e.g. mirdata/my_dataset.py
  • Create tests for your loader in tests/datasets/, e.g. test_my_dataset.py
  • Add your module to docs/source/mirdata.rst and docs/source/table.rst
  • Run tests/test_full_dataset.py on your dataset.
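For reference, the index-generation step in the checklist above can be sketched roughly as follows. This is a hypothetical sketch, not the actual make_candombe_index.py: it assumes paired .flac audio and .csv annotation files sharing a basename, and the real mirdata index format may carry additional fields and per-dataset conventions.

```python
import glob
import hashlib
import json
import os


def md5(path):
    """Return the md5 checksum of a file, as mirdata indexes use."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def make_index(data_path):
    """Build a {track_id: {audio, beats}} index of (filename, checksum) pairs.

    Assumes each track has a <track_id>.flac audio file and a matching
    <track_id>.csv beat annotation file in data_path (hypothetical layout).
    """
    index = {"version": "1.0", "tracks": {}}
    for audio_path in sorted(glob.glob(os.path.join(data_path, "*.flac"))):
        track_id = os.path.splitext(os.path.basename(audio_path))[0]
        ann_path = os.path.join(data_path, track_id + ".csv")
        index["tracks"][track_id] = {
            "audio": (os.path.basename(audio_path), md5(audio_path)),
            "beats": (os.path.basename(ann_path), md5(ann_path)),
        }
    return index


if __name__ == "__main__":
    # Path is illustrative; point it at the canonical dataset copy.
    print(json.dumps(make_index("candombe_audio/"), indent=2))
```

The resulting JSON would then be saved under mirdata/indexes/ as the checklist describes.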

If your dataset is not fully downloadable there are a few extra steps you should follow:

  • Contact the mirdata organizers by opening an issue or PR so we can discuss how to proceed with the closed dataset.
  • Show that the version used to create the checksum is the "canonical" one, either by getting the version from the dataset creator, or by verifying equivalence with several other copies of the dataset.
  • Make sure someone has run pytest -s tests/test_full_dataset.py --local --dataset my_dataset once on your dataset locally and confirmed it passes

Please-do-not-edit flag

To reduce friction, we will make commits on top of contributors' pull requests by default unless they use the please-do-not-edit flag. If you don't want this to happen, don't forget to add the flag when you open your pull request.

As discussed with @magdalenafuentes, here's the PR for the "Uruguayan candombe drumming - beat and downbeat tracking data set" by Martin Rocamora and Luis Jure, now hosted on Zenodo: https://zenodo.org/record/6533068/

This PR includes a dataloader and everything needed as described in the Contributing page.

@harshpalan
Collaborator

Thank you for your contribution to mirdata, and sorry for the delay. Before we start the formal review, a couple of high-level comments from our side:

  • We would like you to change the loader name to something like candombe.py. We know some loaders have very long and descriptive names, but we're trying to revert that because it is impractical, and the information is in the datasets table anyway.
  • Function load_beats does not have documentation (candombe_beat_downbeat.Dataset.load_beats), and that is the reason the build tests are failing. Could you please add the documentation to that function?
  • The code is not formatted according to black's code style. If you run this command locally: black --target-version py38 mirdata/ tests/, the build test should pass. More information can be found here.
  • Dataset information is missing in mirdata.rst and table.rst, so please add those.
  • We updated your initial comment with the PR checklist so it’s easier for you to complete the PR.

Let us know if you have any questions!
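As an illustration of the documentation being requested for load_beats, a sketch might look like the following. The annotation format assumed here (CSV rows of time,label, where a label N.M encodes bar N, beat M, so M == 1 marks a downbeat) is an assumption based on the dataset description, and the real loader should return a mirdata annotations object rather than the plain lists used here to keep the example self-contained.

```python
import csv
from typing import List, TextIO, Tuple


def load_beats(fhandle: TextIO) -> Tuple[List[float], List[int]]:
    """Load candombe beat and downbeat annotations.

    Args:
        fhandle: an open file handle to a beat annotation CSV, assumed to
            contain ``time,label`` rows where a label ``N.M`` encodes
            bar ``N``, beat ``M`` (so ``M == 1`` marks a downbeat).

    Returns:
        tuple: (beat_times, beat_positions), where positions are the
        within-bar beat numbers (1 = downbeat).

    """
    times, positions = [], []
    for row in csv.reader(fhandle):
        if not row:
            continue  # skip blank lines
        times.append(float(row[0]))
        positions.append(int(row[1].split(".")[1]))
    return times, positions
```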

@jimearruti jimearruti marked this pull request as draft November 23, 2022 23:58
@jimearruti
Contributor Author

Hi there @harshpalan, and sorry for the delay.

I've made the changes you requested, and I'm currently testing the dataloader to make sure everything is working properly.

I do have one question: I see the formatting test is not passing; however, when I run black locally I see that no changes are needed. Any idea what might be going on?

Thanks!

@mrocamora

mrocamora commented Aug 4, 2023

Could we resume from here? Any help would be much appreciated to solve these issues and get the dataset incorporated into mirdata. Thank you!

@magdalenafuentes
Collaborator

Hey @mrocamora, thanks for bringing this to our attention again. @harshpalan is looking into it and will come back to you soon. We suspect that the formatting issue may be related to the black version, but we're a bit unsure about the errors appearing in the other loaders. Will keep you posted

@jimearruti
Contributor Author

Hi @magdalenafuentes and @harshpalan, I hope all is well with you. Just wanted to say I'm also over here if you need me to modify something. Thanks!

@magdalenafuentes
Collaborator

Hey @jimearruti @mrocamora, thanks for this PR and sorry for the slow response on our side. We've recently migrated soundata to GitHub Actions and updated Python and package versions, and we're looking into doing the same in mirdata in the next couple of weeks. So we're holding off a bit on the other PRs for the moment, to incorporate those changes first and then have the PRs tested with the updated pipeline. We'll make sure to look at your PRs as soon as we finish that up, so you'll hear from us soon. Thanks for your patience!

@genisplaja
Collaborator

hey @jimearruti, I took a look at the PR. Looks good :) I tried to commit to your fork but I don't have permission, so I'd like to ask you to make some modifications to your branch to see if we can get past these Py3.8 and formatting problems and move on with reviewing and merging the loader soon.

First, you'd need to run black again on your loader main file (candombe.py), to pass the formatting test.

Second, the problems in Python 3.8 are presumably caused by the newest versions of numpy being installed. Could you please edit the setup file to update the following dependencies as attached here:

    "librosa>=0.8.0",
    "numpy>=1.16,<=1.20",

Let's see if these minor changes help get the tests passing so we can move on. Let us know how that goes! :)

@jimearruti
Contributor Author

Hi @genisplaja, thank you very much for the help!

I ran a clean installation with the changes in setup.py you proposed, but I got some incompatibility issues:

ERROR: pandas 2.0.3 has requirement numpy>=1.20.3; python_version < "3.10", but you'll have numpy 1.20.0 which is incompatible.
ERROR: numba 0.58.1 has requirement numpy<1.27,>=1.22, but you'll have numpy 1.20.0 which is incompatible.
ERROR: librosa 0.10.1 has requirement numpy!=1.22.0,!=1.22.1,!=1.22.2,>=1.20.3, but you'll have numpy 1.20.0 which is incompatible.

I will try again with the prior version of setup.py, and if that still doesn't work I'll search for a more suitable numpy version.

I do have some follow up questions:

  • Should we be using Python 3.8 at all? We are, but reviewing the documentation I saw it states "You will want to install the latest versions of Python 3.6 and 3.7", so I was wondering where the library stands in terms of 3.8 support.

  • Also, along the same lines, I was wondering whether the black version might have changed on your end when running the checks? setup.py says "black == 22.8.0", # last version for python3.6. Is that still the case?

@genisplaja
Collaborator

hey @jimearruti, sorry for the back and forth, and thanks for taking a look at my suggestions. First of all, just wanted to let you know that @guillemcortes is working on PR #596, which should help resolve the current problems in the testing system. We have concluded that focusing on PR #596 is the fastest way to move forward with the blocked PRs, so we will be working on that this week, aiming to get it merged by the beginning of next week, and hopefully that will unblock many PRs, including yours. Will keep you posted on that, thanks again for your patience and time.

Regarding your questions:

  • Actually, our idea is to drop support for 3.6 now, and for 3.7 soon. We recommended installing the latest versions of those two because they were getting old; we are indeed interested in Python 3.8, and we will be including later versions of Python in the GitHub Actions migration #596.
  • Most likely, the formatting issue will be fixed in GitHub Actions migration #596.

@jimearruti
Contributor Author

Great @genisplaja, thank you for all the information. I'll be over here when that PR is merged :)

@codecov

codecov bot commented Oct 29, 2023

Codecov Report

Merging #553 (16d62ad) into master (496eb4a) will increase coverage by 0.11%.
Report is 10 commits behind head on master.
The diff coverage is 98.99%.

@@            Coverage Diff             @@
##           master     #553      +/-   ##
==========================================
+ Coverage   96.96%   97.08%   +0.11%     
==========================================
  Files          58       62       +4     
  Lines        6990     7277     +287     
==========================================
+ Hits         6778     7065     +287     
  Misses        212      212              

@guillemcortes
Collaborator

Hi @jimearruti, thanks for your patience! PR #596 has been merged. I've updated your branch to match master, and the good news is that tests are passing except for the formatting test, which should be easy to solve: just install black >= 23.0.0 (the test runs the latest available black version) and run black mirdata/ tests/. In the meantime we will review your PR so it can be merged soon.

@guillemcortes guillemcortes added the "new loader" label (request to add a specific dataset loader)
@jimearruti
Contributor Author

Hi @genisplaja and @guillemcortes, hope all is well over there. I formatted the code with black and tests seem to be passing. Let me know if you need anything else on my side 😄

Collaborator

@genisplaja genisplaja left a comment


Hey @jimearruti! Thanks for your patience and nice work :) Thanks for contributing to mirdata! I'm just requesting a few minor changes, mostly documentation stuff. The rest looks good and ready to me, tests are passing and coverage is 100%.

Comment on lines 67 to 84
class Track(core.Track):
    """candombe track class
    # -- YOU CAN AUTOMATICALLY GENERATE THIS DOCSTRING BY CALLING THE SCRIPT:
    # -- `scripts/print_track_docstring.py my_dataset`
    # -- note that you'll first need to have a test track (see "Adding tests to your dataset" below)

    Args:
        track_id (str): track id of the track

    Attributes:
        audio_path (str): path to audio file
        annotation_path (str): path to annotation file
        # -- Add any of the dataset specific attributes here

    Cached Properties:
        annotation (EventData): a description of this annotation

    """

You would need to update the docstring here with the attributes and cached properties of your Track class: basically, include the audio and beats paths as attributes, and beats as a cached property.
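A filled-in version along those lines might look like this sketch. The attribute names (audio_path, beats_path) and the plain class are assumptions for illustration; the real class inherits from core.Track and exposes beats through mirdata's cached-property machinery.

```python
class Track:
    """candombe track class

    Args:
        track_id (str): track id of the track

    Attributes:
        audio_path (str): path to the audio file
        beats_path (str): path to the beat/downbeat annotation file
        track_id (str): track id

    Cached Properties:
        beats (BeatData): beat and downbeat annotations

    """

    def __init__(self, track_id, audio_path, beats_path):
        # In mirdata these paths are resolved from the index; passed
        # directly here to keep the sketch self-contained.
        self.track_id = track_id
        self.audio_path = audio_path
        self.beats_path = beats_path
```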

@genisplaja genisplaja marked this pull request as ready for review November 1, 2023 18:02
@genisplaja
Collaborator

hey @jimearruti, I deleted some suggested changes, so please stick to the ones visible now :) Thanks!!

@jimearruti
Contributor Author

@genisplaja let me know if everything is ok now or if you need any more changes.
