
JAMS Testing framework #60

Closed · jonnybluesman opened this issue on Jul 1, 2022 · 2 comments
Labels: feature (a new feature to implement), high-priority
Assignee: jonnybluesman

jonnybluesman commented on Jul 1, 2022

Testing framework for JAMS files, following the JAMification step in ChoCo and assuming the availability of gold standards (manually annotated JAMS).

Preliminary sanity checks

Q: Is the given JAMS consistent and well-formatted? This applies to both gold and ChoCo JAMS, before any further step is taken.

  • The JAMS can be successfully parsed by the jams library, both with and without validation enabled.
  • Annotation times do not exceed the total duration of the track/piece, when the latter is available.
  • Observations are temporally ordered within each annotation.
  • There are no duplicated annotations.
  • All fields in the sandbox are known according to our schema (worth keeping in a separate file).

If all these preliminary checks pass, we can proceed with the gold-vs-ChoCo JAMS validation. A sketch of the checks is given below.
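For concreteness, here is a minimal sketch of how these checks could be implemented with the jams library; the function name, error messages, and the `known_sandbox_fields` argument are hypothetical and not part of ChoCo.

```python
import jams

def sanity_check(jams_path, known_sandbox_fields=frozenset()):
    """Run the preliminary sanity checks on a single JAMS file."""
    errors = []

    # (1) The file can be parsed with schema validation enabled.
    try:
        jam = jams.load(jams_path, validate=True, strict=True)
    except Exception as err:  # e.g. jams.SchemaError, or a malformed file
        return [f"cannot parse/validate: {err}"]

    duration = jam.file_metadata.duration
    seen = set()
    for ann in jam.annotations:
        obs = list(ann.data)
        # (2) Annotation times never exceed the track duration, if known.
        if duration is not None and any(o.time + o.duration > duration for o in obs):
            errors.append(f"{ann.namespace}: observation beyond track duration")
        # (3) Observations are temporally ordered.
        times = [o.time for o in obs]
        if times != sorted(times):
            errors.append(f"{ann.namespace}: observations not temporally ordered")
        # (4) No duplicated annotations: same namespace and identical content.
        key = (ann.namespace, tuple((o.time, o.duration, str(o.value)) for o in obs))
        if key in seen:
            errors.append(f"{ann.namespace}: duplicated annotation")
        seen.add(key)

    # (5) All sandbox fields are known, according to our schema.
    unknown = set(jam.sandbox.keys()) - set(known_sandbox_fields)
    if unknown:
        errors.append(f"unknown sandbox fields: {sorted(unknown)}")

    return errors
```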

Metadata

Q: How good is the metadata layer in the JAMS?

Coverage: is the metadata exhaustive? Does it cover all possible fields?

Coverage is measured as the proportion of non-null metadata fields in the gold JAMS that are also found in the ChoCo JAMS.

  • Case 1: gold has more fields (coverage is less than 1).
  • Case 2: gold has fewer fields (there is a potential annotation issue).
  • Case 3: fields are the same, regardless of their content (maximum coverage).
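As an illustration, the coverage above could be computed as follows; the function name, the dict-based metadata representation, and the treatment of empty strings as null are assumptions, not ChoCo code.

```python
def metadata_coverage(gold_meta: dict, choco_meta: dict) -> float:
    """Proportion of gold's non-null metadata fields also present in ChoCo."""
    def non_null(meta):
        return {k for k, v in meta.items() if v not in (None, "")}
    gold_fields, choco_fields = non_null(gold_meta), non_null(choco_meta)
    if not gold_fields:  # nothing to cover: maximum coverage by convention
        return 1.0
    return len(gold_fields & choco_fields) / len(gold_fields)
```

Under this definition, Case 1 yields a coverage below 1, Case 3 yields 1, and Case 2 can be flagged separately by checking `choco_fields - gold_fields`.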

Accuracy: for the non-null metadata fields, how many are correct? Can we measure their quality?

  • Option 1: perfect match after basic preprocessing.
  • Option 2: non-perfect matches can be assessed by simple text-distance methods.
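A minimal sketch of both options, using the standard library's difflib for the text similarity (any other text-distance method would work equally well); the preprocessing is an illustrative choice.

```python
from difflib import SequenceMatcher

def field_accuracy(gold_value: str, choco_value: str) -> float:
    """Score a single metadata field: 1.0 for a perfect match, else similarity."""
    a = gold_value.strip().lower()   # basic preprocessing (illustrative)
    b = choco_value.strip().lower()
    if a == b:
        return 1.0                   # Option 1: perfect match after preprocessing
    return SequenceMatcher(None, a, b).ratio()  # Option 2: similarity in [0, 1]
```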

Identifiers and external links

Same as for the metadata (identifiers are, in fact, a particular type of metadata): coverage and accuracy.

Chord and key annotations

Q: How good and reliable are the chord (or key) annotations in the JAMS, with respect to the original files?

Comparison is still focused on coverage and accuracy, but these are reported independently for times, durations, and values. In this case, coverage does not consider the order of observations: it measures the amount of overlap between the observation fields, since an extra observation may have been inserted, breaking the expected alignment. Accuracy, instead, is a one-to-one comparison of fields which are assumed to be aligned, and can be reported in the unit of measure of each field: seconds and measure.beats for times and durations, text distance for string values. A sketch of the time-coverage computation is given below.
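As an illustration of the coverage side, one possible computation of time coverage between two annotations is sketched below; it assumes jams-style observations with time and duration attributes, and that observations within an annotation do not overlap each other (as is usual for chord and key segments).

```python
def time_coverage(gold_ann, choco_ann) -> float:
    """Fraction of the gold annotated time that is overlapped by ChoCo
    observations, regardless of observation order or 1-to-1 alignment."""
    def spans(ann):
        return [(o.time, o.time + o.duration) for o in ann.data]

    gold, choco = spans(gold_ann), spans(choco_ann)
    total = sum(end - start for start, end in gold)
    if total == 0:
        return 1.0  # nothing to cover
    # Pairwise overlap is safe as long as spans within each annotation
    # are themselves non-overlapping.
    covered = sum(max(0.0, min(ge, ce) - max(gs, cs))
                  for gs, ge in gold for cs, ce in choco)
    return min(covered / total, 1.0)
```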

jonnybluesman added the feature and high-priority labels on Jul 1, 2022
jonnybluesman self-assigned this on Jul 1, 2022
jonnybluesman commented

First version of the sanity checks and testing scripts is up. It still needs to be tried on some intermediary validation JAMS and plugged into the CLI for use.

jonnybluesman commented

First version ready for testing: it includes all the metrics above, with some tweaks, computed before aggregation (see the example below). Aggregation can be done per partition or across the whole dataset.
[image: example of the computed metrics]
