
JAMS Testing framework #60

Closed · jonnybluesman opened this issue on Jul 1, 2022 · 2 comments
Labels: feature (a new feature to implement), high-priority
Assignee: jonnybluesman

jonnybluesman commented on Jul 1, 2022

Testing framework for JAMS files, following the JAMification step in ChoCo and assuming the availability of gold standards (manually annotated JAMS).

Preliminary sanity checks

Q: Is the given JAMS consistent and well-formatted? This applies to both gold and ChoCo JAMS, before any further step is taken.

  • The JAMS can be successfully parsed by the jams library, both with and without validation enabled.
  • Annotation times do not exceed the total duration of the track/piece, when the latter is available.
  • Observations are temporally ordered within each annotation.
  • There are no duplicated annotations.
  • All fields in the sandbox are known according to our schema (worth keeping in a separate file).

If all these preliminary checks pass, we can proceed with the gold-vs-ChoCo JAMS validation. A sketch of the checks is given below.
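For concreteness, here is a minimal sketch of how these checks could be implemented with the jams library; the function name, error messages, and the `known_sandbox_fields` argument are hypothetical and not part of ChoCo.

```python
import jams

def sanity_check(jams_path, known_sandbox_fields=frozenset()):
    """Run the preliminary sanity checks on a single JAMS file."""
    errors = []

    # (1) The file can be parsed with schema validation enabled.
    try:
        jam = jams.load(jams_path, validate=True, strict=True)
    except Exception as err:  # e.g. jams.SchemaError, or a malformed file
        return [f"cannot parse/validate: {err}"]

    duration = jam.file_metadata.duration
    seen = set()
    for ann in jam.annotations:
        obs = list(ann.data)
        # (2) Annotation times never exceed the track duration, if known.
        if duration is not None and any(o.time + o.duration > duration for o in obs):
            errors.append(f"{ann.namespace}: observation beyond track duration")
        # (3) Observations are temporally ordered.
        times = [o.time for o in obs]
        if times != sorted(times):
            errors.append(f"{ann.namespace}: observations not temporally ordered")
        # (4) No duplicated annotations: same namespace and identical content.
        key = (ann.namespace, tuple((o.time, o.duration, str(o.value)) for o in obs))
        if key in seen:
            errors.append(f"{ann.namespace}: duplicated annotation")
        seen.add(key)

    # (5) All sandbox fields are known, according to our schema.
    unknown = set(jam.sandbox.keys()) - set(known_sandbox_fields)
    if unknown:
        errors.append(f"unknown sandbox fields: {sorted(unknown)}")

    return errors
```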

Metadata

Q: How good is the metadata layer in the JAMS?

Coverage: is the metadata exhaustive? Does it cover all possible fields?

Coverage is measured as the proportion of non-null metadata fields in the gold JAMS that are also found in the ChoCo JAMS.

  • Case 1: gold has more fields (coverage is less than 1).
  • Case 2: gold has fewer fields (there is a potential annotation issue).
  • Case 3: fields are the same, regardless of their content (maximum coverage).
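As an illustration, the coverage above could be computed as follows; the function name, the dict-based metadata representation, and the treatment of empty strings as null are assumptions, not ChoCo code.

```python
def metadata_coverage(gold_meta: dict, choco_meta: dict) -> float:
    """Proportion of gold's non-null metadata fields also present in ChoCo."""
    def non_null(meta):
        return {k for k, v in meta.items() if v not in (None, "")}
    gold_fields, choco_fields = non_null(gold_meta), non_null(choco_meta)
    if not gold_fields:  # nothing to cover: maximum coverage by convention
        return 1.0
    return len(gold_fields & choco_fields) / len(gold_fields)
```

Under this definition, Case 1 yields a coverage below 1, Case 3 yields 1, and Case 2 can be flagged separately by checking `choco_fields - gold_fields`.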

Accuracy: for the non-null metadata fields, how many are correct? Can we measure their quality?

  • Option 1: perfect match after basic preprocessing.
  • Option 2: non-perfect matches can be assessed by simple text-distance methods.
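A minimal sketch of both options, using the standard library's difflib for the text similarity (any other text-distance method would work equally well); the preprocessing is an illustrative choice.

```python
from difflib import SequenceMatcher

def field_accuracy(gold_value: str, choco_value: str) -> float:
    """Score a single metadata field: 1.0 for a perfect match, else similarity."""
    a = gold_value.strip().lower()   # basic preprocessing (illustrative)
    b = choco_value.strip().lower()
    if a == b:
        return 1.0                   # Option 1: perfect match after preprocessing
    return SequenceMatcher(None, a, b).ratio()  # Option 2: similarity in [0, 1]
```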

Identifiers and external links

Same as for the metadata (identifiers are, in fact, a particular type of metadata): coverage and accuracy.

Chord and key annotations

Q: How good and reliable are the chord (or key) annotations in the JAMS, with respect to the original files?

Comparison is still focused on coverage and accuracy, but these are reported independently for times, durations, and values. In this case, coverage does not consider the order of observations: it measures the amount of overlap between the observation fields, since an extra observation may have been inserted, breaking the expected alignment. Accuracy, instead, is a one-to-one comparison of fields which are assumed to be aligned, and can be reported in the unit of measure of each field: seconds and measure.beats for times and durations, text distance for string values. A sketch of the time-coverage computation is given below.
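As an illustration of the coverage side, one possible computation of time coverage between two annotations is sketched below; it assumes jams-style observations with time and duration attributes, and that observations within an annotation do not overlap each other (as is usual for chord and key segments).

```python
def time_coverage(gold_ann, choco_ann) -> float:
    """Fraction of the gold annotated time that is overlapped by ChoCo
    observations, regardless of observation order or 1-to-1 alignment."""
    def spans(ann):
        return [(o.time, o.time + o.duration) for o in ann.data]

    gold, choco = spans(gold_ann), spans(choco_ann)
    total = sum(end - start for start, end in gold)
    if total == 0:
        return 1.0  # nothing to cover
    # Pairwise overlap is safe as long as spans within each annotation
    # are themselves non-overlapping.
    covered = sum(max(0.0, min(ge, ce) - max(gs, cs))
                  for gs, ge in gold for cs, ce in choco)
    return min(covered / total, 1.0)
```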

jonnybluesman added the feature and high-priority labels on Jul 1, 2022
jonnybluesman self-assigned this on Jul 1, 2022
jonnybluesman commented

First version of the sanity checks and testing scripts is up. It still needs to be tried on some intermediary validation JAMS and plugged into the CLI for use.

jonnybluesman commented

First version ready for testing: it includes all the metrics above, with some tweaks, computed before aggregation (see the example below). Aggregation can be done per partition or across the whole dataset.
[image: example of the computed metrics]
