Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate ISO8601 period strings #193

Merged
merged 1 commit into from
Sep 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,8 @@
"properties": {
"iso8601duration": {
"description": "An ISO8601 string representing age.",
"type": "string"
"type": "string",
"pattern": "^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this fail in cases where the simplified and extended P<date>T<time> format is used? Also should there be range-checks for MM, DD (is there a max unless its part of the spec?), hh, mm, ss?

As per Wikipedia ISO8601 durations:

Alternatively, a format for duration based on combined date and time representations may be used by agreement between the communicating parties either in the basic format PYYYYMMDDThhmmss or in the extended format P[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss]. For example, the first duration shown above would be "P0003-06-04T12:30:05". However, individual date and time values cannot exceed their moduli (e.g. a value of 13 for the month or 25 for the hour would not be permissible).[39]

This would fail P0001-04-02T01:23:44 or P00010402T012344 or P0001-13-32T01:25:60

Copy link
Collaborator Author

@ielis ielis Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this fail in cases where the simplified and extended PT format is used?

What is "simplified" format? The format listed as the 1st in the linked paragraph? If so, these tests check the happy path, including the "date only" (e.g. P5Y) "time only" (e.g. PT1H), or a combination.

I am not sure what "extended" format refers to.. 😕 Is it the P[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss] schema? If so, it would not pass.

As per Wikipedia ISO8601 durations: ... Alternatively, a format for duration based on combined date and time representations may be used by agreement between the communicating parties

I am not aware of any such agreements being in Phenopacket Schema - is this anywhere in the ISO specs? As you pointed out, this implementation does not permit the additional "basic" or "extended" formats, and only works with the "simplified" duration - the 1st format in the linked paragraph above.

Do you think this is OK?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is very unlikely that anybody will use "time" here (it makes no sense), and so we could either remove the $ at the end of the regex or add something like "T?" instead of $ and not test the time part

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnrobinson I concur with people likely not having to use anything shorter than P1D anytime soon, so the Schema should fit the needs with the most basic ISO8601 duration which is described as one of the following on the Wiki page:

  • PnYnMnDTnHnMnS
  • PnW
  • P<date>T<time>

The JSON Schema regex added in this PR can validate the basic duration string. These are valid duration examples:

  • P1Y4M3W2DT1H23M44S
  • P1Y
  • P4M
  • P3W
  • P2D
  • PT1H
  • PT1H2M

These examples are, however, invalid:

  • P1H - Missing T prior hours
  • PT1D - D is not valid time unit
  • P - no data
  • PT - no data
  • P10M4Y - years must be specified before months
  • P1YT4M12H - hours must be specified before minutes

The regex should, therefore, work OK for the values we expect to see anytime soon.

I propose sticking to this regex and extending later if a need arises.

@pnrobinson @julesjacobsen does this sound OK? Can I merge this?

}
},
"required": ["iso8601duration"],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,6 @@ public void checkPhenotypicFeatureConstraints(String path, String action, String
"/phenotypicFeatures[0]/onset/gestationalAge/weeks, SET[-1], 'phenotypicFeatures[0].onset.gestationalAge.weeks' must have a minimum value of 0",
"/phenotypicFeatures[0]/onset/gestationalAge/days, SET[-1], 'phenotypicFeatures[0].onset.gestationalAge.days' must have a minimum value of 0",
"/phenotypicFeatures[1]/onset/age/iso8601duration, DELETE, 'phenotypicFeatures[1].onset.age.iso8601duration' is missing but it is required",
// TODO - add test for ensuring that the duration is in an ISO8601 pattern
"/phenotypicFeatures[2]/onset/ageRange/start, DELETE, 'phenotypicFeatures[2].onset.ageRange.start' is missing but it is required",
"/phenotypicFeatures[2]/onset/ageRange/end, DELETE, 'phenotypicFeatures[2].onset.ageRange.end' is missing but it is required",
// TODO - require end being at or after start
Expand All @@ -160,6 +159,47 @@ public void checkTimeElementConstraints(String path, String action, String expec
testErrors(runner, readBethlemPhenopacketNode(), path, action, expected, true);
}

/**
* Check that error in the ISO8601 period leads to the appropriate
* {@link org.phenopackets.phenopackettools.validator.core.ValidationLevel#ERROR}.
*/
@ParameterizedTest
@CsvSource({
// Age
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P1H], 'phenotypicFeatures[1].onset.age.iso8601duration' does not match the regex pattern ^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[PT1D], 'phenotypicFeatures[1].onset.age.iso8601duration' does not match the regex pattern ^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P], 'phenotypicFeatures[1].onset.age.iso8601duration' does not match the regex pattern ^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[PT], 'phenotypicFeatures[1].onset.age.iso8601duration' does not match the regex pattern ^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$",
// Wrong token order.
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P10M4Y], 'phenotypicFeatures[1].onset.age.iso8601duration' does not match the regex pattern ^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P1YT4M12H], 'phenotypicFeatures[1].onset.age.iso8601duration' does not match the regex pattern ^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$",
})
public void checkTimeElementConstraints_ISO8601_errors(String path, String action, String expected) {
testErrors(runner, readBethlemPhenopacketNode(), path, action, expected, true);
}

/**
* Check that correct ISO8601 period yields no
* {@link org.phenopackets.phenopackettools.validator.core.ValidationLevel#ERROR}s.
*/
@ParameterizedTest
@CsvSource({
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P1Y4M3W2DT1H23M44S]",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P1Y]",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P4M]",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P3W]",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[P2D]",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[PT1H]",
"/phenotypicFeatures[1]/onset/age/iso8601duration, SET[PT1H2M]",
})
public void checkTimeElementConstraints_ISO8601_OK(String path, String action) {
JsonNode tampered = TAMPERER.tamper(readBethlemPhenopacketNode(), path, Action.valueOf(action));

ValidationResults results = runner.validate(tampered.toPrettyString());

assertThat(results.isValid(), equalTo(true));
}

/**
* Absence of phenotypic feature id leads to an {@link org.phenopackets.phenopackettools.validator.core.ValidationLevel#ERROR}.
*/
Expand Down Expand Up @@ -683,10 +723,6 @@ private static <T extends MessageOrBuilder> void testErrors(ValidationWorkflowRu
JsonNode tampered = TAMPERER.tamper(node, path, Action.valueOf(action));

ValidationResults results = runner.validate(tampered.toPrettyString());
// TODO - remove after the tests are completed
// results.validationResults().stream()
// .map(ValidationResult::message)
// .forEach(System.err::println);

Collection<String> tokens = Arrays.asList(errors.split("\\|"));
if (validateCount) {
Expand Down