Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate ISO8601 period strings #193

Merged
merged 1 commit into from
Sep 22, 2023
Merged

Conversation

ielis
Copy link
Collaborator

@ielis ielis commented Sep 7, 2023

As pointed out in #192 , phenopacket-tools does not validate the ISO8601 period strings.

This PR adds the period string validation using a JSONSchema regexp.

Signed-off-by: Daniel Danis <daniel.gordon.danis@protonmail.com>
@ielis ielis linked an issue Sep 7, 2023 that may be closed by this pull request
@@ -94,7 +94,8 @@
"properties": {
"iso8601duration": {
"description": "An ISO8601 string representing age.",
"type": "string"
"type": "string",
"pattern": "^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this fail in cases where the simplified and extended P<date>T<time> format is used? Also should there be range-checks for MM, DD (is there a max unless its part of the spec?), hh, mm, ss?

As per Wikipedia ISO8601 durations:

Alternatively, a format for duration based on combined date and time representations may be used by agreement between the communicating parties either in the basic format PYYYYMMDDThhmmss or in the extended format P[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss]. For example, the first duration shown above would be "P0003-06-04T12:30:05". However, individual date and time values cannot exceed their moduli (e.g. a value of 13 for the month or 25 for the hour would not be permissible).[39]

This would fail P0001-04-02T01:23:44 or P00010402T012344 or P0001-13-32T01:25:60

Copy link
Collaborator Author

@ielis ielis Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this fail in cases where the simplified and extended PT format is used?

What is "simplified" format? The format listed as the 1st in the linked paragraph? If so, these tests check the happy path, including the "date only" (e.g. P5Y) "time only" (e.g. PT1H), or a combination.

I am not sure what "extended" format refers to.. 😕 Is it the P[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss] schema? If so, it would not pass.

As per Wikipedia ISO8601 durations: ... Alternatively, a format for duration based on combined date and time representations may be used by agreement between the communicating parties

I am not aware of any such agreements being in Phenopacket Schema - is this anywhere in the ISO specs? As you pointed out, this implementation does not permit the additional "basic" or "extended" formats, and only works with the "simplified" duration - the 1st format in the linked paragraph above.

Do you think this is OK?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is very unlikely that anybody will use "time" here (it makes no sense), and so we could either remove the $ at the end of the regex or add something like "T?" instead of $ and not test the time part

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnrobinson I concur with people likely not having to use anything shorter than P1D anytime soon, so the Schema should fit the needs with the most basic ISO8601 duration which is described as one of the following on the Wiki page:

  • PnYnMnDTnHnMnS
  • PnW
  • P<date>T<time>

The JSON Schema regex added in this PR can validate the basic duration string. These are valid duration examples:

  • P1Y4M3W2DT1H23M44S
  • P1Y
  • P4M
  • P3W
  • P2D
  • PT1H
  • PT1H2M

These examples are, however, invalid:

  • P1H - Missing T prior hours
  • PT1D - D is not valid time unit
  • P - no data
  • PT - no data
  • P10M4Y - years must be specified before months
  • P1YT4M12H - hours must be specified before minutes

The regex should, therefore, work OK for the values we expect to see anytime soon.

I propose sticking to this regex and extending later if a need arises.

@pnrobinson @julesjacobsen does this sound OK? Can I merge this?

@pnrobinson
Copy link
Collaborator

@ielis @julesjacobsen We did not specify anything beyond ISO 8601 in the schema. It is probably best for us to allow the time (T...) portion of the string, because we did not say that it was not allowed.

@ielis ielis changed the title Validate ISO8601 Validate ISO8601 period strings Sep 22, 2023
@ielis ielis merged commit b911773 into phenopackets:develop Sep 22, 2023
@ielis ielis deleted the validate-iso8601 branch September 22, 2023 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ISO 8601 Age strings should be checked
3 participants