-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate ISO8601 period strings #193
Conversation
Signed-off-by: Daniel Danis <daniel.gordon.danis@protonmail.com>
@@ -94,7 +94,8 @@ | |||
"properties": { | |||
"iso8601duration": { | |||
"description": "An ISO8601 string representing age.", | |||
"type": "string" | |||
"type": "string", | |||
"pattern": "^P(?!$)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?(T(?=\\d+[HMS])(\\d+H)?(\\d+M)?(\\d+S)?)?$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this fail in cases where the simplified and extended P<date>T<time>
format is used? Also should there be range-checks for MM, DD (is there a max unless its part of the spec?), hh, mm, ss?
As per Wikipedia ISO8601 durations:
Alternatively, a format for duration based on combined date and time representations may be used by agreement between the communicating parties either in the basic format PYYYYMMDDThhmmss or in the extended format P[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss]. For example, the first duration shown above would be "P0003-06-04T12:30:05". However, individual date and time values cannot exceed their moduli (e.g. a value of 13 for the month or 25 for the hour would not be permissible).[39]
This would fail P0001-04-02T01:23:44
or P00010402T012344
or P0001-13-32T01:25:60
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this fail in cases where the simplified and extended PT format is used?
What is "simplified" format? The format listed as the 1st in the linked paragraph? If so, these tests check the happy path, including the "date only" (e.g. P5Y
) "time only" (e.g. PT1H
), or a combination.
I am not sure what "extended" format refers to.. 😕 Is it the P[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss]
schema? If so, it would not pass.
As per Wikipedia ISO8601 durations: ... Alternatively, a format for duration based on combined date and time representations may be used by agreement between the communicating parties
I am not aware of any such agreements being in Phenopacket Schema - is this anywhere in the ISO specs? As you pointed out, this implementation does not permit the additional "basic" or "extended" formats, and only works with the "simplified" duration - the 1st format in the linked paragraph above.
Do you think this is OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it is very unlikely that anybody will use "time" here (it makes no sense), and so we could either remove the $ at the end of the regex or add something like "T?" instead of $ and not test the time part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pnrobinson I concur with people likely not having to use anything shorter than P1D
anytime soon, so the Schema should fit the needs with the most basic ISO8601 duration which is described as one of the following on the Wiki page:
PnYnMnDTnHnMnS
PnW
P<date>T<time>
The JSON Schema regex added in this PR can validate the basic duration string. These are valid duration examples:
- P1Y4M3W2DT1H23M44S
- P1Y
- P4M
- P3W
- P2D
- PT1H
- PT1H2M
These examples are, however, invalid:
- P1H - Missing
T
prior hours - PT1D -
D
is not valid time unit - P - no data
- PT - no data
- P10M4Y - years must be specified before months
- P1YT4M12H - hours must be specified before minutes
The regex should, therefore, work OK for the values we expect to see anytime soon.
I propose sticking to this regex and extending later if a need arises.
@pnrobinson @julesjacobsen does this sound OK? Can I merge this?
@ielis @julesjacobsen We did not specify anything beyond ISO 8601 in the schema. It is probably best for us to allow the time (T...) portion of the string, because we did not say that it was not allowed. |
As pointed out in #192 , phenopacket-tools does not validate the ISO8601 period strings.
This PR adds the period string validation using a JSONSchema regexp.