[CORE-3187] schema registry - json schema: support allOf oneOf anyOf
#21354
This is cool
// size are compatible, now we need to check that every schema from
// the smaller schema array has a unique compatible schema.
Is "smaller schema array" correct in this comment?
yes.
for allOf, newer can have more subschemas than older (each new subschema adds more rules)
for oneOf/anyOf, newer can have fewer subschemas than older (any subschema removed reduces the degrees of freedom)
the algorithm tries to find a match, and what matters is that every subschema in the smaller array is covered
/dt
new failures in https://buildkite.com/redpanda/redpanda/builds/51953#0190e3da-bbed-46bf-b8b8-fbb4e2326391:
new failures in https://buildkite.com/redpanda/redpanda/builds/51979#0190e58b-183e-4acd-94ab-f2f96a4f3122:
new failures in https://buildkite.com/redpanda/redpanda/builds/51979#0190e5a4-8a7d-4b2a-b9c6-9591869ea282:
d6cdf66
to
bb90bd6
Looks good
a valid schema should be compatible with itself, check this in test_compatibility_check
…ions allow a test case to throw an exception as failure mode
from draft6, true/false are valid schemas and have to be treated as {}/{"not":{}}
this commit:
- changes the return value of get_true_schema/get_false_schema to ConstObject, needed for uniformity with get_object_or_empty
- adds get_schema, a helper that translates missing -> {}, boolean -> {}/{"not":{}}, object -> object
- modifies is_superset to translate its inputs with get_schema
- modifies get_object_or_empty to translate its input with get_schema
the change allows deleting the helpers for additionalProperties/additionalItems
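The translation described for get_schema can be sketched as below. This is only an illustration under assumptions, not the PR's code: a schema position is modelled as an optional bool-or-JSON-text value instead of a RapidJSON value, and the names schema_value and get_schema_sketch are hypothetical.

```cpp
#include <optional>
#include <string>
#include <variant>

// Hypothetical model: a schema position holds a boolean or an object,
// with the object kept as JSON text for simplicity.
using schema_value = std::variant<bool, std::string>;

// missing -> {}, true -> {}, false -> {"not":{}}, object -> object
inline std::string get_schema_sketch(const std::optional<schema_value>& v) {
    if (!v.has_value()) {
        return "{}"; // a missing schema accepts everything
    }
    if (const bool* b = std::get_if<bool>(&*v)) {
        if (*b) {
            return "{}"; // true accepts everything
        }
        return R"({"not":{}})"; // false rejects everything
    }
    return std::get<std::string>(*v); // already an object, pass it through
}
```

With every input normalized to an object this way, a caller such as is_superset only has to handle the object case.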
increase the threshold to accept
{"type": "number", "multipleOf": 1.1}
{"type": "number", "multipleOf": 3.3}
std::abs(std::remainder(3.3, 1.1)) is ~5e-16. this is a fairly small value, but far larger than the previous 10*epsilon limit. the new threshold depends on the Unit of Least Precision (ULP) relative to the bigger value. this value is the distance to the next floating point value, and RapidJSON documents that its string->double parsing code has at most a 3*ulp error. for this reason, 3*ulp is the threshold used to check whether the remainder is close enough to zero
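A minimal sketch of the ULP-based check, assuming the ULP is taken from the larger of the two values as the commit message states. The helper names ulp_of and is_multiple_of_sketch are hypothetical and do not come from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Distance from |x| to the next representable double (one ULP at x).
inline double ulp_of(double x) {
    x = std::abs(x);
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// True when `newer` is, within 3 ULPs of the larger value, an integer
// multiple of `older`.
inline bool is_multiple_of_sketch(double newer, double older) {
    assert(newer > 0 && older > 0); // JSON Schema requires multipleOf > 0
    const double bigger = std::max(newer, older);
    return std::abs(std::remainder(newer, older)) <= 3 * ulp_of(bigger);
}
```

For example, is_multiple_of_sketch(3.3, 1.1) holds because the remainder is within a few ULPs of 3.3, while is_multiple_of_sketch(3.5, 1.1) does not, since the remainder is roughly 0.2.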
this check is meant to cover anyOf/oneOf/allOf. base function that just checks that no combinator is used by both schemas. note: json schema allows more than one combinator to appear in a schema, but it's difficult to define how is_superset should behave in this case. since this seems to be an edge case, this implementation will simply throw an exception. the schemas themselves can still be used, but it's not possible to perform compatibility checks on them
given the current implementation, a combinator can have 4 states: not set, allOf, oneOf, anyOf. in case of a state mismatch between the schemas, this commit throws a not-implemented exception (that gets converted to false). this is overly restrictive: some combinations, like (not set, anyOf) or (anyOf, not set), or cases where the schema array has only one value, might be compatible. this will be revisited in a later commit
check that if both schemas use the same combinator, then the schema arrays have compatible sizes. for allOf, newer can have more schemas, because this can be seen as more restrictive rules. for oneOf/anyOf, newer can have fewer schemas, because this reduces the set of json documents that can be valid
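The size rule above can be written down in a few lines. This is a sketch under assumptions: the enum combinator and the function sizes_compatible are hypothetical names, and both schemas are assumed to use the same combinator.

```cpp
#include <cstddef>

// Hypothetical enum for the combinator shared by both schemas.
enum class combinator { all_of, one_of, any_of };

// allOf: newer may add subschemas (more rules, more restrictive).
// oneOf/anyOf: newer may drop subschemas (fewer accepted documents).
inline bool sizes_compatible(
  combinator c, std::size_t older_size, std::size_t newer_size) {
    if (c == combinator::all_of) {
        return newer_size >= older_size;
    }
    return newer_size <= older_size;
}
```

Passing this check is necessary but not sufficient: the subschemas themselves still have to be matched against each other.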
…hing
once the function has two schema arrays of suitable cardinality, we need to check that every schema of the smaller set has a distinct match in the other set. this is solved in a general way by constructing a bipartite graph and computing the maximum cardinality matching on the graph.
an example:
older = {"anyOf": [A, B, C, D]}
newer = {"anyOf": [1, 2, 3]}
for each pair o ∈ [A, B, C, D], n ∈ [1, 2, 3]: if is_superset(o, n), we add an edge between o and n.
say we find out that the list of edges is [(A, 2), (A, 3), (B, 1), (C, 2)]; one maximum cardinality matching is [(A, 3), (B, 1), (C, 2)].
every subschema from newer (the smaller set) has a distinct companion in older, so we are sure that is_superset(older, newer) holds: every json matched by one of the subschemas of newer will be matched by a subschema of older
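The matching step in the example above can be sketched with Kuhn's augmenting-path algorithm. This technique is swapped in for illustration; the PR may compute the matching differently (for instance with a graph library). The names max_matching and every_newer_has_distinct_match are hypothetical, and the edges are passed in as an adjacency list rather than computed with is_superset.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// adj[n] lists the indices of the older subschemas that are a superset
// of newer subschema n. Returns the size of a maximum cardinality matching.
inline std::size_t max_matching(
  const std::vector<std::vector<std::size_t>>& adj, std::size_t older_count) {
    constexpr auto none = static_cast<std::size_t>(-1);
    std::vector<std::size_t> match_of_older(older_count, none);
    std::vector<bool> visited;

    // Try to give newer subschema n a partner, re-routing earlier
    // assignments along an augmenting path when needed.
    std::function<bool(std::size_t)> try_augment = [&](std::size_t n) {
        for (std::size_t o : adj[n]) {
            assert(o < older_count);
            if (visited[o]) {
                continue;
            }
            visited[o] = true;
            if (match_of_older[o] == none || try_augment(match_of_older[o])) {
                match_of_older[o] = n;
                return true;
            }
        }
        return false;
    };

    std::size_t matched = 0;
    for (std::size_t n = 0; n < adj.size(); ++n) {
        visited.assign(older_count, false);
        if (try_augment(n)) {
            ++matched;
        }
    }
    return matched;
}

// Every newer subschema needs its own distinct older subschema.
inline bool every_newer_has_distinct_match(
  const std::vector<std::vector<std::size_t>>& adj, std::size_t older_count) {
    return max_matching(adj, older_count) == adj.size();
}
```

With older indexed as A=0, B=1, C=2, D=3 and newer as 1=0, 2=1, 3=2, the edge list from the example becomes adj = {{1}, {0, 2}, {0}}, and every_newer_has_distinct_match(adj, 4) holds. A greedy assignment would first pair newer 2 with A and leave newer 3 without a partner; the augmenting step re-routes newer 2 to C.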
testing shows that compatibility checks don't need to take in consideration the "if"/"then"/"else" constructs
bb90bd6
to
c5cbb15
force push to fix merge conflicts
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52024#0190e97b-2152-45cf-af10-d183dba668a1
/backport v24.2.x
Failed to create a backport PR to v24.2.x branch. I tried:
/backport v24.2.x
throw as_exception(invalid_schema(fmt::format(
  "{} not implemented for different combinators. input: older: '{}', "
  "newer: '{}'",
  __FUNCTION__,
you can use fmt_with_context
You don't get the function name with that, if I'm not mistaken
yeah, but you get the line number, so that's good enough?
Add preliminary support to perform compatibility checks on the allOf, oneOf, anyOf combinators.
This first support requires that
These requirements are stricter than necessary and will be relaxed in a future commit.
Additional work in this PR:
- is_superset({"type": "number", "multipleOf": 1.1}, {"type": "number", "multipleOf": 3.3}), the remainder of 3.3/1.1 is compared against 3*ulp(newer)
- true to alias {} and false to alias {"not":{}} (from draft6 and beyond)
Fixes https://redpandadata.atlassian.net/browse/CORE-432
Backports Required
Release Notes