-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE-3186] schema registry json schema: array compatibility checks #20137
[CORE-3186] schema registry json schema: array compatibility checks #20137
Conversation
no functional changes
"type": "array" can model tuples, and in doing so in Draft4 it uses "additionalItems" the same way as "additionalProperties". After Draft4, "items" (in tuple mode) works in the same way as "additionalProperties". this commit adds an enum that it's used to parametrize the field name
ff21fa8
to
e455b96
Compare
the "uniqueItems" keyword makes most sense for array type, but it's legal also for tuples, so it's checked in the common section of is_array_superset
find if a schema is modelling an array by checking if "items" is a schema or an array of schemas. implement superset check if both are array schemas
tuple schema check is more involved: each element in newer["items"] must be compatible with the element at the same index of older["items"]. additionally, if newer["items"] has more elements than older["items"], the excess elements must be compatible with older["additionalItems"] it's a similar process as "type": "object" adds "prefixItems" to the list of uninplemented keywords: this is from drafts after Draft4, and tuple schema is implemented differntly so this checks protects against that
the check would not work for numeric properties with a default value, in the edge case where on one side the property is explicitly set with the default value, and on the other side the property is left out. for example {"type": "string", "minLength": 0} and {"type": "string"} should be considered equivalent adds a parameter to set the default value if the property is not set
e455b96
to
3c9596f
Compare
Failure is #20315 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
// get value or default_value | ||
auto get_value = [&](json::Value const& v) -> std::optional<double> { | ||
auto it = v.FindMember( | ||
json::Value{prop_name.data(), rapidjson::SizeType(prop_name.size())}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth creating a helper from std::string_view
to StringRef
to highlight the lifetime and hide the cast.
.reader_schema = R"({"type": "array", "minItems": 2, "maxItems": 10})", | ||
.writer_schema = R"({"type": "array", "maxItems": 11})", | ||
.reader_is_compatible_with_writer = false, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is it worth splitting this into 2 tests, one that changes minItems
and another that changes maxItems
to show that neither change is allowed (independently of one another)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, i'll follow up
VPred&& value_predicate, | ||
std::optional<double> default_value = std::nullopt) { | ||
// get value or default_value | ||
auto get_value = [&](json::Value const& v) -> std::optional<double> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this only fixes the bug for the numeric properties but the same bug exists for other calls to extract_property_and_gate_check
. If I understand it correctly, the bug is that extract_property_and_gate_check
doesn't handle the default value of the field.
For example, this test case would fail - I believe incorrectly:
{
.reader_schema
= R"({"type": "number", "minimum": 11, "exclusiveMinimum": false})",
.writer_schema = R"({"type": "number", "minimum": 11})",
.reader_is_compatible_with_writer = true,
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, but extract_property_and_gate_check
cannot handle this case cleanly.
"exclusiveMinimum" (or _maximum) is kind of tricky because it's a boolean with a default value in draft4 but a number without a default value from draft6.
Currently, is_numeric_superset
doesn't know the schema dialect, so it cannot assign a meaningful default to the property.
We have json schema normalization on the roadmap that should fix this problem at the root, by explicitly setting all the implicit default values based on the schema dialect in use.
So if you are ok with this, i'd move this fix to a follow-up or to the json schema normalization
PR
.reader_schema = R"({"type": "array", "uniqueItems": true})", | ||
.writer_schema = R"({"type": "array"})", | ||
.reader_is_compatible_with_writer = false, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is it worth adding a test here for the default value handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i'll followup
Implements
is_array_superset
, the function that performs compatibility checks for"type": "array"
json schemas.Note1:
"type": "array"
can express proper arrays or tuples. Tuples have a validation logic similar to"type": "object"
schemas, so the compatibility check, too, has a similar logic.Note 2: for tuples, the syntax changed between Draft 4 and later drafts. To prevent erroneous validation,
"prefixItems"
keyword is added to the list of not implemented features.Last commit fixed an edge case for
"min__"
properties. these have a default value of 0, so these two schemas{"type": "string"}
and{"type": "string", "minLength": 0 }
should be considered compatible (equivalent, in fact)Backports Required
Release Notes