Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-3186] schema registry json schema: array compatibility checks #20137

79 changes: 49 additions & 30 deletions src/v/pandaproxy/schema_registry/json.cc
Original file line number Diff line number Diff line change
Expand Up @@ -494,7 +494,11 @@ extract_property_and_gate_check(
return {std::nullopt, &older_it->value, &newer_it->value};
}

// helper for numeric property that fits into a double:
// helper for numeric property that fits into a double.
// if the property has a default value, it can be passed as last parameter.
// This is necessary to make {"type": "string", "minLength": 0} equivalent to
// {"type": "string"}.
// if no default value is passed, the result is given by this table
// older | newer | is_superset
// ------- | ------- | -----------
// nothing | nothing | yes
Expand All @@ -507,36 +511,51 @@ bool is_numeric_property_value_superset(
json::Value const& older,
json::Value const& newer,
std::string_view prop_name,
VPred&& value_predicate) {
auto [maybe_is_compatible, older_val_p, newer_val_p]
= extract_property_and_gate_check(older, newer, prop_name);
if (maybe_is_compatible.has_value()) {
return maybe_is_compatible.value();
VPred&& value_predicate,
std::optional<double> default_value = std::nullopt) {
// get value or default_value
auto get_value = [&](json::Value const& v) -> std::optional<double> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only fixes the bug for the numeric properties but the same bug exists for other calls to extract_property_and_gate_check. If I understand it correctly, the bug is that extract_property_and_gate_check doesn't handle the default value of the field.
For example, this test case would fail - I believe incorrectly:

{
    .reader_schema
    = R"({"type": "number", "minimum": 11, "exclusiveMinimum": false})",
    .writer_schema = R"({"type": "number", "minimum": 11})",
    .reader_is_compatible_with_writer = true,
  },

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, but extract_property_and_gate_check cannot handle this case cleanly.
"exclusiveMinimum" (or _maximum) is kind of tricky because it's a boolean with a default value in draft4 but a number without a default value from draft6.
Currently, is_numeric_superset doesn't know the schema dialect, so it cannot assign a meaningful default to the property.
We have json schema normalization on the roadmap that should fix this problem at the root, by explicitly setting all the implicit default values based on the schema dialect in use.
So if you are ok with this, i'd move this fix to a follow-up or to the json schema normalization PR

auto it = v.FindMember(
json::Value{prop_name.data(), rapidjson::SizeType(prop_name.size())});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth creating a helper from std::string_view to StringRef to highlight the lifetime and hide the cast.


if (it == v.MemberEnd()) {
return default_value;
}

// Gate on values that can't be represented with doubles.
// rapidjson can serialize a uint64_t even thought it's not a widely
// supported type, so deserializing that would trigger this. note also
// that 0.1 is a valid json literal, but does not have an exact double
// representation. this cannot be caught with this, and it would require
// some sort of decimal type
if (!it->value.IsLosslessDouble()) {
throw as_exception(invalid_schema(fmt::format(
R"(is_numeric_property_value_superset-{} not implemented for type {}. input: older: '{}', newer: '{}')",
prop_name,
it->value.GetType(),
pj{older},
pj{newer})));
}

return it->value.GetDouble();
};

auto older_value = get_value(older);
auto newer_value = get_value(newer);
if (older_value == newer_value) {
// either both not set or with the same value
return true;
}

// Gate on values that can't be represented with doubles.
// rapidjson can serialize a uint64_t even thought it's not a widely
// supported type, so deserializing that would trigger this. note also that
// 0.1 is a valid json literal, but does not have an exact double
// representation. this cannot be caught with this, and it would require
// some sort of decimal type
if (!older_val_p->IsLosslessDouble() || !newer_val_p->IsLosslessDouble()) {
// both have value but they can't be decoded as T
throw as_exception(invalid_schema(fmt::format(
R"({}-{} not implemented for types [{},{}]. input: older: '{}', newer: '{}')",
__FUNCTION__,
prop_name,
older_val_p->GetType(),
newer_val_p->GetType(),
pj{older},
pj{newer})));
if (older_value.has_value() && newer_value.has_value()) {
return std::invoke(
std::forward<VPred>(value_predicate), *older_value, *newer_value);
}

auto older_value = older_val_p->GetDouble();
auto newer_value = newer_val_p->GetDouble();
return older_value == newer_value
|| std::invoke(
std::forward<VPred>(value_predicate), older_value, newer_value);
// (relevant only if default_value is not used)
// only one is set. if older is not set then newer has a value that is more
// restrictive, so older is a superset of newer
return !older_value.has_value();
}

enum class additional_field_for { object, array };
Expand Down Expand Up @@ -616,7 +635,7 @@ bool is_additional_superset(
bool is_string_superset(json::Value const& older, json::Value const& newer) {
// note: "format" is not part of the checks
if (!is_numeric_property_value_superset(
older, newer, "minLength", std::less_equal<>{})) {
older, newer, "minLength", std::less_equal<>{}, 0)) {
// older is less strict
return false;
}
Expand Down Expand Up @@ -744,7 +763,7 @@ bool is_array_superset(json::Value const& older, json::Value const& newer) {

// size checks are common to both types
if (!is_numeric_property_value_superset(
older, newer, "minItems", std::less_equal<>{})) {
older, newer, "minItems", std::less_equal<>{}, 0)) {
return false;
}

Expand Down Expand Up @@ -1023,7 +1042,7 @@ bool is_object_required_superset(

bool is_object_superset(json::Value const& older, json::Value const& newer) {
if (!is_numeric_property_value_superset(
older, newer, "minProperties", std::less_equal<>{})) {
older, newer, "minProperties", std::less_equal<>{}, 0)) {
// newer requires less properties to be set
return false;
}
Expand Down
5 changes: 5 additions & 0 deletions src/v/pandaproxy/schema_registry/test/test_json_schema.cc
Original file line number Diff line number Diff line change
Expand Up @@ -361,6 +361,11 @@ static constexpr auto compatibility_test_cases = std::to_array<
.reader_is_compatible_with_writer = true,
},
// string checks
{
.reader_schema = R"({"type": "string", "minLength": 0})",
.writer_schema = R"({"type": "string"})",
.reader_is_compatible_with_writer = true,
},
{
.reader_schema = R"({"type": "string"})",
.writer_schema = R"({"type": "string", "minLength": 1, "maxLength": 10})",
Expand Down
Loading