Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-4181] schema registry json support validating drafts 5 6 and 7 #20817

Conversation

andijcr
Copy link
Contributor

@andijcr andijcr commented Jul 2, 2024

  • break recursion for is_superset(get_true_schema(), ...) and is_superset(..., get_false_schema()) these two cases can happens for properties that don't have a default value: for example {"type": "array"} has no "items", so it gets the default value of {}. without this early break, we get in an infinite recursion where {} is passes through is_array_superset and that in turn will call is_superset and so on

  • add support for draft5: according to the official documentation, this dialect reuses the metaschema for draft4, and so does the code

  • support for draft6 : the metaschema is adapted to draft4, because the current version of rapidjson implements only draft4

  • support for draft7 : the metaschema is adapted to draft4, same reasons

  • support for schemas uris without trailing #

Fixes https://redpandadata.atlassian.net/browse/CORE-4181

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@andijcr
Copy link
Contributor Author

andijcr commented Jul 2, 2024

/dt

@andijcr andijcr force-pushed the feat/core-4181/schema_registry_add_support_for_draft7 branch from 3de0093 to 6d76551 Compare July 4, 2024 15:42
@andijcr andijcr marked this pull request as ready for review July 4, 2024 15:42
@andijcr andijcr requested review from BenPope and a team July 4, 2024 15:42
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Jul 4, 2024

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51111#01907e92-321b-4476-b55a-e72a1d0bdc06:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51111#01907e92-3215-4276-9644-9d82eecea467:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51111#01907ec1-58de-4b86-921e-54abba990aad:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51111#01907ec1-58e5-434e-b841-157f2c2dc075:
pandatriage cache was not found

@andijcr andijcr added this to the v24.2.1-rc2 milestone Jul 5, 2024
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great.

The commit history seems a bit funky, though.

src/v/pandaproxy/schema_registry/json.cc Show resolved Hide resolved
src/v/pandaproxy/schema_registry/json.cc Outdated Show resolved Hide resolved
src/v/pandaproxy/schema_registry/test/test_json_schema.cc Outdated Show resolved Hide resolved
src/v/pandaproxy/schema_registry/json.cc Outdated Show resolved Hide resolved
src/v/pandaproxy/schema_registry/json.cc Outdated Show resolved Hide resolved
src/v/pandaproxy/schema_registry/json.cc Outdated Show resolved Hide resolved
@andijcr andijcr force-pushed the feat/core-4181/schema_registry_add_support_for_draft7 branch from 6d76551 to d63c3d1 Compare July 5, 2024 14:00
@andijcr andijcr requested review from BenPope and a team July 5, 2024 14:06
@andijcr andijcr changed the title [CORE-4181] schema registry json support validating draft 6 and draft 7 [CORE-4181] schema registry json support validating drafts 5 6 and 7 Jul 5, 2024
@andijcr
Copy link
Contributor Author

andijcr commented Jul 5, 2024

force push: review comments, added draft5

BenPope
BenPope previously approved these changes Jul 5, 2024
@andijcr andijcr dismissed stale reviews from pgellert and BenPope via d4c39e0 July 5, 2024 15:28
@andijcr andijcr requested review from BenPope and pgellert July 5, 2024 15:32
BenPope
BenPope previously approved these changes Jul 5, 2024
pgellert
pgellert previously approved these changes Jul 5, 2024
andijcr added 8 commits July 5, 2024 18:38
eventually, with default values for properties being "true/false" or
"{}/{"not":{}}", we can form a call like
is_superset("{}", ...) or is_superset(..., "{"not":{}") that can cause
infinite recursion. this is because the type is assumed to mean every
type, and this can lead to this recursive chain is_superset ->
is_array_superset -> is_superset ...

this commit breaks the chain by recognizing these atoms and returning
early.

normalization and/or AST could offer a more robust solution, by ensuring
that implicit properties are explicit and removing extraneous fields.

The test cases in this commit would lead to infinite recursion without
the early return in is_superset.
boost test macros are not great at printing the exception.
this commit adds a manual try/catch to intercept and print exceptions,
for ease of development
will need to be visible to code in later commits
enum for the dialects supported by the implementation, meant to
symbolize the URIs of the official dialects.
this function gets a dialect and a json::Document of the candidate
schema, and will perform validation against the correct metaschema.

to compile only once the Schema object,
get_metaschema<json_schema_dialect>() will compile it lazily and save it in static const
variable.

to support validation of json without the "$schema" property,
try_validate_json_schema will try to validate it against the known
dialects, from the most recent to the oldest
this is an intermediate commit.

split parsing and validating. the error messages and functionalities
remain the same, but this way it's easier to wire in the new validating
functions and to add support for new dialects
note: the diff of this commit is somewhat chaotic

delegate schema validation to validate_json_schema.

to do so, extract the dialect, if present, from the json document, and
call validate_json_schema/try_validate_json_schema
draft5 is a cleanup of draft4 and reuses the metaschema, according to
the official documentation.
This is why the support here is implemented with the draft4 metaschema
andijcr added 3 commits July 5, 2024 18:45
adapt from_uri(uri) -> json_schema_dialect to recognize

http://json-schema.org/draft-0N/schema# and
http://json-schema.org/draft-0N/schema
as draftN dialects.

removed enum values for $schema in the metaschemas since the check for
permitted values is already performed before validation
@andijcr andijcr dismissed stale reviews from pgellert and BenPope via 161213c July 5, 2024 16:51
@andijcr andijcr force-pushed the feat/core-4181/schema_registry_add_support_for_draft7 branch from d4c39e0 to 161213c Compare July 5, 2024 16:51
@andijcr
Copy link
Contributor Author

andijcr commented Jul 5, 2024

force push merge conflict

@aanthony-rp aanthony-rp merged commit 4d978e8 into redpanda-data:dev Jul 5, 2024
3 checks passed
@andijcr andijcr deleted the feat/core-4181/schema_registry_add_support_for_draft7 branch July 5, 2024 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants