Event processing software would like to have value-based schema dispatch #652

Closed
timbray opened this issue Sep 11, 2018 · 8 comments
@timbray

timbray commented Sep 11, 2018

I work at AWS on event-driven software, e.g. CloudWatch Events - see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatchEventsandEventPatterns.html - and with the CNCF CloudEvents spec - see https://github.com/cloudevents/spec/blob/master/json-format.md

Events tend to be JSON texts, to come in streams, and be heterogeneous. That is to say, lots of types of events succeed each other in a stream. In general, there will be some common fields, often in a top-level "envelope" wrapper, and then some type-specific "payload" data. There is heavy use of "type" fields, for example the "type" field in CW Events and the "eventType" field in CloudEvents. All processing tends to be of the form "look at the Type field and figure out what to do".

I've found it very hard to use JSON Schema for this kind of data. Basically, I want to switch schemas based on the value of a field. The rules for something with a top-level "Type": "Foo" are different from something with "Type": "Bar".

The current "dependencies" keyword can change things based on the presence of a field, which is not what we want.

You can sort of get what you want with JSON Schema by using "oneOf", where your schema ends up looking something like:

"additionalProperties": {
  "oneOf": [
    { "$ref": "#/definitions/FooEvent" },
    { "$ref": "#/definitions/BarEvent" },
    { "$ref": "#/definitions/BazEvent" },
    ...
  ]
},
...
"FooEvent": {
  "properties": {
    "Type": { "enum": [ "Foo" ] },
    ... lots of rules ...
  }
},
"BarEvent": {
  "properties": {
    "Type": { "enum": [ "Bar" ] },
    ... lots of rules ...
  }
}

The problem with this is that it's really hard for a schema processor to produce good error messages. It runs through all the oneOf options and explains why each one of them can't match. What we'd like is for some "Type" field to be magic so that the processor knows that the rest of the schema depends on the value of that field. That way, the schema would be more idiomatic, and the error messages could be super helpful: "Type 'FooEvent' lacks required field 'Timestamp'" or some such.

Is it possible I'm just missing an idiomatic, clean, obvious way to do what I want with JSON Schema? That would make me happy.
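A minimal Python sketch of the value-based dispatch described above, assuming a hypothetical per-type table of required fields. The names REQUIRED_BY_TYPE and validate_event, and the "Source" field, are illustrative assumptions and not part of JSON Schema or any library. It checks only required fields, but it shows how reading "Type" first makes it possible to produce an error message that names the event type:

```python
# Hypothetical per-type rules; a real system would hold full schemas here.
REQUIRED_BY_TYPE = {
    "Foo": ["Type", "Timestamp"],
    "Bar": ["Type", "Source"],
}


def validate_event(event):
    """Return a list of error messages; an empty list means the event passed."""
    event_type = event.get("Type")
    if event_type not in REQUIRED_BY_TYPE:
        return [f"Unknown Type {event_type!r}"]
    # Dispatch on the value of "Type", then check only that type's rules.
    return [
        f"Type {event_type!r} lacks required field {field!r}"
        for field in REQUIRED_BY_TYPE[event_type]
        if field not in event
    ]


print(validate_event({"Type": "Foo"}))
```

Because only one set of rules is consulted, the errors concern the declared type alone rather than every alternative in a oneOf.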

@gregsdennis
Member

With Draft-07 you can use the if/then/else keywords to build a chain. It can get pretty deeply nested, though, if you have a lot of cases.

Something like this:

{
  "if":{
    "properties":{
      "type":{"const":"EventA"}
    }
  },
  "then":{
    "$ref": "#/definitions/EventA"
  },
  "else":{
    "if":{
      "properties":{
        "type":{"const":"EventB"}
      }
    },
    "then":{
      "$ref": "#/definitions/EventB"
    },
    "else":{
      "if":{
        "properties":{
          "type":{"const":"EventC"}
        }
      },
      "then":{
        "$ref": "#/definitions/EventC"
      },
      "else": false
    }
  }
}

(I like the const keyword over a single-valued enum.)

Not really sure if that helps your error messaging much, but it has the benefit that you don't have to completely evaluate all of the complete subschemas. You just evaluate the type property over and over until you find one that matches, and then you apply that schema.
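Since the nesting grows with every case, one option is to generate the chain rather than write it by hand. The following Python sketch is one way to do that; build_chain is a hypothetical helper name, and it assumes each type name has a matching entry under #/definitions, as in the example above:

```python
def build_chain(type_names, field="type"):
    """Build a nested if/then/else schema that dispatches on `field`."""
    schema = False  # final "else": no known type matched, so validation fails
    # Wrap from the innermost case outward so the first name ends up outermost.
    for name in reversed(type_names):
        schema = {
            "if": {"properties": {field: {"const": name}}},
            "then": {"$ref": f"#/definitions/{name}"},
            "else": schema,
        }
    return schema


chain = build_chain(["EventA", "EventB", "EventC"])
```

Note that "properties" on its own does not require the field to be present, so an instance with no "type" field would satisfy the first "if". Adding "required" for that field inside each "if" is a common safeguard.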

@Relequestual
Member

Relequestual commented Sep 12, 2018

Hey @timbray Thanks for coming by to ask! It's great to have someone from Amazon engaging.

@gregsdennis is right here. Using if / then / else is your best bet. You'll have to be using at least draft-07, as those keywords aren't in draft-04 or draft-06. It's best if you declare the draft version you're using in your schema (using the $schema keyword), as many validators now support multiple drafts.
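As a rough illustration of why declaring $schema helps, here is a small Python sketch that checks whether a schema declares a draft in which if/then/else is defined (the keywords first appeared in draft-07). The function name is hypothetical, and the set lists only the standard meta-schema URIs; a real validator would handle this internally:

```python
# Meta-schema URIs for drafts that define if/then/else (draft-07 and later).
DRAFTS_WITH_IF_THEN_ELSE = {
    "http://json-schema.org/draft-07/schema",
    "https://json-schema.org/draft/2019-09/schema",
    "https://json-schema.org/draft/2020-12/schema",
}


def declares_if_then_else_draft(schema):
    """True if the schema's $schema names a draft that defines if/then/else."""
    declared = schema.get("$schema", "")
    # Older drafts conventionally end the URI with an empty fragment ("#").
    return declared.rstrip("#") in DRAFTS_WITH_IF_THEN_ELSE


print(declares_if_then_else_draft(
    {"$schema": "http://json-schema.org/draft-07/schema#"}
))
```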

I'd like to actively encourage you to join our JSON Schema slack server!! Discussion link on http://json-schema.org

@timbray
Author

timbray commented Sep 12, 2018

Thanks for the guidance! - using if/else hadn’t occurred to me.

If you already have an if/else construct, is it reasonable to wonder about having a switch/case one as well? That would be a very idiomatic fit for the very common case where a JSON document has a "Type" field, drawn from a fixed set of values, whose value should select the right schema for the event.

@handrews
Contributor

@timbray we discussed it, but it got complicated due to all of the different ways programming languages do or don't implement fall-through. Currently no JSON Schema keyword that takes a list of schemas depends on the order of the schemas in the list, except for the array form of items, which matches schema and instance positions. We did not want to add ordered processing. I'm glossing over a lot here, but the AJV validator experimented with a switch keyword, which was deemed more complicated than we wanted given that there are alternatives.

For example, this idiom:

{
  "allOf": [
    {
      "if": {...},
      "then": {...}
    },
    {
      "if": {...},
      "then": {...}
    },
    ...
  ]
}

implements an unordered switch with something resembling fall-through. The same construct with oneOf, with "else": false added to each branch so that non-matching branches fail, implements a mutually exclusive switch, which behaves like a nested if/then/else chain but without the deep nesting nightmare that you get into without the oneOf.

Given those options, you can fiddle around with the *Of and if keywords to implement quite a few sorts of switches.
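To make the interaction between if/then and the *Of keywords concrete, here is a Python sketch of a tiny evaluator for a small subset of JSON Schema. It is an illustration under stated assumptions, not a real validator: valid and case are hypothetical helper names, and the "Timestamp" and "Source" rules are made up for the example. The key fact it relies on is that a branch with no "else" passes whenever its "if" fails, so allOf enforces every matching branch, while anyOf is satisfied by any non-matching branch:

```python
def valid(schema, inst):
    """Evaluate a small subset of JSON Schema: booleans, const, required,
    properties, if/then/else, allOf, anyOf, oneOf."""
    if isinstance(schema, bool):
        return schema
    if "const" in schema and inst != schema["const"]:
        return False
    if isinstance(inst, dict):
        if any(key not in inst for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in inst and not valid(sub, inst[key]):
                return False
    if "if" in schema:
        # A missing "then" or "else" behaves like the schema `true`.
        branch = "then" if valid(schema["if"], inst) else "else"
        if not valid(schema.get(branch, True), inst):
            return False
    if "allOf" in schema and not all(valid(s, inst) for s in schema["allOf"]):
        return False
    if "anyOf" in schema and not any(valid(s, inst) for s in schema["anyOf"]):
        return False
    if "oneOf" in schema and sum(valid(s, inst) for s in schema["oneOf"]) != 1:
        return False
    return True


def case(type_name, required, strict=False):
    """One switch branch: if Type == type_name, then require `required`."""
    branch = {
        "if": {"properties": {"Type": {"const": type_name}}, "required": ["Type"]},
        "then": {"required": required},
    }
    if strict:
        branch["else"] = False  # the branch fails outright for other types
    return branch


bad_foo = {"Type": "Foo"}  # lacks the Timestamp that Foo events need
good_foo = {"Type": "Foo", "Timestamp": 1}

cases = [case("Foo", ["Timestamp"]), case("Bar", ["Source"])]
strict_cases = [case("Foo", ["Timestamp"], strict=True),
                case("Bar", ["Source"], strict=True)]

print(valid({"allOf": cases}, bad_foo))          # allOf rejects the bad event
print(valid({"anyOf": cases}, bad_foo))          # anyOf accepts it via the Bar branch
print(valid({"oneOf": strict_cases}, good_foo))  # exactly one strict branch matches
```

The strict variant is what makes oneOf usable here: without "else": false, every non-matching branch would also pass and the "exactly one" requirement would fail for valid events.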

@awwright
Member

I wrote a little about a similar problem at https://stackoverflow.com/questions/49823500/how-to-validate-a-json-object-against-a-json-schema-based-on-objects-type-descr/49996397#49996397

And in turn, #31 is a related issue tracking improvement of error reporting, even without any use of if/then. (But then again, maybe if/then is the solution to this problem.)

This is probably a frequent enough issue it should get some sort of treatment in the spec, I think.

@handrews
Contributor

See also #738. @timbray does this answer your question about switch statements?

@handrews
Contributor

handrews commented May 4, 2020

It's been well over a year since the last substantive comment, so I'm assuming the answer was either acceptable or the OP decided not to pursue it further.

Also, with the $vocabulary feature, additional keywords for simplified switch or value dispatch could be added as 3rd-party extensions, and if one becomes very popular we can adopt it into the spec (if we're still in draft).

Note that OpenAPI's discriminator tries to do what was asked here, but does so in a way that goes against JSON Schema's processing model (there's some magical matching to key names under definitions which is problematic). The OpenAPI folks are planning to eventually deprecate discriminator in favor of something that fits with JSON Schema better, either as part of their extension vocabulary, or by adopting a 3rd-party extension into their spec.

@btiernay

Please see #1082 for related work in this area.


6 participants