Proposal: Add propertyDependencies keyword (aka discriminator) #1082

Closed
jdesrosiers opened this issue Mar 23, 2021 · 38 comments · Fixed by #1143

@jdesrosiers
Member

The pattern of using oneOf to describe a choice between two types has become ubiquitous.

{
  "oneOf": [
    { "$ref": "#/$defs/aaa" },
    { "$ref": "#/$defs/bbb" }
  ]
}

However, this pattern is also notorious for its many shortcomings. There is a better way to describe this kind of constraint that doesn't have all the problems of the oneOf pattern, but it's verbose, error prone, and not particularly intuitive.

{
  "allOf": [
    {
      "if": {
        "properties": {
          "foo": { "const": "aaa" }
        },
        "required": ["foo"]
      },
      "then": { "$ref": "#/$defs/foo-aaa" }
    },
    {
      "if": {
        "properties": {
          "foo": { "const": "bbb" }
        },
        "required": ["foo"]
      },
      "then": { "$ref": "#/$defs/foo-bbb" }
    }
  ]
}

OpenAPI addresses this problem with the discriminator keyword. However their approach is more oriented toward code generation concerns and is poorly specified when it comes to validation. I don't think we should adopt discriminator, but I do think we need something like it. I believe this is the thing that is generating the most questions in our community right now.

Right now, we have the dependentSchemas keyword that is very close to what is needed except it checks for the presence of a property rather than its value. The propertyDependencies keyword builds on that concept to solve the problem.

{
  "propertyDependencies": {
    "foo": {
      "aaa": { "$ref": "#/$defs/foo-aaa" },
      "bbb": { "$ref": "#/$defs/foo-bbb" }
    }
  }
}

If the instance is an object, then for every property (name) and value of that property (value), if /propertyDependencies/{name}/{value} is defined, then the instance must be valid against the schema at that location. Compared to discriminator, this is more consistent with the style of JSON Schema keywords because it doesn't use sub-keywords like propertyName and mappings. It's also more powerful because it allows you to discriminate using more than one property.
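The rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not a reference implementation: the function name `check_property_dependencies`, the `validate` callback, and the `toy_validate` stand-in (which understands only boolean schemas and `required`) are all hypothetical names introduced for this example.

```python
from typing import Any, Callable


def check_property_dependencies(
    keyword_value: dict,
    instance: Any,
    validate: Callable[[Any, Any], bool],
) -> bool:
    """Apply every schema found at /propertyDependencies/{name}/{value}."""
    if not isinstance(instance, dict):
        return True  # the keyword only constrains objects
    for name, value in instance.items():
        # Only string values can appear as keys of a JSON object.
        if not isinstance(value, str):
            continue
        subschema = keyword_value.get(name, {}).get(value)
        # `is not None` so that the boolean schema `False` is still applied.
        if subschema is not None and not validate(subschema, instance):
            return False
    return True


def toy_validate(schema: Any, instance: Any) -> bool:
    """Hypothetical stand-in validator: boolean schemas and `required` only."""
    if isinstance(schema, bool):
        return schema
    return all(key in instance for key in schema.get("required", []))


keyword = {
    "foo": {
        "aaa": {"required": ["a"]},
        "bbb": {"required": ["b"]},
    }
}
```

With this sketch, an instance whose `foo` is neither `"aaa"` nor `"bbb"`, or that has no `foo` at all, passes because no subschema is selected.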

Because of the parallels to dependencies, I chose to name it in a similar way. However, dependencies is the least well known and understood keyword, so it might not be beneficial to build on that naming convention. Either way, I don't think we should call it discriminator to avoid confusion with what OpenAPI has specified.

@karenetheridge
Member

karenetheridge commented Mar 23, 2021

Even though this is "just" syntax sugar, the dependentSchemas and dependentRequired keywords are sugar too, that can also be achieved by combining if/then/else and allOf. However, sugar keywords are more efficient (in the allOf case we evaluate all clauses, even though only one of the if/then conditions can successfully match), and also the resulting errors will be more clear to the user. (And of course the paths in the errors or annotations will be different.)

I agree that this is one of the most frequently asked questions lately, and adding this construct should help both schema authors and various tools that parse schemas and generate code and/or UIs.

(Note: I originally misread the placement of required in the examples above. required is needed in the conditional so that the check doesn't pass if the property doesn't exist at all, but it should not be implied at the top level, as none of the other object keywords (properties, patternProperties, propertyNames, dependent*) imply required either. If the property doesn't exist, then the subschemas are not applied and the keyword as a whole still evaluates successfully.)
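To make the "sugar" relationship concrete, here is a minimal Python sketch that expands a propertyDependencies value into the allOf/if/then form from the original post, including the required inside each conditional. The function name `desugar_property_dependencies` is hypothetical and exists only for this illustration.

```python
def desugar_property_dependencies(keyword_value: dict) -> dict:
    """Expand a propertyDependencies value into the equivalent allOf of if/then."""
    all_of = []
    for name, by_value in keyword_value.items():
        for value, subschema in by_value.items():
            all_of.append({
                "if": {
                    "properties": {name: {"const": value}},
                    # Without `required`, the `if` would also pass when the
                    # property is absent, and the `then` would wrongly apply.
                    "required": [name],
                },
                "then": subschema,
            })
    return {"allOf": all_of}


sugar = {
    "foo": {
        "aaa": {"$ref": "#/$defs/foo-aaa"},
        "bbb": {"$ref": "#/$defs/foo-bbb"},
    }
}
expanded = desugar_property_dependencies(sugar)
```

The expansion yields one if/then clause per mapped value, which is why a validator working on the expanded form evaluates every clause even though at most one can match per property.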

@gregsdennis
Member

{
  "propertyDependencies": {
    "foo": {
      "aaa": { "$ref": "#/$defs/foo-aaa" },
      "bbb": { "$ref": "#/$defs/foo-bbb" }
    }
  }
}

If the instance is an object, then for every property (name) and value of that property (value), if /propertyDependencies/{name}/{value} is defined, then the instance must be valid against the schema at that location.

I'm confused. Are aaa and bbb possible values for foo? What if my possible value for foo is a non-string? I won't be able to use that as a key in an object as shown here.

@karenetheridge
Member

discriminator only works for objects whose values are strings. Is there a need to support anything else? What might the syntax look like for that?

@gregsdennis
Member

There could be. If this is intended to replace the allOf-if verboseness above, then I would expect that any value should be supported.

@jdesrosiers
Member Author

jdesrosiers commented Mar 24, 2021

@gregsdennis It's important to keep in mind that propertyDependencies is not designed to solve every possible conditional selection use case. dependentSchemas and propertyDependencies are just sugar for the most common patterns. Neither is intended to replace or to be as powerful as if/then.

That said, you have a good point that this would only work for properties with string values and it would be nice if it could support numbers and booleans without compromising the simplicity propertyDependencies achieves. We could define some type coercion behavior for numbers and booleans if we think it's valuable enough to justify the extra complexity. I'm not sure it's worth it.

Another limitation that came up in Slack discussion is that there's no way to set a default schema to use if there are no matches. This could be supported by adding a sibling keyword called defaultPropertyDependency. Again, I don't think it's a common need and therefore, not worth including at this point.

@gregsdennis
Member

Maybe with these things in mind, this would be better suited to an independent vocabulary and not in the out-of-the-box (avoiding the word "core" here) set of keywords.

(🤔 Actually, also moving other "sugar" keywords into a separate vocabulary might not be a bad idea...)

@jdesrosiers
Member Author

I disagree. We want JSON Schema to be reasonably easy to use for the most common use cases without having to be extended. That means having a few sugar keywords around to make very common patterns not such a horrible experience. The use case propertyDependencies covers is extremely common and is responsible for a significant chunk of the support questions we see on SO and Slack. That's why I was motivated to make this proposal. I'm tired of answering the same questions over and over again.

Moving sugar keywords out to an independent vocabulary is probably a more slippery slope than you expect. Here are some examples that could be considered sugar keywords.

  • dependentRequired
  • dependentSchemas
  • if/then/else
  • contains
  • enum
  • The array form of type
  • properties

@gregsdennis
Member

gregsdennis commented Mar 24, 2021

I'm not suggesting the existing sugar keywords are moved to an independent vocab, just a spec-defined sugar vocab, akin to the annotations or metadata vocabs. If they just make authoring easier, then they're not really basic functionality. It was just a thought.

But I do think this specific proposal belongs in an independent vocab unless we can make it work for all value types.

@karenetheridge
Member

karenetheridge commented Mar 24, 2021

Whether this goes into a new vocab or not is more an implementation question, as long as it's included in the main meta-schema (that is, it is enabled for schemas that declare "$schema": "https://json-schema/draft/2021-XX/schema"). It's not a difficult keyword to implement (certainly easier than unimplemented*), and omitting it from the default set would defeat the point of making it available to make schema authoring more intuitive.

@jdesrosiers
Member Author

@gregsdennis

I'm not suggesting the existing sugar keywords are moved to an independent vocab, just a spec-defined sugar vocab

I see. I'm not sure that's necessary, but it's a little off topic, so let's discuss that later.

I do think this specific proposal belongs in an independent vocab unless we can make it work for all value types.

It seems a bit extreme to exclude a keyword that solves a real and common problem because it doesn't support a use case that no one is asking for. I think it would help if you could share why you think it's important for this keyword to support all value types.

I'll write up some examples of alternatives so we can more easily see what the trade-offs are. I think that might help with the decision-making process.

@gregsdennis
Member

$data solved a "real and common problem" but I put my proposal in a vocab so that it could gain adoption and support independently.

I think this should be the approach for new keywords going forward. If we see that it has a fair amount of implementation support and end-user use, then it can be pulled into the main vocabs with relative ease.

I also think that keywords that are in the main vocabs should be as generic as possible, supporting all possible values. Examples of this are enum and format, which allow all types, even objects and arrays.

You might argue that keywords like required and properties only support strings, but these target object keys specifically which are only ever strings. However this keyword (as proposed) targets values, which can be of any type.

@jdesrosiers
Member Author

Here is what some alternatives that support all value types might look like. All examples are functionally equivalent to the examples in the original post.

This version uses an array of objects. Each object is a collection of the variables needed to express a property dependency. This doesn't fit the style of JSON Schema. There aren't any keywords remotely like this. It's also still too verbose. It's a little more intuitive than if/then and definitely less error prone. This is also less efficient to process. Implementations will need to check every alternative for a match as opposed to the original proposal that can do this in O(1).

{
  "propertyDependencies": [
    {
      "propertyName": "foo",
      "propertySchema": { "const": "aaa" },
      "apply": { "$ref": "#/$defs/foo-aaa" }
    },
    {
      "propertyName": "foo",
      "propertySchema": { "const": "bbb" },
      "apply": { "$ref": "#/$defs/foo-bbb" }
    }
  ]
}

A slight variation on that example is to make it a map of keyword to dependency object. But, I don't think that makes a significant difference. It's still too verbose and has the same efficiency problems in most cases.

{
  "propertyDependencies": {
    "foo": [
      {
        "propertySchema": { "const": "aaa" },
        "apply": { "$ref": "#/$defs/foo-aaa" }
      },
      {
        "propertySchema": { "const": "bbb" },
        "apply": { "$ref": "#/$defs/foo-bbb" }
      }
    ]
  }
}

This one is a little more consistent with the JSON Schema style (poor keyword naming aside), but otherwise has all the same problems as the other examples.

{
  "allOf": [
    {
      "propertyDependencyName": "foo",
      "propertyDependencySchema": { "const": "aaa" },
      "propertyDependencyApply": { "$ref": "#/$defs/foo-aaa" }
    },
    {
      "propertyDependencyName": "foo",
      "propertyDependencySchema": { "const": "bbb" },
      "propertyDependencyApply": { "$ref": "#/$defs/foo-bbb" }
    }
  ]
}

This one is a variation of if that combines if, properties, and required to reduce boilerplate. It's also essentially a variation of the previous example with better names. While I think this is the best alternative, I still think people will avoid this because it's too verbose.

{
  "allOf": [
    {
      "ifProperties": {
        "foo": { "const": "aaa" }
      },
      "then": { "$ref": "#/$defs/foo-aaa" }
    },
    {
      "ifProperties": {
        "foo": { "const": "bbb" }
      },
      "then": { "$ref": "#/$defs/foo-bbb" }
    }
  ]
}

I read all the discriminator discussions I could find in the OpenAPI repo before making this proposal and I will tell you that one of the main things people want from this keyword that you don't get from any of these alternatives is O(1) alternative selection. Only the original proposal can do that.
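The difference in selection cost can be sketched as follows. This is a simplified illustration, not how any particular validator works: `select_by_lookup` models the original proposal (one hash lookup per discriminating property), while `select_by_scan` models the array-of-objects alternative and, as a stated simplification, only understands a `const` inside `propertySchema` and compares with Python's `==`. Both function names are hypothetical.

```python
def select_by_lookup(mapping: dict, instance: dict) -> list:
    """Original proposal: one dictionary lookup per discriminating property."""
    selected = []
    for name, by_value in mapping.items():
        value = instance.get(name)
        if isinstance(value, str) and value in by_value:
            selected.append(by_value[value])
    return selected


def select_by_scan(alternatives: list, instance: dict) -> list:
    """Array alternative: every entry must be tested, so cost grows with its length."""
    selected = []
    for alternative in alternatives:
        name = alternative["propertyName"]
        # Simplification: only `const` is understood in propertySchema.
        expected = alternative["propertySchema"]["const"]
        if name in instance and instance[name] == expected:
            selected.append(alternative["apply"])
    return selected


mapping = {
    "foo": {
        "aaa": {"$ref": "#/$defs/foo-aaa"},
        "bbb": {"$ref": "#/$defs/foo-bbb"},
    }
}
alternatives = [
    {"propertyName": "foo", "propertySchema": {"const": "aaa"},
     "apply": {"$ref": "#/$defs/foo-aaa"}},
    {"propertyName": "foo", "propertySchema": {"const": "bbb"},
     "apply": {"$ref": "#/$defs/foo-bbb"}},
]
```

Both functions select the same subschemas for a given instance; they differ only in how much work is done to find them.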

@jdesrosiers
Member Author

$data solved a "real and common problem"

It most certainly does, but $data is a very different case. There are arguments about scope (which I don't completely agree with), but more importantly, it introduces a completely new core behavior to JSON Schema which can cause problems. For example, the way your vocabulary defines $data is not compatible with validators that have a separate compile and interpret step like mine and many others. propertyDependencies is effectively just sugar for something JSON Schema already supports. There is plenty of precedent for adding this kind of thing when we see overwhelming need for them including contains and if/then/else.

I put my proposal in a vocab so that it could gain adoption and support independently. I think this should be the approach for new keywords going forward.

I don't think our vocabulary system is equipped to support that goal right now. It's something I want to bring up in more detail at some point, but I'll try to explain briefly for now. Vocabularies are well equipped for creating dialects. It's good for when someone wants to include a version of JSON Schema in their own specification and provide their own tooling for that specialized use of JSON Schema. OpenAPI is a great example of who it would work well for. However, vocabularies are not good for individual schema authors who want to add a new capability to their schemas. Schema authors would need to create their own custom dialect every time they want to use a non-standard keyword in one of their schemas. I don't see people doing that until we can make changes to the vocabulary system to better support schema authors as opposed to specification authors.

I also think that keywords that are in the main vocabs should be as generic as possible

I agree that they should be as generic as is reasonable. When making it more generic means that it no longer solves the problem it's intended to solve, then you've gone too far. Hopefully the alternative examples I posted help illustrate that supporting any value type is a step too far.

@handrews
Contributor

I'd advise separating the question of extension vs in-spec vocabulary from the discussion of the keyword itself. These are largely orthogonal problems and can clutter the issue for readers not as deeply versed in the project. Also, keywords can always be pulled into the spec: initially releasing it as an extension vocabulary for use with 2020-12 does not preclude pulling it into the next draft (or a later one).

While I have a knee-jerk "what about other types" reaction, I agree that a keyword that is fundamentally about making a common use case easier should focus on that use case. It's already possible to handle arbitrarily complex conditions, so we don't really need another system for arbitrary complexity.

I think boolean and numeric coercion is a reasonable approach for those simple cases, but don't have strong feelings on it. You would technically lose the ability to distinguish between true and "true", or null and "null", or 1 and "1", but arguably if you're trying to do that you should be questioning your life choices 😜 You could also do that distinction with oneOf or if and type if you really need it. That is a good example of making the simple case (switch on true / false / null) easy with type coercion, and leaving the complex-to-pathological case (type mixing) as requiring more complexity.
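The collision described above is easy to demonstrate with a small Python sketch. The coercion rule here is purely an assumption for illustration (no rule has been specified in this thread): scalars are mapped to their JSON text via `json.dumps`, and `coerce_key` is a hypothetical helper name.

```python
import json
from typing import Any, Optional


def coerce_key(value: Any) -> Optional[str]:
    """Hypothetical coercion of a scalar instance value to a lookup key."""
    if isinstance(value, str):
        return value
    if value is None or isinstance(value, (bool, int, float)):
        # Assumed rule: use the JSON text of the scalar as the key.
        # Number formatting (for example 1 versus 1.0) would need its own rule.
        return json.dumps(value)
    return None  # objects and arrays have no key under this sketch


mapping = {"foo": {"true": {"$ref": "#/$defs/foo-true"}}}

# The boolean true and the string "true" select the same subschema.
from_boolean = mapping["foo"].get(coerce_key(True))
from_string = mapping["foo"].get(coerce_key("true"))
```

Under this assumed rule, an author who needs to tell `true` apart from `"true"` would still have to fall back to if/then with type.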

Since I haven't been commenting for a while, let me just note that I'll probably start dropping the occasional comment on issues, but do not intend to dive back in at anything resembling my previous levels of involvement anytime soon. Please just treat my comments as offered opinions, and don't wait on me if I don't reply to something promptly. I'm thrilled to see all of the work that is going on and do not want to disrupt the flow you all have going!

@jdesrosiers
Member Author

It's good to hear from you Henry.

It's already possible to handle arbitrarily complex conditions, so we don't really need another system for arbitrary complexity.

This is what I've been trying to say, but I think you said it more succinctly. Thank you.

@jdesrosiers
Member Author

jdesrosiers commented Jun 17, 2021

I implemented propertyDependencies in my draft-future dialect. If you want to try it out, you can go to https://json-schema.hyperjump.io and use "$schema": "https://json-schema.org/draft/future/schema".

Note: Things in my draft-future dialect are not necessarily going to appear in the next draft and this dialect is not available in npm. It should only be used for the purposes of evaluating potential changes.

@gregsdennis
Member

I still don't like that this only supports cases where values are strings. I think that one aspect makes this too specialized to include in JSON Schema proper.

@jdesrosiers
Member Author

@gregsdennis What would it take to convince you?

We have the generalized solution in if/then, so we aren't losing anything by propertyDependencies being specialized and I think I've effectively shown that any attempt to make it more generalized doesn't solve the problem. So, I'm guessing that your reservation is that it's too specialized to a point that it will rarely be used. If I could somehow come up with numbers to show that propertyDependencies covers an overwhelming percentage of uses of if/then and oneOf, would that convince you? I personally think those percentages are around 95% for if/then and 99% for oneOf. If those numbers are accurate, would you agree that this keyword is worth having in JSON Schema? If that metric is convincing at all, what numbers would you consider the threshold for considering including propertyDependencies?

@NiklasBeierl

NiklasBeierl commented Aug 8, 2021

The way I understand this, it's a hard choice between getting O(1) validation vs. supporting values other than strings. The only way to get O(1) choice of a schema is with objects and, well, the names have to be strings. Doing type coercion here seems like a huge no-no to me; that stuff is a big part of why I develop Python instead of JS these days.

I'm quite surprised that people are so concerned with getting this validation with O(1) vs. O(n) as described in @jdesrosiers post here, but he certainly knows better than me.

On the other hand, I really understand @handrews' and @gregsdennis' aversion to only supporting string values. I for one have at least one case in mind where I use integers to discern the "type" of object.
But since we are already in syntactic-sugar-land, I really wonder whether "just doing both" is an option:

Couldn't the logic from the original proposal be applied if /propertyDependencies/{name}/ is an object,
and the logic from the second alternative described here if it is an array?

@jdesrosiers
Member Author

@NiklasBeierl Thank you for your feedback! I really want to hear more voices on this issue.

The way I understand this it's a hard choice between getting O(1) validation vs. supporting values other than strings.

O(1) is not the property I'm trying to optimize for. We already have a solution that supports all values: if/then. The problem is that almost no one uses it when they should, and when they do, it's easy to get wrong. So, the goal is to have something that people want to use and isn't prone to errors. I completely agree that O(1) is not particularly important when we are talking about relatively small values of "n", but that was the number one reason people cited for not wanting to use if/then over discriminator. Personally, I think the real reason if/then isn't used more when it should be is that it's verbose, complicated, and error prone. To me, that's what's important, and I think it's clear that only the original proposal ticks all of those boxes even if you ignore the people saying they want O(1).

Couldn't the logic from the original proposal be applied if /propertyDependencies/{name}/ is an object,
and the logic from the second alternative described here if it is an array?

I'm certainly open to a hybrid solution and/or other alternatives, but I don't think alternative 2 is a good solution. I think it's too verbose. But, mostly it's going to be a hard sell because it requires a keyword with sub-keywords, which is something that has been aggressively avoided in JSON Schema in the past.

I personally think that the use case for values other than strings is rare enough that it's ok to have people use if/then for those cases instead of bike shedding propertyDependencies to solve every possible case. dependentSchemas doesn't solve every possible case either and that's ok.

@letmaik

letmaik commented Dec 19, 2021

Please consider including defaultPropertyDependency or something else that allows specifying a closed set of property values. Otherwise, all possible options have to be repeated yet again (totalling three times, compared to one time with discriminator):

{
    "type": "object",
    "properties":
    {
        "type": 
        {
            "enum":
            [
                "A",
                "B",
                "C",
                "D"
            ]
        }
    },
    "required": ["type"],
    "propertyDependencies":
    {
        "type":
        {
            "A": { "$ref" : "..." },
            "B": { "$ref" : "..." },
            "C": { "$ref" : "..." },
            "D": { "$ref" : "..." }
        }
    }
}

vs.

{
    "type": "object",
    "required": ["type"],
    "propertyDependencies":
    {
        "type":
        {
            "A": { "$ref" : "..." },
            "B": { "$ref" : "..." },
            "C": { "$ref" : "..." },
            "D": { "$ref" : "..." }
        }
    },
    "defaultPropertyDependency":
    {
        "type": false
    }
}

I'm assuming defaultPropertyDependency would only be applied if the property exists, hence "required": ["type"] is needed to achieve the same as in the top example.

In general, having to repeat the discriminator values in propertyDependencies is unfortunate since it requires more testing to make sure everything is in sync with the subschemas, but I guess there's no way around it without violating JSON Schema's property of independent schema evaluation.

@gregsdennis
Member

gregsdennis commented Dec 20, 2021

I think I see the usefulness of this, but my concern about non-string values remains.

That said, I think if we were to change the name from propertyDependencies to something more descriptive of the case it does address I'd be happier. To me propertyDependencies is very generic, and so I expect it to cover most if not all scenarios. I also understand and agree with the hesitation to reuse "discriminator."

Perhaps something like "selector," following the "discriminator" example. Or "switch" following in the "if/then/else" theme ("select" works here, too, since that's the word that's used in some languages for the same construct).

You'd end up with

{
  "switch": {
    "foo": {
      "aaa": { "$ref": "#/$defs/foo-aaa" },
      "bbb": { "$ref": "#/$defs/foo-bbb" }
    }
  }
}

which to me actually reads very similar to

switch (foo) {
    case "aaa":
        return Validate("#/$defs/foo-aaa");
    case "bbb":
        return Validate("#/$defs/foo-bbb");
    default: // with @letmaik's suggestion 👍
        return false;
}

With this, it's obvious that we're switching out the schema we're using to validate the object based on the value in one of its properties.

One thing that stands out, though, is a limitation on using a property at the (local) instance root. What if the discriminating property is nested inside the instance, for example, in some meta-data container property? Could we use pointers (not URIs) to indicate the instance path?

{
  "switch": {
    "/meta/foo": {
      "aaa": { "$ref": "#/$defs/foo-aaa" },
      "bbb": { "$ref": "#/$defs/foo-bbb" }
    }
  }
}
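A rough Python sketch of the pointer idea follows. It assumes the keys of the keyword are RFC 6901 JSON Pointers evaluated against the instance; that interpretation, and the helper names `resolve_pointer` and `select_schemas`, are assumptions made for this illustration. For brevity the resolver only walks through objects, not arrays.

```python
from typing import Any, Tuple


def resolve_pointer(instance: Any, pointer: str) -> Tuple[bool, Any]:
    """Resolve a JSON Pointer against an instance; returns (found, value)."""
    if pointer == "":
        return True, instance
    current = instance
    for token in pointer.split("/")[1:]:
        # RFC 6901 unescaping: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict) and token in current:
            current = current[token]
        else:
            return False, None  # arrays are not handled in this sketch
    return True, current


def select_schemas(keyword_value: dict, instance: Any) -> list:
    """Collect the subschemas whose pointer resolves to a mapped string value."""
    selected = []
    for pointer, by_value in keyword_value.items():
        found, value = resolve_pointer(instance, pointer)
        if found and isinstance(value, str) and value in by_value:
            selected.append(by_value[value])
    return selected


switch = {
    "/meta/foo": {
        "aaa": {"$ref": "#/$defs/foo-aaa"},
        "bbb": {"$ref": "#/$defs/foo-bbb"},
    }
}
```

A top-level property `foo` would then be written as `/foo`, which is the small extra cost for the normal case.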

@jdesrosiers
Member Author

@letmaik Thanks for calling this out. It's been really helpful to think through some of the use cases.

Otherwise, all possible options have to be repeated yet again (totalling three times, compared to one time with discriminator):

I understand the case you're calling out. If you need to define a closed set of dependencies, the property values would have to be kept in sync between the propertyDependencies keyword and the enum keyword. But I don't see the third repetition you mention.

I see how discriminator can do it without repetition if you don't use the mappings sub-keyword, but that requires supporting something that isn't allowed in JSON Schema.

I'm assuming defaultPropertyDependency would only be applied if the property exists

Actually, I was thinking defaultPropertyDependency would be a single JSON Schema rather than a property map of JSON Schemas. It would apply if none of the mappings match, which would include if the property doesn't exist. Not sure if that makes more sense or your version. Honestly, I haven't thought through what defaultPropertyDependency might look like because until your comment, I didn't have a good use-case for it.
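Here is a minimal Python sketch of the single-schema variant just described, under the stated reading that the default applies whenever none of the mappings match, including when the property is absent. Since the semantics have not been worked out, everything here is an assumption: the function name `check_with_default`, the `validate` callback, and the `toy_validate` stand-in (boolean schemas and `required` only) are hypothetical.

```python
from typing import Any, Callable


def check_with_default(
    mapping: dict,
    default_schema: Any,
    instance: Any,
    validate: Callable[[Any, Any], bool],
) -> bool:
    """propertyDependencies plus a single fallback schema for the no-match case."""
    if not isinstance(instance, dict):
        return True
    matched = False
    ok = True
    for name, by_value in mapping.items():
        value = instance.get(name)
        if isinstance(value, str) and value in by_value:
            matched = True
            ok = validate(by_value[value], instance) and ok
    if not matched:
        # Assumed semantics: applies for unmapped values and missing properties.
        ok = validate(default_schema, instance)
    return ok


def toy_validate(schema: Any, instance: Any) -> bool:
    """Hypothetical stand-in validator: boolean schemas and `required` only."""
    if isinstance(schema, bool):
        return schema
    return all(key in instance for key in schema.get("required", []))


mapping = {"type": {"A": {"required": ["a"]}, "B": {"required": ["b"]}}}
```

With a default of `false`, this reading closes the set of values and also makes the property effectively required, so the separate enum and required would no longer be needed.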

@jdesrosiers
Member Author

@gregsdennis

I think if we were to change the name from propertyDependencies to something more descriptive of the case it does address I'd be happier.

I'm more than happy to come up with a better name. I chose the name because it works exactly like dependencies except for property values rather than property presence. However, dependencies was never a good or descriptive name to begin with, so I'm happy to consider alternatives.

Perhaps something like "selector," following the "discriminator" example.

I don't follow this suggestion. "selector" is not a term used with discriminator.

Or "switch" following in the "if/then/else" theme

My concern with "switch" is that switch statements have fall-through semantics, which is why you need to put break statements at the end of every case. So, it's not really equivalent to a switch and using that terminology could be confusing.

With this, it's obvious that we're switching out the schema we're using to validate the object based on the value in one of its properties.

I disagree that switching schemas is a good description of what this is doing or even what it should be doing. Switching, to me, sounds like something is being swapped out for something else or a path is being chosen from a set of alternatives. Nothing is being replaced and these aren't exactly alternatives. Schemas are applied if they match a condition. Sometimes multiple schemas will apply and other times none will apply.

What if the discriminating property is nested inside the instance, for example, in some meta-data container property? Could we use pointers (not URIs) to indicate the instance path?

We could do that. It slightly complicates the normal case, but only slightly. However, I think I'd prefer to just have people fall back to if/then in exceptional cases like this. I don't want to bike shed this keyword too much. It doesn't have to do everything.

@letmaik

letmaik commented Dec 22, 2021

Otherwise, all possible options have to be repeated yet again (totalling three times, compared to one time with discriminator):

[...] But, I don't see where the third repetition you mention.

I think I just phrased that in a misleading way. It's not three repetitions, but three occurrences. The one I haven't explicitly written down would be in the subschemas. For example, the schema referenced in "A": { "$ref" : "..." } may have:

{
  "properties": {
    "type": { "const": "A" },
    ...
  }

If you were able to use oneOf and discriminator then the only mention of the concrete "type" values ("A" above) would be in the subschemas. I understand though that with propertyDependencies you could remove the "type" values from the subschemas, therefore not ending up at three but two (or ideally one, with the default schema method) occurrences. It just makes the subschema (which probably lives in its own file) less useful on its own and less self-describing.

I'm assuming defaultPropertyDependency would only be applied if the property exists

Actually, I was thinking defaultPropertyDependency would be a single JSON Schema rather than a property map of JSON Schemas.

Interesting, I guess that would work and probably be what is needed in most cases. I'm all for making it less verbose.

@gregsdennis
Member

gregsdennis commented Dec 22, 2021

Sometimes multiple schemas will apply and other times none will apply.

How could multiple apply? The options are the value of a property, and they're explicitly stated. A foo property cannot have both an aaa value and a bbb value. You could have multiple properties apply, but that translates into multiple "switch" statements. This keyword acts as a collection of switch statements, each one switching on the values of a property.

If you're uncomfortable with switch then perhaps select or selector (disregard my comment relating it to discriminator)?

I disagree that switching schemas is a good description of what this is doing or even what it should be doing. Switching, to me, sounds like something is being swapped out for something else or a path is being chosen from a set of alternatives.

To me that's exactly what's going on. You're making a selection of which subschema to apply to an object based on one of its property values.

@letmaik

letmaik commented Dec 22, 2021

Random idea: valueSchemas

@letmaik

letmaik commented Dec 22, 2021

Random idea: valueSchemas

Nah, I'll take it back. This may imply you want to define a schema for the actual value, not that something should happen based on a value.

@letmaik

letmaik commented Dec 22, 2021

Haskell has case expressions and they look like this:

case type of 
  "A" ->  ...
  "B" ->  ...
  "C" ->  ...

Maybe that's another candidate:

"case": {
  "type": {
    "A": { "$ref": ... },
    "B": { "$ref": ... },
    "C": { "$ref": ... }
  }
}

@jdesrosiers
Member Author

I don't see the third repetition you mention.

The one I haven't explicitly written down would be in the subschemas.

I see what you mean. As you say, the one in the sub-schemas isn't necessary. If you want to include it, then, yes, that would be another occurrence.

Sometimes multiple schemas will apply and other times none will apply.

How could multiple apply? [...] You could have multiple properties apply

Yes, you could have multiple properties apply. That's all I meant.

If you're uncomfortable with switch then perhaps select or selector

I still don't think that selecting is the right metaphor, but it's not worse than propertyDependencies, so let's keep it on the table. However, I'd probably pluralize it since, as you pointed out, the keyword represents multiple "selections".

Random idea: valueSchemas

I had considered propertyValueDependencies, but I thought it was too long and shortened it. valueDependencies is maybe a more descriptive shortening. Anyway, something with "value" in it would make sense.

Haskell has case expressions [...] Maybe that's another candidate

That's a possibility. Or, something with the word "case" in it. This type of feature is generally referred to as pattern matching, so I had considered something with those words as well, but "pattern" isn't quite right here. Maybe something like valueMatchers?

@bsless

bsless commented Feb 13, 2022

I've been following this issue for a while out of interest and need, and thought I'd contribute an idea I've had given the recent comments.
Instead of discriminator, how about just caseOf?
caseOf will take the thing to dispatch on and a list of targets, which actually makes for a stronger system than dispatching on a single field:

{
  "caseOf": {
    "properties": [
      "p1",
      "p2"
    ],
    "cases": [
      {
        "case": [1, 2],
        "is": {"$ref": "..."}
      }
    ]
  }
}

What's cool about caseOf is that it can:

  • take more than one property
  • take things other than property, for example, format
  • possibility - take schemas as cases, achieve closed polymorphic dispatch?

wdyt?
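
A minimal sketch of how the caseOf example above might be evaluated, assuming each `case` array is matched positionally against the values of the listed `properties` and the first matching case wins. The thread does not define these semantics, so the `selectCase` helper and its matching rules are hypothetical, and the comparison only handles primitive values.

```javascript
// Hypothetical evaluator for the proposed caseOf keyword.
// Assumption: case[i] is compared with the instance value of properties[i].
function selectCase(caseOf, instance) {
  // Collect the instance values to dispatch on, in declaration order.
  const actual = caseOf.properties.map((name) => instance[name]);

  for (const { case: expected, is } of caseOf.cases) {
    const matches =
      expected.length === actual.length &&
      expected.every((value, index) => value === actual[index]);
    if (matches) {
      return is; // The subschema to apply to the instance.
    }
  }
  return undefined; // No case matched, so nothing is selected.
}

const caseOf = {
  properties: ["p1", "p2"],
  cases: [{ case: [1, 2], is: { $ref: "..." } }]
};

console.log(selectCase(caseOf, { p1: 1, p2: 2 }));
console.log(selectCase(caseOf, { p1: 1, p2: 3 }));
```

A missing property reads as `undefined`, which never equals a JSON value, so an instance without `p1` or `p2` selects nothing in this sketch.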

@gregsdennis
Member

My first hesitation on caseOf is that we don't have (and have avoided adding) any keywords that define their own properties. In other words, all current keywords that have object values take a schema or a keyed collection of schemas where the keys are unspecified.

I'm not opposed to the idea, but I'd want a really good reason why such a construct would be necessary instead of constructs like we already have.

@jdesrosiers
Member Author

@bsless Thanks for sharing your ideas. It's really appreciated. We need more people thinking about these problems.

My main concern about this approach is that there is a lot going on. One of my main goals for this keyword is to keep it simple enough that people will use it, rather than to solve every possible use case. The ability to validate this kind of thing already exists with if/then, but I think people don't reach for that tool because it's complicated and verbose. Your proposal is less complicated and avoids the error-proneness, but is just as verbose.

I think your proposal is essentially the same as the alternative I discussed in #1082 (comment). Your example would look like this.

{
  "allOf": [
    {
      "ifProperties": {
        "p1": { "const": 1 },
        "p2": { "const": 2 }
      },
      "then": { "$ref": "..." }
    }
  ]
}

I think this is just as powerful, but more idiomatic JSON Schema.

In any case, I'm afraid people won't want to reach for either of these for the most common case because they're too verbose, and will use oneOf instead despite (or ignorant of) its problems.

@bsless

bsless commented Feb 17, 2022

Thank you @jdesrosiers

I see a slight difference between #1082 (comment) and my suggestion, and it relates to two elements in my rationale:

  • machine readability and simplicity, such that there would be no need to run clever inference based on the values to map to existing type systems in existing languages
  • polymorphism à la carte: Thinking of a system similar to predicate dispatch (if possible) or double dispatch (slightly weaker)

What makes caseOf different (I think, could have missed this in your suggestion) is the dispatch doesn't even have to be described in terms of properties at all, but of a schema. Why limit dispatch to "tagged objects"?

It's possible these two converge and I'm just not seeing it, this isn't a simple subject.
Thing is, I don't see how the other proposed syntaxes aren't just sugar over if/then

@jdesrosiers
Member Author

@bsless I'm not following everything you're saying. I think it's because you're coming from a perspective that's a bit out of scope for JSON Schema ...

there would be no need to run clever inference based on the values to map to existing type systems in existing languages

Keep in mind that JSON Schema is a validation language, not a type definition language. Mapping to type systems is out of scope. But even so, I don't see how caseOf is an improvement in that area over any of the options that have been discussed in this thread. The only difference I see in your example from propertyDependencies is that values can be anything rather than just strings. I don't see any difference in any of the alternatives discussed in #1082 (comment).

What makes caseOf different (I think, could have missed this in your suggestion) is the dispatch doesn't even have to be described in terms of properties at all, but of a schema.

Maybe I don't understand your example, but I don't see how caseOf isn't limited to properties. However, if it isn't, that would just give it the same capabilities as if/then, so I'm not seeing the benefit.

Why limit dispatch to "tagged objects"?

There is no limitation. if/then will still exist for the complex cases. The point of propertyDependencies is to have a simple construct for the simple use cases. Probably upwards of 95% of all schemas I've seen that need to select between a set of schemas based on some condition are selecting based on a string value on a single property. I believe that people are reluctant to use if/then when they should because it's too verbose and/or difficult. By limiting propertyDependencies to the most common case, we can provide an alternative that's simple enough that people might actually use it when they should.

Thing is, I don't see how the other proposed syntaxes aren't just sugar over if/then

They are just sugar over if/then. That's the point. if/then already solves the problem, but it's verbose and complicated enough that people avoid it even when it's the right choice. The point is to provide a conditional that solves a common problem with less code than if/then.
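
To make the "sugar" relationship concrete, here is a minimal sketch that expands a propertyDependencies value into the equivalent allOf of if/then pairs shown at the top of this issue. It assumes the keyword maps property name to string value to subschema. The `desugarPropertyDependencies` helper is hypothetical and not part of any implementation.

```javascript
// Hypothetical helper: rewrites propertyDependencies into if/then pairs.
// Assumption: the keyword has the shape { property: { value: subschema } }.
function desugarPropertyDependencies(propertyDependencies) {
  const allOf = [];
  for (const [propertyName, valueMap] of Object.entries(propertyDependencies)) {
    for (const [value, subschema] of Object.entries(valueMap)) {
      allOf.push({
        // The condition holds only when the property is present with this value.
        if: {
          properties: { [propertyName]: { const: value } },
          required: [propertyName]
        },
        then: subschema
      });
    }
  }
  return { allOf };
}

const sugared = {
  foo: {
    aaa: { $ref: "#/$defs/foo-aaa" },
    bbb: { $ref: "#/$defs/foo-bbb" }
  }
};

const desugared = desugarPropertyDependencies(sugared);
console.log(JSON.stringify(desugared, null, 2));
```

The six-line input expands to roughly twenty lines of if/then, which is the verbosity the keyword is meant to remove.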

ChALkeR added a commit to ExodusMovement/schemasafe that referenced this issue Oct 6, 2022
For now, causes an uncertainty from removeAdditional / useDefaults, but
that could be resolved/optimized in some certain situations later.

Refs: json-schema-org/json-schema-spec#1082
Refs: json-schema-org/json-schema-spec#1143

@xiaoxiangmoe

xiaoxiangmoe commented Apr 26, 2023

@jdesrosiers @gregsdennis @karenetheridge Can we use something like this:

{
  "propertyDiscriminator": {
    "foo": [
      [
        ["aaa", 2],
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        ["bbb", false],
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}

If obj.foo equals "aaa" or 2, then use $defs/foo-aaa; else if obj.foo equals "bbb" or false, then use $defs/foo-bbb.
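
A minimal sketch of that "if / else if" reading, assuming each entry is a `[values, subschema]` pair and the first entry whose value list contains the property value wins. The `selectFirstMatch` helper is hypothetical, and it only compares primitive values.

```javascript
// Hypothetical selection for the list-of-values form of propertyDiscriminator.
// Assumption: entries are checked in order and the first match is used.
function selectFirstMatch(cases, value) {
  for (const [values, subschema] of cases) {
    // Array.prototype.includes compares primitives without type coercion,
    // so 2 and "2" are treated as different discriminator values.
    if (values.includes(value)) {
      return subschema;
    }
  }
  return undefined; // No entry lists this value.
}

const fooCases = [
  [["aaa", 2], { $ref: "#/$defs/foo-aaa" }],
  [["bbb", false], { $ref: "#/$defs/foo-bbb" }]
];

console.log(selectFirstMatch(fooCases, 2));
console.log(selectFirstMatch(fooCases, false));
```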

Or just simplify

{
  "propertyDiscriminator": {
    "foo": [
      [
        "aaa",
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        2333,
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}

And this is O(1) selection.

For example:

const schema = {
  "propertyDiscriminator": {
    "foo": [
      [
        "aaa",
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        2333,
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}
// Build the lookup table once from the [value, subschema] pairs.
const fooMap = new Map()
for (const [key, ref] of schema.propertyDiscriminator.foo) {
    fooMap.set(key, ref)
}
// Then validate obj by looking up its discriminator value.
const obj = { foo: 2333, bar: 1234 }
const ref = fooMap.get(obj.foo) // O(1) selection

Then we can support boolean, string, and number values as discriminators. Any thoughts on this?

Numeric support is important because some companies always use numeric enums as discriminators.

@jdesrosiers
Member Author

@xiaoxiangmoe The second version is an interesting option. One of my main design goals for this keyword has always been to keep it as simple as possible or people won't use it. This is a little more complex, but I don't think it's too much.

However, this isn't actually an O(1) process. The lookup from the Map is O(1), but the cost of building the Map is O(n). Implementations that have a compile step could produce the Map in the compile step and then evaluate instances using that Map in O(1), but most implementations don't have a compile step.
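
A minimal sketch of that split, assuming an implementation with a compile step: the O(n) work of building the Maps happens once per schema, and each instance is then evaluated with O(1) lookups. The `compileDiscriminator` and `selectSchemas` names are hypothetical and do not belong to any existing validator.

```javascript
// Hypothetical compile step: runs once per schema, O(n) in the number of cases.
function compileDiscriminator(propertyDiscriminator) {
  const compiled = new Map();
  for (const [propertyName, cases] of Object.entries(propertyDiscriminator)) {
    const lookup = new Map();
    for (const [value, subschema] of cases) {
      lookup.set(value, subschema);
    }
    compiled.set(propertyName, lookup);
  }
  return compiled;
}

// Hypothetical evaluation step: runs per instance, one Map lookup per property.
function selectSchemas(compiled, instance) {
  const selected = [];
  for (const [propertyName, lookup] of compiled) {
    if (!Object.prototype.hasOwnProperty.call(instance, propertyName)) {
      continue; // Property absent, so nothing is selected for it.
    }
    const subschema = lookup.get(instance[propertyName]);
    if (subschema !== undefined) {
      selected.push(subschema);
    }
  }
  return selected;
}

const compiled = compileDiscriminator({
  foo: [
    ["aaa", { $ref: "#/$defs/foo-aaa" }],
    [2333, { $ref: "#/$defs/foo-bbb" }]
  ]
});

// The same compiled table is reused across instances.
console.log(selectSchemas(compiled, { foo: 2333, bar: 1234 }));
console.log(selectSchemas(compiled, { foo: "aaa" }));
```

Map keys are compared by identity for objects and arrays, so this sketch only covers string, number, boolean, and null discriminator values. Supporting arbitrary JSON values would need a different comparison.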

Overall, I think you've presented a reasonable alternative for this keyword. It makes it more versatile without adding too much complexity. It's technically a performance regression, but it can be optimized to O(1) in a lot of cases. In any case, we're talking about very small values for "n", so O(n) isn't much worse than O(1).

Since this issue is closed, could you please create a new issue or discussion proposing this change?

@gregsdennis
Member

gregsdennis commented Apr 26, 2023

This also addresses my concern about only supporting string values. I think we could technically support any JSON value. I'd be open to exploring this.
