Split items into items and tupleItems #864

Closed
jdesrosiers opened this issue Feb 28, 2020 · 63 comments · Fixed by #925
Labels
clarification Items that need to be clarified in the specification core

Comments

@jdesrosiers
Member

jdesrosiers commented Feb 28, 2020

The items keyword has an object form and an array form. The object form is used for constraining the items in an array while the array form is for constraining an array that is used like a tuple. This is something that trips up people new to JSON Schema.

It also causes confusion with the additionalItems keyword. additionalItems is only defined for use with the array form of items, but I often see people throwing in "additionalItems": false when they are using the object form. I'm not sure what they think that means, but it's not uncommon to see.
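To make the two forms concrete, here is a minimal Python sketch of the behaviour described above. It is not a real validator: the helper names `matches` and `validate_array` are made up for illustration, and subschemas are limited to boolean schemas and a couple of `type` values.

```python
def matches(schema, value):
    """Tiny stand-in for a real validator: boolean schemas and "type" only."""
    if isinstance(schema, bool):
        return schema
    expected = schema.get("type")
    if expected == "string":
        return isinstance(value, str)
    if expected == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # no constraint this sketch recognises


def validate_array(schema, instance):
    """Apply `items` and `additionalItems` as described in this comment."""
    items = schema.get("items", {})
    if isinstance(items, list):
        # Array form: the nth schema applies to the nth element, if it exists.
        for sub, value in zip(items, instance):
            if not matches(sub, value):
                return False
        # additionalItems only matters here, for elements past the listed positions.
        extra = schema.get("additionalItems", {})
        return all(matches(extra, v) for v in instance[len(items):])
    # Object form (or no items at all): one schema for every element,
    # and additionalItems is ignored.
    return all(matches(items, v) for v in instance)


homogeneous = {"items": {"type": "string"}, "additionalItems": False}
tuple_like = {"items": [{"type": "string"}, {"type": "number"}], "additionalItems": False}

print(validate_array(homogeneous, ["a", "b", "c"]))   # True: additionalItems has no effect
print(validate_array(tuple_like, ["a", 1]))           # True
print(validate_array(tuple_like, ["a", 1, "extra"]))  # False: third element is "additional"
```

The first schema is the commonly seen mistake: `"additionalItems": false` next to object-form `items` changes nothing.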

Proposal

  • Redefine the items keyword to only support the current object form behaviour
  • Introduce the tupleItems keyword that supports the current array form of items
  • Rename additionalItems to additionalTupleItems to make it clear that it works with the tupleItems keyword not the items keyword.
  • Rename unevaluatedItems to unevaluatedTupleItems to make it clear that it works with the tupleItems keyword not the items keyword.
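As a rough illustration of what the proposal would mean for an existing schema, the sketch below rewrites a single schema object using the proposed names. The function name `apply_original_proposal` is hypothetical, the rewrite is deliberately non-recursive, and the new keywords are only the ones proposed in this comment, not part of any published draft.

```python
def apply_original_proposal(schema):
    """Rewrite one schema object (non-recursively) using the names proposed above."""
    out = dict(schema)  # shallow copy so the caller's schema is left untouched
    if isinstance(out.get("items"), list):
        # Array-form items becomes tupleItems; object-form items stays as it is.
        out["tupleItems"] = out.pop("items")
    renames = {
        "additionalItems": "additionalTupleItems",
        "unevaluatedItems": "unevaluatedTupleItems",
    }
    for old, new in renames.items():
        if old in out:
            out[new] = out.pop(old)
    return out


before = {"type": "array", "items": [{"type": "string"}], "additionalItems": False}
print(apply_original_proposal(before))
```

A schema that only uses object-form `items` passes through unchanged, which is the backwards-compatibility argument made later in the thread.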

Pros

  • Less confusing for JSON Schema noobs and veterans alike
  • Better errors. If you try to use an array for items when you meant the object form (and vice versa), meta-schema validation will fail, making it clearer where your problem is.

Cons

  • Redefining an existing keyword is a backwards incompatible change
  • OAS is skeptical of additional changes at this stage
@handrews
Contributor

handrews commented Feb 28, 2020

This was previously discussed in #209 specifically starting at #209 (comment)

I'm obviously not personally dead-set against it as I brought it up last time, but it did not gather support.

This would also be something where I'd want input from the OpenAPI folks (@webron @darrelmiller @MikeRalphson) because it impacts them as of OAS 3.1. Since OAS 3.0 doesn't allow tuple items to begin with, it would probably be good to put this into 2020-03 if we want it so that the change doesn't show up in OAS 3 at all. On the other hand, the OAS folks have been skeptical of additional changes at this stage.

@handrews
Contributor

handrews commented Feb 28, 2020

Paging @darrelmiller again as I misspelled his id at first in the previous comment and I'm not sure GitHub emails people if you tag them in a comment edit

@jdesrosiers
Member Author

I didn't realize it had come up before. I'll read through that issue.

Sounds like an ideal change for better OAS integration. That way they don't have to redefine items, they just need to drop support for tupleItems.

@darrelmiller

As someone who has never seen Items used as an array, I would not be particularly concerned about a change like this. Obviously getting this in before OAS 3.1 takes a dependency on the updated JSON Schema would be a good thing. Having said this, I'm not a heavy enough JSON Schema user to have too much weight in this conversation.

@webron

webron commented Feb 28, 2020

While generally not a fan of creating more keywords, @handrews is correct that in this case it looks like this is not going to break anything.

I imagine that if both items and tupleItems are defined, they both need to be validated.

Not a huge fan of shoving tuple into all the new keywords as it just makes them longer, but I can't think of a better way of doing it.

@awwright
Member

I would think if we want to remain analogous to properties/additionalProperties, we should deprecate the object form of "items". This seems more straightforward to me:

  • items: accepts an array that applies the nth schema to the nth item in the instance, if it exists
  • additionalItems: accepts an object that applies to each item in the instance except for the first n items defined by "items"

Is this any less sufficient?

@MikeRalphson
Contributor

MikeRalphson commented Feb 28, 2020

Possibly use tuples alone in place of tupleItems in the keywords, but otherwise I'm neutral on this. (Not dropping / deprecating the object form of items, that seems like it would break OAS 3.0 to 3.1 compatibility).

@darrelmiller

@awwright I'm not sure that the symmetry of that approach outweighs breaking 90% of the current usages of items.

@handrews
Contributor

My preference would be to add tupleItems that behaves exactly like the array form and leave the others unchanged.

In my view, as long as we keep "Items" in tupleItems, the names of the other keywords work fine, and I prefer minimizing the thrash on those keyword names. I'm not dead set on this but that is the direction in which I would lean.

Regarding @awwright 's suggestion, it would be more symmetric but I think I'm with @darrelmiller on the fact that items with a schema is the only form of these keywords that is really widely used, and it's the only form that has long been supported in OAS. The idea here is to maybe break some obscure compatibility in JSON Schema, but not break compatibility for OAS.

@Relequestual @gregsdennis @Julian @johandorland thoughts on this?

@jdesrosiers
Member Author

I prefer minimizing the thrash on those keyword names

The keyword thrash is unfortunate, but if we don't change it, people are going to continue to use additionalItems incorrectly.

@karenetheridge
Member

I like the idea of splitting the keywords, but "tuple" is unintuitive. I don't want to bikeshed the wording, but what about "itemElements" for the array form?

@notEthan
Contributor

splitting items is good. it makes more sense to me as a schema author and implementer of schema tooling to have separate keywords for the significantly different behaviors.

adding tupleItems makes sense to me, keeping items as one schema for all array items. I do see @awwright's point that array items + schema additionalItems is a closer parallel to the handling of properties/additionalProperties. but in practice, people mostly use arrays of the same sort of item, rather than tuples, and having this more common use case on the items keyword makes more intuitive sense to me, and also will break everybody's schemas less.

for additionalItems, I'm not certain the benefit of a rename; tupleItems + additionalItems + unevaluatedItems seems sufficient to me. it's a little weird that the plain items has no interaction with any of these other Items suffixed keywords, but I think it's okay.

I'm not as fond of the itemElements name suggestion - 'items' and 'elements' seem like synonyms to me. tuple is a good description of the data structure the array form expresses. as for tuples, I'd say keeping the Items suffix helps keep it in line with additionalItems (I don't think additionalTuples makes sense; not sure what else you might call items beyond tuples).

@gregsdennis
Member

I'm with @karenetheridge in that "tuple" is unintuitive. If you don't have the history of why the array format was created, it makes no sense. I think there are use cases for this form outside of tuples, and we shouldn't pigeon-hole the keyword to a single use case.

If itemElements is unsuitable, something else should be suggested. Maybe sequencedItems.

@handrews
Contributor

handrews commented Mar 1, 2020

Before we rathole too much on names, is anyone objecting to at least splitting the tuple (by whatever name) form of items into its own keyword? So far it seems impressively uncontroversial.

@awwright
Member

awwright commented Mar 1, 2020

It seems reasonable to me to have separate keywords for this job.

The exact behavior, maybe less so, however.

It seems like we might want three different keywords:

(1) Schema to apply to every item of an instance array — that can exist concurrently with (2), and may include keywords factored out of each schema in (2)

(2) Array of schemas to apply to respective items in the instance array

(3) Schema to apply to items in the instance array beyond the end of (2)

Now, what do we name each of the walls of this bike shed? It sounds to me like English is missing names for "each item in a homogeneous array" and "respective items in a heterogeneous array" and instead we're just stuck with "items"

@handrews
Contributor

handrews commented Mar 3, 2020

fixedItems? positionItems? I actually like tupleItems but I see the reasons for not using it. It's probably just that I say "tuple-form items" a lot when talking about this so obviously it makes sense to me 😛

@notEthan
Contributor

notEthan commented Mar 3, 2020

I do like tupleItems still - tuple seems to me to perfectly describe a fixed-length array of dissimilar items. but it's not perfect. when used with additionalItems the described data structure isn't really a tuple any more (this falls under @gregsdennis's comment "I think there are use cases for this form outside of tuples, and we shouldn't pigeon-hole the keyword to a single use case").

I think positionalItems resonates best for me so far. (you suggested positionItems @handrews but the adjective feels more correct to me.)

@Relequestual
Member

Affirmative on this change.
I'd say we go for tupleItems, additionalTupleItems, and unevaluatedTupleItems.

I may be wrong but I didn't see anyone suggest that, and it seems like the logical naming method to me.

But I'm open to positionalItems, assuming additionalPositionalItems and unevaluatedPositionalItems also.

@handrews
Contributor

handrews commented Mar 3, 2020

Why change additionalItems and unevaluatedItems? They're nicely symmetrical with additionalProperties. They're also already really long keywords.

@jdesrosiers
Member Author

@Relequestual

I may be wrong but I didn't see anyone suggest that

That was the original proposal, so I agree completely 😉

I recognize that not everyone works in languages that have a concept of a tuple and therefore find it less intuitive to think in those terms. It will be a lot more intuitive to an Elixir developer than it would be to a Java developer. However, I think it's the best description of the data structure. "Positional" is not bad, but I'd rather not invent a new term for something that already has a well defined term.

@handrews
Contributor

handrews commented Mar 3, 2020

@notEthan I almost wrote positionalItems instead.

Also, I think positionalItems + additionalItems == items works nicely.

I really don't want to make additionalItems and unevaluatedItems longer than they are now. I'll prefer any name for the tuple/positional form of items that makes people more comfortable leaving those names unchanged.

@jdesrosiers
Member Author

@handrews

Why change additionalItems and unevaluatedItems?

additionalItems only applies to the array form of items. If we split out the array form to tupleItems, then additionalItems only applies to tupleItems, not items. additionalItems sounds like it applies to items. I often see people use additionalItems with the object form of items. If we don't change additionalItems to match the keyword it works with, it makes it even more confusing.

I definitely agree that it makes those names uncomfortably long, but I think a better name is worth it for our users.

@handrews
Contributor

handrews commented Mar 3, 2020

OK, let's sort this out. There are several conflicting goals here:

  1. Stop having a keyword with two forms
  2. Minimize change for existing users
  3. Maximize analogy with the object keywords
  4. Avoid the ability to write the same thing with two keywords
  5. Minimize confusion for new users

In all options, unevaluatedItems (or its renamed equivalent) is simply the sees-through-in-place-applicators version of additionalItems (or its renamed equivalent), so I'm not going to mention it further.

Currently we have the following (which is not very analogous to the object keywords):

  • items (has two forms, bad)
  • additionalItems (ignored if items not an array, to avoid same thing with two keywords)

The possible keyword behaviors are (with the current keywords listed where relevant)

  • apply to all children, unconditionally
    • object: n/a
    • array: items (single schema form)
  • apply to specific children
    • object: properties (by name)
    • array: items (array form, positional prefix only, can skip positions with true schema)
  • apply to groups of children (loosely similar, as arrays and objects are very different)
    • object: patternProperties
    • array: contains, minContains, maxContains (no control over position)
  • apply to remaining children not covered by other keywords
    • object: additionalProperties, unevaluatedProperties
    • array: additionalItems, unevaluatedItems

I included the groups of children keywords here for completeness, but we won't discuss them further here. Nor will we discuss unevaluated* which will always be the dynamic sees-through-in-place-applicators version of additional*, under whatever names we end up with.


So let's look at options.

To be perfectly analogous to the object keywords, we'd need to both split the current items and either add an unconditional all-properties keyword (so objects match arrays), or remove the unconditional all-items keyword (so arrays match objects).

Let's look at removing the unconditional all-items keyword, as it produces a minimal set of keywords with no duplicate behaviors:

  • apply to specific children
    • properties
    • items (array form only, no single schema form)
  • apply to remaining children
    • additionalProperties
    • additionalItems

This handles goals 1, 3, 4, and arguably 5. If I were starting JSON Schema over today, this is what I would do. But, we're not starting JSON Schema over, and this violates goal 2 (minimize change for existing users). The change for existing users would impact the vast majority of array schemas, and require OAS to go to OAS 4 rather than OAS 3.1 for compatibility reasons.

We could come closer to meeting goal 2 at the expense of goal 5 (minimize confusion for new users) by renaming additionalItems to items:

  • apply to specific children
    • properties
    • prefixItems (or tupleItems or positionalItems or whatever)
  • apply to remaining children
    • additionalProperties
    • items

This retains enough compatibility for OAS 3.1, as they do not use additionalItems or the tuple form of items at all. But the asymmetry with objects is pretty glaring.

We could try to rationalize the whole thing into a minimal set of consistent, reasonably named keywords:

  • apply to specific children
    • namedProperties
    • positionalItems
  • apply to remaining children
    • properties
    • items

but that would break what is probably the most commonly used applicator keyword in all of JSON Schema, properties, in a horribly confusing way for existing users. And require OAS 4. Or possibly OAS 400. Is there a Semantic Versioning guideline for "we reversed your keywords"?

We could give up on goal 4 (no apparent duplicate keywords- note that using additionalProperties on its own doesn't actually do anything so in practice they are not really duplicates):

  • apply to all children, unconditionally
    • object: n/a
    • array: items
  • apply to specific children
    • object: properties
    • array: prefixItems
  • apply to remaining children not covered by other keywords
    • object: additionalProperties
    • array: additionalItems

This is the absolute minimal impact to existing users (goal 2), and is reasonably decent for new users (goal 5), except that violating goal 4 does leave some confusion around for new users because the apparent duplication is weird.


Nothing will be ideal. NOTHING. We are a project with a long history and a large user base, and that limits what we can do. The fact that we're still nominally a "draft" ignores the reality of the deployment of JSON Schema-based solutions. Pissing off our user base throws away the most valuable asset of the project: the concrete evidence that it is actually useful, as demonstrated by people using it.

My preference would be:

  • apply to specific children
    • properties
    • prefixItems
  • apply to remaining children
    • additionalProperties
    • items

In this approach:

  • object keywords are unchanged
  • items with a single schema is unchanged
  • prefixItems implies "before the items", so items applying only to those items after it is at least somewhat intuitive
  • unevaluatedItems is still more or less OK as a name, because while the parallels with additional* were nice, the primary meaning is that it applies to items that have not been evaluated by any of the other keywords
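The preferred option above can be sketched in a few lines of Python. This is an illustration of the semantics as proposed in this comment, not an implementation of any specification; `matches` and `validate_array` are hypothetical helpers and subschemas are limited to booleans and two `type` values.

```python
def matches(schema, value):
    """Stand-in subschema check: boolean schemas and a couple of types only."""
    if isinstance(schema, bool):
        return schema
    wanted = schema.get("type")
    if wanted == "string":
        return isinstance(value, str)
    if wanted == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return True


def validate_array(schema, instance):
    """prefixItems covers the leading positions; items covers everything after them."""
    prefix = schema.get("prefixItems", [])
    rest = schema.get("items", {})
    head_ok = all(matches(sub, value) for sub, value in zip(prefix, instance))
    tail_ok = all(matches(rest, value) for value in instance[len(prefix):])
    return head_ok and tail_ok


# A closed two-element "tuple": a string followed by an integer, nothing else.
pair = {"prefixItems": [{"type": "string"}, {"type": "integer"}], "items": False}
# A plain homogeneous array: with no prefixItems, items applies to every element.
strings = {"items": {"type": "string"}}

print(validate_array(pair, ["x", 1]))        # True
print(validate_array(pair, ["x", 1, 2]))     # False: the third element hits items: false
print(validate_array(strings, ["a", "b"]))   # True
```

Note that `items` needs no special case here: its single-schema behaviour for existing users falls out of `prefixItems` defaulting to an empty list.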

This definitely solves goals 1 and 4, and strikes a reasonable balance between 2 and 5. It mostly gives up on goal 3 (analogy with objects), although it does improve that situation in one way as there is no longer an "extra" array keyword compared to the object keywords. It makes it worse in another way, by breaking the additionalProperties/additionalItems symmetry.

In an ideal world, we'd solve all five goals, but as noted earlier, we are not redesigning JSON Schema from the ground up. We have legacy commitments, and I think the right thing to do is to take the legacy situation into account, work on our documentation and education to mitigate it, and solve all of the other problems as best we can.

Note that in this proposal, the name prefixItems is critically important. Neither tupleItems nor positionalItems provides the necessary implications that items will apply to the items beyond the last position. Only prefixItems does that, which is the only way this becomes reasonably intuitive.

@awwright
Member

awwright commented Mar 4, 2020

  • Stop having a keyword with two forms
  • Minimize change for existing users
  • Maximize analogy with the object keywords
  • Avoid the ability to write the same thing with two keywords
  • Minimize confusion for new users

Agree on every front.

Another idea: we add "itemValues" and "propertyValues" — these cover the values of every item, whether they're also specified in "items" and "properties".

Then, we make "items" indexed-only (array form only), the same way "properties" is object-only (except as necessary for reverse compatibility—most implementations will want to preserve reverse compatibility, of course). Alternatively, we say something like "As an authoring convenience and for historical reasons, if neither additionalItems nor itemValues is being used, items may be used with an object as a shorthand for itemValues".

Optionally, we permit "additionalItems" to work even when "items" is absent—the same way "additionalProperties" works without "properties".

And note, this is symmetrical with "propertyNames" — this covers every property's value and items' value, the same way "propertyNames" covers every property's name.

And it makes it symmetrical with properties/additionalProperties.

afaict, this is a win all around.

@Relequestual
Member

I feel @handrews's suggestion of prefixItems makes the most sense.

@awwright are you suggesting we have prefixItems, items, and additionalItems?

As far as I can work out, @handrews's proposal with prefixItems includes removing additionalItems, given that items with an array value is now prefixItems and additionalItems now effectively becomes items but limited to an object value.

Am I reading this right?

@awwright
Member

awwright commented Mar 8, 2020

Ok, so to summarize the two different behaviors we're considering:

My proposed solution is to split the keywords into three (what I understand the original issue to be calling for), so we have one keyword for the object form of "items", and two keywords for tuples: the array form of "items" and "additionalItems". i.e. the 3-keyword solution.

It seems the other solution is the 2-keyword solution to re-combine the behaviors of the existing two keywords, so that additionalItems always works (named as "items" which is how "items" works right now anyways), that optionally may be "prefixed" by an array of schemas. (This seems more in line with how "properties"/"additionalProperties" works now.)
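One way to read the 2-keyword solution is as a mechanical rewrite of existing schemas. The sketch below shows that reading for a single schema object; `to_two_keyword_form` is a hypothetical helper, it does not recurse into subschemas, and it assumes the `prefixItems` name from the earlier comment.

```python
def to_two_keyword_form(schema):
    """Rewrite items/additionalItems on one schema object into prefixItems/items."""
    out = {k: v for k, v in schema.items() if k not in ("items", "additionalItems")}
    items = schema.get("items")
    if isinstance(items, list):
        # Array-form items becomes the prefix; additionalItems takes over the items name.
        out["prefixItems"] = items
        if "additionalItems" in schema:
            out["items"] = schema["additionalItems"]
    elif items is not None:
        # Single-schema items is kept; additionalItems was ignored here, so it is dropped.
        out["items"] = items
    # With no items at all, additionalItems was ignored too, so nothing is emitted for it.
    return out


print(to_two_keyword_form({"items": [{"type": "string"}], "additionalItems": False}))
print(to_two_keyword_form({"items": {"type": "string"}, "additionalItems": False}))
```

The second call shows the case the original issue complains about: the stray `additionalItems` simply disappears because it never had any effect.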

Am I summarizing this right?

@handrews
Contributor

handrews commented Mar 8, 2020

@awwright yes, that is essentially correct. The 3 vs 2 aspect did appear in the original comment of this issue in the form of:

but I often see people throwing in "additionalItems": false when they are using the object form. I'm not sure what they think that means, but it's not uncommon to see.

observing that the overlapping nature of object-items and additionalItems is confusing. I don't recall if it's mentioned in this issue, but somewhere it's also come up more than once that people try to just use additionalItems when they mean object-form items, which is why there's that language in the spec that additionalItems is outright ignored if items is not present, or if it is present and an object. It's also been noted that implementations don't all get that right.

It's objectively unnecessary to have three keywords for this, the only question is whether it's worth preserving 3 vs 2 because historically items behaved like two different keywords, so we've kind of implicitly had 3.

@jdesrosiers
Member Author

jdesrosiers commented Mar 9, 2020

@handrews

@jdesrosiers has not been able to get consensus that tupleItems is broadly preferable

True, but why single out my proposal? This is clearly true of ALL alternate proposals including yours.

, and there is an actual rationale behind prefixItems

This is insulting and disingenuous. All of the proposals given have a well considered rationale. Just because you like yours better doesn't mean the others are not equally well thought out. It's disrespectful and, frankly, bullying behavior to suggest otherwise.

that helps intuition more than the other names

That's your opinion. You've had multiple people comment that prefixItems isn't as intuitive as you think it is. I can only hope that you take those comments as seriously as similar comments about other proposals.

, so I'm still sticking with it.

That's fine. As per usual, I'm offering my opinions as: take it, leave it, modify it, it's up to you.
(Edit: I removed a comment I feel was out of line. Sorry)

it's also come up more than once that people try to just use additionalItems when they mean object-form items

I've been following the jsonschema tag on StackOverflow for the last five years and have been the top answerer for years, so I like to think I have a pretty good understanding of what noobs are getting wrong. I've never seen anyone make this mistake. If I've never encountered this, I don't think this is a problem that needs solving. I honestly don't see why it's a problem anyway. Lots of keywords in JSON Schema have overlapping functionality including enum/const, patternProperties/additionalProperties, and many other examples.

The problem I do see frequently with noobs is using additionalItems with object-form-items. Even the spec team gets tripped up by this one sometimes. That's the problem that needs solving.

@gregsdennis
Member

gregsdennis commented Mar 9, 2020

Lots of keywords in JSON Schema have overlapping functionality

The overlapping functionality isn't a concern so much as having two forms of items that function differently.

Personally, I prefer sequenceItems.

"Sequence" suggests the functionality of the keyword better and in a more general way.

"Tuple" refers specifically to the original use case of the array form, but there are other use cases.

While "prefix" suggests "these come before the rest," it doesn't convey the idea that they are ordered in any way.

@jdesrosiers
Member Author

The overlapping functionality isn't a concern so much as having two forms of items that function differently.

Agreed. I often see people confused because they chose the wrong form. I forgot to mention that problem when I wrote up the issue.

I won't argue for "tuple" anymore. Clearly no one likes that term. I can live with "sequence" or "positional". But, I'm still a fan of having separate all-items and additional-something-items keywords. There's some overlap in functionality, but I'd rather have that focused keyword that I know won't be influenced by another keyword. I could get what I want by wrapping the additionalItems (of whatever name) in an allOf, but I shouldn't have to jump through such hoops to get that safety.

@handrews
Contributor

handrews commented Mar 9, 2020

@jdesrosiers that really was not meant in as hostile of a way as you took it, I'm sorry that I didn't calibrate it better.

Why single yours out? Because you keep advocating for it while no one else seems to have particularly strong opinions. I did not mean to imply that you had no real thought behind it or anything of that sort. A better way would have been for me to point out that prefixItems is intended to provide an intuition for the role it and items play with respect to each other, while all of the other proposals are about what kind of thing is described (is it a tuple, or a sequence, or whatever).

I think it is important to convey usage rather than data types, especially because there is not a consensus on the data type that is most desirable (neither you nor @gregsdennis seem likely to budge on this). positionalItems is a little more functional, but does not help explain the relationship to items, really. prefixItems does.

@gregsdennis there are no words that will perfectly express everything. The fact that the value is a list and that the keyword after all will be documented should cover the ordered-ness of it.

At this point it looks like (based on comments and on offlist conversations), here is the current support. Note that this is not intended to imply a vote or that there will be a majority-rule vote.
OpenAPI's opinion carries additional weight, for one thing. But this is just to summarize what seems to be the two viable options:

I know that @Relequestual and @webron agree with my rationale on the 2-keyword proposal. I still feel like the 2-keyword version is the least ambiguous, and am bothered that the 3-keyword proposal leaves the confusing items vs additionalItems problem in place, but let's give everyone one more chance to weigh in. I will attempt to get @Relequestual and @webron to actually comment again instead of just talking to me on slack. AHEM.

@jdesrosiers
Member Author

@handrews Thanks for your clarifications. This feels like a more productive discussion already. I'll write up a final summary of my thoughts so they are all in one place then I'll leave it up to the spec team to make a decision.

OpenAPI's opinion carries additional weight

I totally get why that is, but in this case I would expect the opposite because this change is entirely limited to functionality that OpenAPI explicitly doesn't support. It doesn't affect them at all.

@handrews
Contributor

handrews commented Mar 10, 2020

@jdesrosiers

in this case I would expect the opposite because this change is entirely limited to functionality that OpenAPI explicitly doesn't support. It doesn't affect them at all.

Of course it does- JSON Schema questions often surface there, and the question of "is there a confusing kinda-sorta-overlap between items and additionalItems or is there just one keyword covering that entire set of functionality" will be something they have to deal with.

To me, it is unquestionable that an ideally named two-keyword solution is superior to an ideally named three-keyword solution. I'm not sure which is worse: when one keyword actually makes another superfluous (the case if we don't carve a weird exception out for additionalItems, which was how things were at some point in the past) or when it appears to make it superfluous but in fact doesn't work at all (which is how things are right now). Clearly, avoiding the overlap altogether is better.

However, no set of keyword names is perfect. We must choose which imperfection we can best work with, and OpenAPI spec and tooling people (as well as JSON Schema implementors outside of the OpenAPI ecosystem) will be impacted by our choice.

@karenetheridge
Member

karenetheridge commented Mar 10, 2020

The use of the word "sequence" in the array form of items is good. Grammatically, sequentialItems is better there.

As for additionalItems being erroneously (or at least nonsensically) combined with items (the object form), why not simply make it illegal? It is possible, with some combination of not and if/then/else, to outright forbid using items together with additionalItems. At the very least, there are other combinations of schema constructs which make no sense, so either they should all be illegal, or none of them are and their combined use is simply ignored.

(Or perhaps that would be better relegated to a linter tool, and/or a "strict" variant of the schema specification, e.g. where additionalProperties is false.)
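The linter idea could look something like the following Python sketch. It flags every place where `additionalItems` appears without an array-form `items` sibling, which is the combination described earlier in the thread as being ignored. The function name `find_ignored_additional_items` is made up, and the traversal is a simplification: it walks every nested object and array, so it would also descend into non-schema values such as `enum` or `default`, and into a property that happens to be named `additionalItems`.

```python
def find_ignored_additional_items(schema, path="#"):
    """Return pointer-like paths of schema objects whose additionalItems has no effect."""
    findings = []
    if isinstance(schema, dict):
        # additionalItems only does something next to an array-form items keyword.
        if "additionalItems" in schema and not isinstance(schema.get("items"), list):
            findings.append(path)
        for key, value in schema.items():
            findings.extend(find_ignored_additional_items(value, f"{path}/{key}"))
    elif isinstance(schema, list):
        for index, value in enumerate(schema):
            findings.extend(find_ignored_additional_items(value, f"{path}/{index}"))
    return findings


example = {
    "type": "object",
    "properties": {
        "tags": {"type": "array", "items": {"type": "string"}, "additionalItems": False},
        "pair": {"items": [{"type": "string"}], "additionalItems": False},
    },
}
print(find_ignored_additional_items(example))  # ['#/properties/tags']
```

A warning of this kind leaves the schema valid, so it would not require making the keyword combination illegal.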

@handrews
Contributor

@karenetheridge we avoid making keyword combinations illegal. They may be nonsensical, but you can write them without causing errors. We do not want to mandate that all schemas be validated against their meta-schema, nor do we want to require implementations to maintain a list of error conditions to check manually. Also, there are endless possible nonsensical combinations, and trying to enforce them all in the meta-schema would very quickly get out of hand.

@Relequestual
Member

Relequestual commented Mar 10, 2020

I WOULD lock this issue, but given y'all can bypass that anyway, there's not much point in that!

I'd like to take a day or two to digest and chat with a few people and then I'll call before end of week.

It seems like this would be a simple thing to solve, but I think the bikeshedding has caught us again.

Please avoid commenting further till further notice! I guess it's not unreasonable for proposers to post any final remarks =]
Come find me on slack if this is problematic.

@jdesrosiers
Member Author

My Final Remarks

I won't argue over intuitiveness or naming because these are entirely subjective and opinion-based. There are two issues we see regularly with items and additionalItems.

  1. items has an object form and an array form and people get confused about which to use.
  2. People are using additionalItems with object-form-items which has no meaning.

Both the 2-keyword and 3-keyword proposals solve both of these problems. However, the 3-keyword proposal MUST rename additionalItems and unevaluatedItems to additionalWhateverItems and unevaluatedWhateverItems for it to solve the second problem. If we don't, it's even more confusing because additionalItems no longer applies to items at all, it applies to whateverItems.

It was also brought up that it's confusing that additionalItems with no sibling array-form-items keyword has the same validation effect as object-form-items. The 2-keyword solution solves this by getting rid of object-form-items, but I think this solves the wrong problem. additionalItems is confusing because its behavior changes when items is present, not because it has similar behavior to items. We have lots of keywords with overlapping functionality and we don't have a problem with those. I don't think it's a problem here either.

I have long been against keywords that have side effects. I've even gone as far as to propose that additionalProperties and additionalItems be removed from the spec. There are several issues with keywords that have side effects, but the main one is that it makes schemas more bug prone. For those reasons, I personally don't use keywords that have side effects and encourage others to use alternatives as well. If the 2-keyword solution is adopted, the only option I have for arrays (items, previously additionalItems) is a keyword that has side effects. There is no alternative I can use that is safe from being broken by another keyword added somewhere else in the schema. I'd rather have no change at all than lose access to a safe keyword for arrays.

@handrews
Contributor

@jdesrosiers by "have side effects", do you mean how their behavior depends on other keywords?

@Relequestual
Member

@jdesrosiers
If that (re side effects) is the case, then this is what we already have, but extra confusing because it depends on if items is an array or object.

With the 3 keyword proposal, say you called it positional.
You would have positionalItems, items (object form only), and additionalPositionalItems.

Now, I could STILL read that as prefix items, middle items, postfix items, assuming that positional is the start and additionalPositional is the tail end, while items as an object form still applies to all other items in the middle.

One could argue that this is even something someone might want to do (prefix / postfix items). I can come up with a rationale for it!

But the point here is, we want to simplify to avoid possible misunderstanding.

If items ONLY takes an object, and prefixItems ONLY takes an array, it becomes clearer what their intent is. If we needed a postfixItems, that could come later, if someone outside our circle pops up with a use case (real world rather than academic here).

I'm less concerned about "side effects" and more about ease of use and intuitiveness.

Can you explain how the 2 keyword proposal would have side effects and how that's worse than what we currently have please? I feel like I don't understand your position on this fully, and I'd like to feel like I do.

@handrews
Contributor

@Relequestual I like that it leaves open the possibility of a postfixItems, which I hadn't even thought of. I'm pretty sure I once wanted to be able to describe the last N elements of a list for... something :P Not enough to want to add it now. I agree on waiting for someone with a real use case to ask. But the fact that there's a reasonable naming scheme for it is very appealing. Room to grow.

@jdesrosiers
Member Author

by, "have side effects", do you mean how their behavior depends on other keywords?

Yes. For example, if I add properties to a schema with additionalProperties, adding properties has the side effect of changing what additionalProperties validates. A "safe" keyword would not be affected by other keywords. Schemas are easier to reason about if keywords behave in isolation.
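As a rough illustration of that side effect, here is a small Python sketch. `matches` and `validate_object` are hypothetical helpers, not a real validator; they handle only boolean schemas, `type`, properties, and additionalProperties, and they ignore patternProperties.

```python
def matches(value, schema):
    """Hypothetical stand-in for a validator: boolean schemas and "type" only."""
    if isinstance(schema, bool):
        return schema
    expected = schema.get("type")
    if expected == "string":
        return isinstance(value, str)
    if expected == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return True  # no (or unsupported) "type": accept anything


def validate_object(instance, schema):
    properties = schema.get("properties", {})
    additional = schema.get("additionalProperties", True)
    for name, value in instance.items():
        # additionalProperties only sees the names that properties does not claim.
        subschema = properties[name] if name in properties else additional
        if not matches(value, subschema):
            return False
    return True


before = {"additionalProperties": {"type": "string"}}
after = {
    "properties": {"_id": {"type": "integer"}},
    "additionalProperties": {"type": "string"},
}

doc = {"_id": "abc", "name": "x"}
before_ok = validate_object(doc, before)  # "_id" is checked by additionalProperties
after_ok = validate_object(doc, after)    # "_id" is now checked by properties instead
```

The additionalProperties value is identical in both schemas, yet the set of property names it is applied to changes as soon as properties is added.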

Can you explain how the 2 keyword proposal would have side effects ...

Assuming the keywords we have now, is it clear that additionalItems is unsafe (vulnerable to side effects)? That changing an array-form-items keyword can have the side effect of changing the validation result of additionalItems? Or are you asking me to go into why this is undesirable?

To be clear, both proposals have an unsafe (vulnerable to side effects) keyword. The difference is that with the 2-keyword proposal, I have no alternative to avoid the unsafe keyword when describing a typical array with uniform items.

The 2-keyword proposal renames additionalItems (unsafe) to items and removes object-form-items (safe). So, in order to describe an array with uniform items, I have no choice but to use an unsafe keyword. The safe option no longer exists. I recognize that unsafe keywords are the best option for some things, but they should be avoided when they are not necessary. The 2-keyword proposal eliminates the possibility of avoiding unsafe keywords for normal arrays.

... and how that's worse than what we currently have please?

To me, the 2-keyword solution forcing the use of unsafe keywords creates a bigger problem than the usability problems it solves, but that's a matter of opinion, and it's not even relevant because it's not a zero-sum choice. The 3-keyword solution fixes the usability problems without eliminating safe keyword alternatives.

Hopefully, that helps.

@awwright
Member

I don't think "side effects" are bad per se. There's some that are/were needless (like boolean exclusiveMinimum/exclusiveMaximum, remember that?). Like a lot of things, we have them because they're an authoring convenience, because it would be more difficult to write schemas without them.

I prefer to think of them as arguments to other keywords, instead of keywords outright, as long as we have a clear pattern to distinguish them. e.g. "additional*" — it should be quickly obvious the behavior is "in addition to" another keyword.

By authoring convenience I mean: imagine how you'd write this if "additionalProperties" wasn't defined in terms of "properties":

{
   "type": "object",
   "properties": {
      "_id": { "type": "integer" }
   },
   "additionalProperties": { "type": "string" }
}

If we wanted to stick to the pattern where keywords were always independent/safe, we might have to have a form like

{
   "type": "object",
   "properties": [
      {
         "_id": { "type": "integer" }
      },
      { "type": "string" }
   ]
}

This simply seems unwieldy. I think we just make the trade off, in 2 or 3 cases we have keywords that are defined in terms of other keywords, and they're handled as special cases.

If I were designing the vocabulary from scratch, I might devise a special naming scheme like

{
   "type": "object",
   "properties": {
      "_id": { "type": "integer" }
   },
   "properties.additional": { "type": "string" }
}

But how many of these cases are we really going to have?

As long as we're resigned to having "argument keywords", I think this is an argument for the 3-keyword solution: e.g. sequence, additionalSequence (the naming has to make it clear it's an argument), and items to preserve reverse compatibility.

I can be persuaded of the 2-keyword solution (@handrews made a good argument to me earlier), but you're going to have to sell me on the names of the keywords, and I can't think of an intuitive name that fits our pattern.

@handrews
Contributor

The primary naming constraint is that we're not changing items in such a way that existing OpenAPI uses (which match the vast majority of non-OpenAPI uses: a single-schema form of items and no additionalItems at all) would have to change. That's the one absolute constraint.

Otherwise, I actually think additionalProperties is awful because almost no one seems to understand that it can be used on its own. Which is why I like items as the one indefinite-matching keyword, with clearly named modifier keywords. It's the use of the modifier keywords (prefixItems) that control behavior, which is why I want to name the modifier keyword after its modifying behavior.

@Relequestual
Member

additionalItems is confusing because its behavior changes when items is present, not because it has similar behavior to items.
@jdesrosiers

My feeling is slightly different on this. I think it's confusing because it's expected to work when items is used in the object form, but it does not. This is my experience of reading StackOverflow questions, and helping people on the slack server. I cannot say I've seen confusion when items and additionalItems have been used when items is an array.

I feel the simplest solution is to go with the 2 keyword solution.

2 keywords maintains the 99% use case of items while enabling the occasional use case of validating the first N items differently.
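A minimal Python sketch of how the 2-keyword split might behave, assuming prefixItems takes an array of positional schemas and items takes a single schema applied to everything after the prefix. `matches` and `validate_two_keyword` are hypothetical helpers that support only boolean schemas and `type`; this is an illustration of the proposal as discussed here, not a reference implementation.

```python
def matches(value, schema):
    """Hypothetical stand-in for a validator: boolean schemas and "type" only."""
    if isinstance(schema, bool):
        return schema
    expected = schema.get("type")
    if expected == "string":
        return isinstance(value, str)
    if expected == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return True  # no (or unsupported) "type": accept anything


def validate_two_keyword(instance, prefix_items=(), items=None):
    """Assumed semantics: prefixItems is positional, items covers the rest."""
    if not all(matches(v, s) for v, s in zip(instance, prefix_items)):
        return False
    if items is None:
        return True
    # With no prefixItems, the slice starts at 0 and items covers every element.
    return all(matches(v, items) for v in instance[len(prefix_items):])


string_schema = {"type": "string"}
integer_schema = {"type": "integer"}

# The common case: items alone constrains every element.
uniform_ok = validate_two_keyword(["a", "b"], items=string_schema)

# The occasional case: the first N elements are validated differently.
mixed_ok = validate_two_keyword(
    [1, "a", "b"], prefix_items=[integer_schema], items=string_schema
)
```

Note that under these assumptions the elements that items applies to depend on how many schemas prefixItems holds, which is the dependency between keywords raised earlier in the thread.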

I feel like I understand the argument for avoiding side effects. Splitting the keywords into side-effect and non-side-effect groups would let you potentially avoid problems, but I'm not sure the majority of users will run into those problems in the first place, AND the 2-keyword solution avoids confusing users about when to use one keyword vs the other.

The primary naming constraint is that we're not changing items in such a way that existing OpenAPI uses...
@handrews

Both proposed solutions achieve this (iirc), but we also have to consider the wider implications for tooling. KISS. Keep It Simple, Stupid.

I'm still convinced the 2 keyword solution is the simplest, and the most intuitive for users.

It's not expected that most users will ever need to use prefixItems... I still don't recall seeing more than maybe 3 examples of needing additionalItems in the wild.

Greenlighting this issue for a PR.

@gregsdennis
Member

gregsdennis commented Mar 18, 2020

Changing my vote to 2 keyword solution, but I still prefer sequenceItems/sequentialItems over prefixItems. While "prefix" can convey that it should be evaluated first (I'm not sure it does), I think that that quality is better left for the specification, and the keyword should describe the behavior, which "sequence" achieves.

@awwright
Member

I'd be happy with the @gregsdennis proposal.

@Relequestual
Member

The literal definition of prefix as a verb is "applied before". What does it mean to you that I'm not seeing?

I'm going to chat with others in this thread one on one in an effort to come back with a resolution that all find acceptable rather than more discussion and bikeshedding =]

@json-schema-org json-schema-org locked and limited conversation to collaborators Mar 18, 2020
@Relequestual
Member

After discussions, I haven't been convinced that we should do anything different to the comment where I greenlit the issue for PR.

Awaiting PR.

@gregsdennis gregsdennis added clarification Items that need to be clarified in the specification and removed Type: Maintenance labels Jul 17, 2024
Labels
clarification Items that need to be clarified in the specification core
10 participants