Should $ref be outside json-schema spec and just be a simple generic referencing mechanism? #279

Closed
ruifortes opened this issue Mar 22, 2017 · 29 comments

Comments

@ruifortes

I know the spec states that properties other than $ref should be ignored, but I disagree.
One of the most common situations is a prop that references some general schema type but needs a more specific title and description, where it doesn't make sense to extend that schema just to add metadata.

Here's an example prop from OCDS (Open Contracting Data Standard):

buyer: {
  title: 'Buyer',
  description: 'The buyer is the entity whose budget will be used to purchase the goods. This may be different from the procuring agency who may be specified in the tender data.',
  $ref: '#/definitions/Organization'
},

How do the specs suggest we accomplish this? The above could just be syntactic sugar for:

buyer: {
  $merge: {
    input: {
      title: 'Buyer',
      description: 'The buyer is the entity whose budget will be used to purchase the goods. This may be different from the procuring agency who may be specified in the tender data.',      
    },
    source: {$ref: '#/definitions/Organization'}
  }
},

Why not just worry about dereferenced json in json-schema?
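
A minimal sketch of the shorthand described above, assuming a hypothetical helper `expandAnnotatedRef` and handling only local `#/...` pointers. The sibling annotation keywords are laid over a shallow copy of the referenced schema, which is roughly what the `$merge` form would produce for this case. None of these names come from the spec or from an existing library.

```javascript
// Hypothetical helper, not part of any spec or library.
// Resolves a local JSON Pointer such as '#/definitions/Organization'.
function resolveLocalPointer(root, ref) {
  if (!ref.startsWith('#')) {
    throw new Error('only local refs are handled in this sketch');
  }
  return ref
    .slice(1)
    .split('/')
    .filter(Boolean)
    .map(token => token.replace(/~1/g, '/').replace(/~0/g, '~'))
    .reduce((node, token) => node[token], root);
}

// Treats { title, description, $ref } as sugar for merging the
// sibling keywords over the referenced schema (shallow merge only).
function expandAnnotatedRef(node, root) {
  const { $ref, ...annotations } = node;
  return { ...resolveLocalPointer(root, $ref), ...annotations };
}

const root = {
  definitions: {
    Organization: { title: 'Organization', type: 'object' }
  },
  properties: {
    buyer: { title: 'Buyer', $ref: '#/definitions/Organization' }
  }
};

const buyer = expandAnnotatedRef(root.properties.buyer, root);
console.log(buyer); // { title: 'Buyer', type: 'object' }
```

The point-of-use title wins over the one in the definition, and the definition itself is left untouched.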

@Relequestual
Member

There are a number of other related issues where a merge or patch or some sort of merging is suggested. They have had some discussion. As such I'm closing this issue in favour of the others that already exist, as I consider it a duplicate.

(#15)
(I would subscribe to that issue, as we will probably need to close it and open a new one with clearer direction that summarises the proposal, objections, and possible problems)

@handrews
Contributor

#98 is the merging solution that has the most support (#15 is the most general form but there is a lot of opposition; I was originally in favor but have since become firmly opposed)

@ruifortes
Author

This issue is not about merge and "schema extensions", which is what the issues you referred to end up discussing.

I just think the whole $use approach to annotation overrides is overcomplicated and harder to read.
Annotation overriding can happen extremely often, and it seems a lot of people are already using $ref like this.
I agree with epoberezkin on that issue.

Is the "no additional props" rule for $ref that sacred?
Can't we assume some shorthand notation regarding this?

Also I agree that $merge should be outside the scope of json-schema spec.
The fact that $merge treats a null value as a delete could also be beneficial.

I'm having a hard time understanding the claim that "allOf is nonsensical when combining multiple schemas using default".
The default keyword has a very loose definition, so it's up to validator and UI implementations to deal with it. Is it a case of precedence? Maybe choose the default from the last item in the array if there are properties with the same name.
Couldn't default be used in a schema, or in a property referencing a schema, to override the default values?
The parent defaults would override the children's.
Also, additionalProperties: false could be overridden in $ref.

{
  description: 'some complex prop validation or even some local schema',
  default: {
    someSchema1Prop: 'default value'
  },
  allOf: [
    {$ref: 'schema1.json', additionalProperties: true}, // overridden
    {$ref: 'schema2.json', additionalProperties: null}  // or deleted
  ]
}

I also disagree with "$merge'd document should be considered a new document, and get its own URI."
That should be the responsibility of $id, which is part of the json-schema spec.

Regarding $ref resolution and $id, I think it's reasonable for a generic json $ref to have some kind of mechanism to change the base url based on some property value. In this case, $id.

Also I believe that $ref should be only about "inclusion" and really don't understand the problem with circular references and why $ref in this comment won't resolve.
$refs just have to be recursively dereferenced, and if you want to skip local $refs you have to be sure to only use local refs where a parent defines the base url (a schema with $id).

The only problem with circular references is when serializing. Otherwise validators should work on fully dereferenced objects where all references are instantiated.

I think $ref and $merge should be out of scope of json-schema and validation which simplifies things a lot.

Again sorry if I'm missing something obvious but sometimes I feel like the spec is getting a little heavy

@handrews
Contributor

@ruifortes you're covering a huge span of topics here. I will try to pick this apart and answer the main concerns, but smaller, focused issues will make for easier discussion.

  • Forbidding other keywords with "$ref" is not "sacred", but changing it needs to be attached to a use case, and not done just because. It is likely either a syntactic shortcut for "$use" or "$merge", and the important thing is to determine which, if any, we are going to adopt. Then we can decide on the initially proposed syntax vs inlining more keywords with "$ref". But right now we don't have agreement on what the goal of allowing such a thing would be.
  • additionalProperties overriding is not as simple as you think when you have to deal with all situations of combining schemas to arbitrary depth. It's been discussed ad nauseam and I'm not repeating it here. I can dig up a discussion for you later if you really need one.
  • "allOf" and the other combinatoric keywords are analogous to n-ary boolean operators, so their operands are not ordered (even though JSON arrays are ordered). So there is no concept of a "last" schema which would override the "earlier" schemas.
  • The behavior of "default" is intentionally loosely specified, but the mechanism for determining which default applies is not. Figuring out which annotation keywords to use should be unambiguous.
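
To illustrate the point about "allOf" being unordered, here is a small sketch in which each subschema is modelled as a plain predicate function. That modelling is an assumption made for brevity, not how the spec defines schemas. The result is a logical AND, so reordering the array cannot change the outcome, and there is no position from which a "last" default could be taken.

```javascript
// Each subschema is modelled as a predicate; this is an illustrative
// simplification, not a real validator.
const isObject = v => typeof v === 'object' && v !== null && !Array.isArray(v);
const hasName = v => isObject(v) && typeof v.name === 'string';

// "allOf" behaves like an n-ary AND over its operands.
const allOf = (...subschemas) => value => subschemas.every(s => s(value));

const a = allOf(isObject, hasName);
const b = allOf(hasName, isObject); // same operands, different order

console.log(a({ name: 'x' }), b({ name: 'x' })); // true true
console.log(a({}), b({}));                       // false false
```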

Your other comments seem to be mostly just comments, so I'll stop here. The "$ref" that doesn't resolve should be explained somewhere in that thread, I thought.

@epoberezkin
Member

epoberezkin commented Mar 22, 2017

really don't understand the problem with circular references

@ruifortes You mean you don't understand why inclusion won't work or why recursion is needed?

why $ref in this comment won't resolve

Because it needs the base URI for its resolution and it is outside of the "included" fragment.

After our "ad nauseam" discussions :) I think we've all agreed on two things (@handrews, correct me if I am wrong):

  1. $ref can be seen as inclusion with two caveats:
     • it's validation-time inclusion rather than pre-processing-time inclusion, so that recursion happens based on the actual data and finishes at some point (to address problem 1 above).
     • it takes the base URI from the JSON document where the included fragment exists (to address problem 2 above).
  2. delegation seems a saner way to view $refs than inclusion with the above caveats, because the caveats are counter-intuitive (inclusion implies pre-processing and lack of source context-awareness).
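
A rough sketch of the delegation view, assuming a toy validator that understands only "type": "object", "additionalProperties" and a local "$ref": "#". The function name and structure are made up for illustration. The reference is followed only when there is a value to check, so the recursion is bounded by the instance rather than by the schema.

```javascript
// Minimal validator for a tiny subset of keywords.
// Illustrative only; names are hypothetical.
function validate(schema, value, root = schema) {
  if (schema.$ref === '#') {
    // Delegation: hand the value to the referenced schema at validation time.
    return validate(root, value, root);
  }
  const valueIsObject =
    typeof value === 'object' && value !== null && !Array.isArray(value);
  if (schema.type === 'object' && !valueIsObject) {
    return false;
  }
  if (schema.additionalProperties && valueIsObject) {
    // Recursion depth is bounded by the instance, not by the schema.
    return Object.values(value).every(v =>
      validate(schema.additionalProperties, v, root));
  }
  return true;
}

const schema = {
  type: 'object',
  additionalProperties: { $ref: '#' }
};

console.log(validate(schema, { a: { b: {} } })); // true
console.log(validate(schema, { a: { b: 1 } }));  // false
```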

@ruifortes
Author

ruifortes commented Mar 25, 2017

Well...I might have mixed up topics from my last post #278 and topics that the posts you suggested end up discussing. I'll try to be more precise.

Generally I think referencing, merging or patching should be outside json-schema scope.
Json-schema should regard its schemas as being fully dereferenced and merged.
Could you please identify where this wouldn't be possible?
"Preserving independent validation" or checking for "recursive cases when inclusion is impossible" should be done by a linter that checks all these conventions.
Linting would be done on the original code containing references, of course. An analysis of a fully dereferenced (and merged) json could not know whether a $ref overrides more than metadata, or whether extending a schema with $merge completely changed the original schemas.

Regarding $use vs metadata props in $ref I would just prefer the convention (guaranteed by the linter) that $ref is a syntactic shortcut for $merge and that it would just merge metadata.

I think it's better to accept a greater degree of conventions instead of trying to enforce strict conformance adding complexity to the language.

This way json-schema stays much more succinct and readable.

Regarding "allOf", "additionalProperties" and "default" I'm still a little confused and hesitant about the topic.
I'm not sure whether the strictness of "allOf" for extending schemas is a good thing.
Wanting the strictness of "allOf" and then not accepting the fact that no additional properties are allowed in that schema is a little strange.
Also I think "additionalProperties" (as already proposed, I think) should be a validator option.
Maybe just reserve "allOf" for the really strict cases and leave extending to $merge and a linter.

Regarding the order of "allOf": I don't understand the relevance when dealing with the "defaults" issue. "allOf" is an array and you can use that to define "default" precedence.
In cases where that is not possible (you want a different prop order), the parent prop "default" could be used (even for nested schemas), or that default could be set to null in the operator's $ref.

@epoberezkin I don't understand why inclusion won't work. I understand that recursion and circular references are necessary; I just don't see the problem with them besides serialization.
{"$ref": "schema2#/definitions/foo"} will just dereference to { "type": "string" } (if the dereferencer uses recursion, as it should).

I don't understand the caveats with recursion (I guess you mean circular references), as I believe validation should be done against a fully dereferenced schema. This fully dereferenced schema to validate against is js (or any other language that surely supports instancing), not json, so circular references are not a problem.

The only problem would be if you dereference only external references (sometimes that is useful for a validator that only reads local references, and to have more readable json).
In that case if the external resource has a local reference (like the one in your case) it "must" also have an $id to be used as baseUri by the dereferencer.
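
A small sketch of why the base URI matters here, using the standard `URL` class. The function name `resolveRef` and the `http://example.com/...` addresses are made-up examples, not taken from the thread. The same fragment-only reference points at a different document depending on which base it is resolved against.

```javascript
// Illustrative only: the URLs below are made-up examples.
// URL is the standard WHATWG URL class (global in browsers and Node.js).
function resolveRef(ref, baseUri) {
  return new URL(ref, baseUri).href;
}

// A local ref inside a document whose $id is http://example.com/schema2
const local = resolveRef('#/definitions/foo', 'http://example.com/schema2');

// A relative external ref inside http://example.com/schemas/schema1.json
const external = resolveRef('schema2#/definitions/foo',
                            'http://example.com/schemas/schema1.json');

console.log(local);    // http://example.com/schema2#/definitions/foo
console.log(external); // http://example.com/schemas/schema2#/definitions/foo
```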

Can you explain the problem with fully derefing the schemas?

@epoberezkin
Member

JSON schema is language independent. You cannot define validation process relying on self referencing data structures existing only in some languages. So if you cannot de-reference schemas staying in JSON territory then spec cannot do it at all.

@ruifortes
Author

ruifortes commented Mar 25, 2017

You can not create a validator "in JSON".
Nevertheless you "can" use $ref in the schema. You just have to deref it first in the language (that is not json) in which the validator is built. Sorry...but I just can't understand the problem :-/
Are there programming languages without object references?
Nevertheless that's an implementation problem. That doesn't mean that $refs should be a json-schema concern. It should just assume they work. Even circular ones.
And why should you be able to dereference staying in json??

@ruifortes
Author

ruifortes commented Mar 25, 2017

Although that's a situation I hadn't contemplated when using "externalOnly" in my dereferencer.
External references with a fragment, like { $ref: 'schema2#/definitions/foo' }, should deref the target (foo) completely, including local refs.

@epoberezkin
Member

epoberezkin commented Mar 25, 2017

You can not create a validator "in JSON".

Correct, but you can define a validation process without referring to any language - that's what the spec should do (and it does)

Are there programming languages without object references?

Yes

Nevertheless that's an implementation problem.

Exactly. So it's ok to fully dereference using recursive data-structure in the implementation, but it is not ok to use such terminology in the spec.

That doesn't mean that $refs should be a json-schema concern. It should just assume they work. Even circular ones.

It does mean exactly that.

And why should you be able to dereference staying in json??

Because it is JSON schema, it should be explained in terms of JSON and other standards, not in terms of any programming languages

@handrews
Contributor

@ruifortes how do you "fully dereference" a recursive / circular reference schema?

Here is a simple recursive schema which expresses that a valid instance must be an object, in which all properties are themselves objects (to any level of nested objects)

{
    "type": "object",
    "additionalProperties": {
        "$ref": "#"
    }
}

@epoberezkin
Member

how do you "fully dereference" a recursive / circular reference schema?

@handrews the idea of @ruifortes is to use self-referencing recursive data structures that exist in JavaScript and some other languages. I just commented on that above.

@handrews
Contributor

@epoberezkin thanks for the clarification, I admit I'm having a bit of trouble following this thread.

@ruifortes having re-read this keeping @epoberezkin's comment in mind, yes, he has the right position here: the spec cannot make any assumptions about the language of implementation whatsoever. Not only are there endless approaches to programming languages, but various orthogonal constraints (performance, streaming media processing, constrained environments, etc.) may limit what language features can be used.

@ruifortes
Author

But circular refs are a json problem that is not specific to json-schema.
If the "json" of the "json-schema" has circular references, of course it has to use "$ref" and can't be turned into a fully dereferenced "json-schema", just as it can't be turned into a fully dereferenced "json".
So it's a "json" issue, not a "json-schema" one.

@epoberezkin
Member

@ruifortes it is indeed possible to construct a recursive object resolving all the references in JavaScript. If you do so, make sure to add "$id" keyword in the included fragments based on the source from which this fragment is taken (or fully resolve $id before inclusion, if it is already present), as otherwise base URI may be incorrect. I find this approach overcomplicated - in my implementation I simply compile schemas to functions and $refs are seen as calls to other functions.

That is not a possible approach to define such process in the spec as there is no appropriate standard for recursive data structures (and XML is not an appropriate standard to define JSON schema).

So the only sane approach is to see $ref as JSON schema concern and as delegation rather than as inclusion. That's the change that was made from draft-04 to draft-06.
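
As a rough illustration of the "compile schemas to functions" idea mentioned above, here is a sketch for a tiny keyword subset. It is not the code of any real validator and all names are hypothetical. A "$ref": "#" compiles to a call to the root's compiled function, so no recursive data structure is ever built.

```javascript
// Sketch of compiling a schema to a validation function.
// Handles only "type" (object/string), "properties" and "$ref": "#".
function compileSchema(rootSchema) {
  let rootFn; // assigned after compilation; refs call it lazily

  function compile(schema) {
    if (schema.$ref === '#') {
      // A "$ref" becomes a call to another compiled function.
      return value => rootFn(value);
    }
    const checks = [];
    if (schema.type === 'object') {
      checks.push(v => typeof v === 'object' && v !== null && !Array.isArray(v));
    }
    if (schema.type === 'string') {
      checks.push(v => typeof v === 'string');
    }
    if (schema.properties) {
      const compiled = Object.entries(schema.properties)
        .map(([key, sub]) => [key, compile(sub)]);
      // "properties" only constrains objects; other values pass this check.
      checks.push(v => typeof v !== 'object' || v === null ||
        compiled.every(([key, fn]) => !(key in v) || fn(v[key])));
    }
    return value => checks.every(check => check(value));
  }

  rootFn = compile(rootSchema);
  return rootFn;
}

const validatePerson = compileSchema({
  type: 'object',
  properties: {
    name: { type: 'string' },
    spouse: { $ref: '#' }
  }
});

console.log(validatePerson({ name: 'Ann', spouse: { name: 'Bob' } })); // true
console.log(validatePerson({ name: 'Ann', spouse: { name: 42 } }));    // false
```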

@ruifortes
Author

ruifortes commented Mar 25, 2017

Well...I'm implementing it now in my dereferencer. I don't think it is complicated at all, since "externalOnly" is only useful for optimizing "serialization" (which is not even an issue with gzipped messages) and for readability, in cases where a type is used extensively in a schema and it's much more readable if props just $ref it to "#definitions/usedalotschema".
I'll try to wrap up my derefer and post some cases online.
Meanwhile it would be great if you could show me some edge cases.
Sorry epoberezkin, but I still think your approach to dealing with $ref makes things more complicated but...I might be wrong. I'm still kind of new to this json-schema thing.

@ruifortes
Author

ruifortes commented Mar 25, 2017

@handrews I don't have to dereference it to json.
In js it's just

var schema = {type: 'object'}
schema.additionalProperties = schema

of course, the derefer does this automatically
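
For reference, a short sketch of the serialization limit mentioned earlier in the thread. The self-referencing object exists fine in memory, but `JSON.stringify` rejects it, so this "fully dereferenced" form cannot be written back out as JSON.

```javascript
const schema = { type: 'object' };
schema.additionalProperties = schema; // in-memory cycle

let serializable = true;
try {
  JSON.stringify(schema);
} catch (err) {
  // JSON.stringify throws a TypeError on cyclic structures.
  serializable = !(err instanceof TypeError);
}

console.log(serializable); // false
```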

@handrews
Contributor

@ruifortes you seem to be talking entirely in terms of your implementation, but this repository is about the specification. It is irrelevant to the specification whether a particular implementation can produce some sort of optimization. The specification must be independent of implementations.

Can you describe what you want without referring to any aspect of implementation?

@ruifortes
Author

ruifortes commented Mar 25, 2017

The specification just has to assume circular $refs exist.
The following should be a valid schema:

{
    "type": "object",
    "additionalProperties": {
        "$ref": "#"
    }
}

If some programming language can't figure out how to implement that, that's a problem with that language, not with json-schema. The aforementioned schema should be considered completely valid.

As should this one:

    person: {
      properties: {
        name: {
          type: 'string'
        },
        spouse: {
          type: {
            $ref: '#'        // circular reference to parent
          }
        }
      }
    }

This is what I think is the best approach. I'm not just trying to defend my implementation.
Actually I only wrote it because I think it's the best approach and others didn't seem to dereference recursively.
Maybe some do, but they seemed overcomplex and I just couldn't understand how.

@handrews
Contributor

@ruifortes that is a valid schema. What change are you trying to accomplish?

@epoberezkin
Member

epoberezkin commented Mar 25, 2017

The specification just has to assume circular $refs exist.

It assumes that. It just cannot be dereferenced within the spec. JSON-schema is defined in such a way that it can be implemented in any language. Recursive data structures are not required to implement $refs, and neither is compiling schemas to functions; there are many ways implementations can follow the spec using constructs available in their languages. The spec has nothing to do with these details though.

@ruifortes
Author

ruifortes commented Mar 25, 2017

Ok...so we agree. Why not just assume that $ref is a general JSON issue? I'm just saying that the json-schema spec shouldn't worry about that. It should just assume $ref works as a general referencing mechanism for JSON and not make any assumptions about it.
The only thing I agree that json-schema should define is the syntactic shortcut using extra $ref props to override metadata. And that's just for simplicity and readability sake.

@handrews
Contributor

What benefit does that provide?

@ruifortes
Author

ruifortes commented Mar 25, 2017

The benefit is that all the issues regarding $ref, $merge, $use, $extend, patch, etc. are out.
Json-schema shouldn't be trying to enforce conformity regarding these methods. That's why I say that it should view the data structure as fully dereferenced: although we know that it is not possible to serialize it in json, conceptually it doesn't make any difference.
I don't think the spec should be trying to guarantee that dereferencing is done according to the conventions. Let a linter do that.
I wouldn't like to read a schema full of $merge's just to override point-of-use annotations, nor to add another keyword ($use) just to ensure $merge conformity.
Also I'm a little skeptical about using "allOf" to extend schemas. Just use $merge and let some linter check its correctness.
I'm all for a keyword that simplifies code, but "allOf" seems too restrictive, and there are really strict use cases where it can be used like its siblings "oneOf" and "anyOf".
I would like some $extend mechanism if it solved some kind of verbosity or boilerplate.

@epoberezkin
Member

The reasons not to see $ref as a JSON issue:

  • JSON-schema cannot exist without $ref
  • $ref is not needed anywhere outside of JSON-schema

So while it is possible to create a separate spec for $id, $ref etc. (and actually that's how it was before) it just makes more sense to have $ref in JSON schema. It is already separated from validation by being in the core rather than in the validation doc.

$merge/$use, on the other hand, can be outside of the spec as it is pre-processing.

should view the data structure as fully dereferenced although we know that it is not possible to serialize it in json conceptually it doesn't make any difference.

How can it "view" data structure as fully dereferenced if it is not possible to fully dereference it? In which terms would it explain such a structure? As I wrote - there is no acceptable standard existing to allow it, and using JavaScript or XML is not acceptable. The only acceptable structure for JSON-schema is JSON - that's the nature of the spec, so $ref should stay as part of the JSON-schema spec.

@ruifortes
Author

ruifortes commented Mar 25, 2017

json-schema cannot exist without json either.
json as a serializable text format must have some mechanism to reference other json documents or parts of them.
Json can be used anywhere you need to serialize a data structure and must contemplate references.
$ref is essential to json as a serializable format.
Even if nobody is using json to serialize data, that doesn't mean that $ref shouldn't be contemplated as a general json referencing mechanism.
If json-schema can't exist without json, and json can't exist without $ref, you just need to tie json-schema to json.
Of course I also think that $merge is a generic json operation and has nothing to do with json-schema.
$use of course would have to be defined by the json-schema spec; I just don't think json-schema should be enforcing conformity in merging operations.
I think new keywords are useful to provide functionality that simplifies the syntax, not the other way around.

And of course it is possible to "conceptually" view data as completely dereferenced. Serialization is just an implementation detail and that's the only problem. json-schema just cannot view its data as having functions or any other type json doesn't support, but $refs should be seen as real references, as in JS or any other language that has them. If the language doesn't have that, it is for the implementation to find workarounds.

@handrews
Contributor

json as a serializable text format must have some mechanism to reference other json documents or parts of it.

No, it's purely a data format and not hypermedia. That's why JSON Hyper-Schema exists.

It is certainly possible to define JSON Reference separately, and that was done in the past, but it made things more complicated rather than less. JSON Schema only allows it in certain places, which makes it much easier to reason about and makes it possible for JSON Schema to describe itself. Once JSON Schema needs to put usage restrictions on JSON Reference, coordinating the two becomes more of an implementation burden: you can't just use JSON Reference blindly, so JSON Schema implementations must be aware of it whether it is part of that standard or not.

Json can be used anywhere you need to serialize a data structure and must contemplate references.
$ref is essential to json as a serializable format.

There really hasn't been interest in JSON Reference as a separate thing. The IETF draft expired years ago and despite being quite simple, no one has picked it up. So I'd say it's really not essential in the view of most people. Arguably JSON-LD is a better approach for linking JSON data anyway (LD == Linked Data).

@epoberezkin
Member

A specific argument against $ref as a JSON extension was schemas like this one, where a property name in an object is not a JSON-schema keyword:

{
  "properties": {
    "$ref": {
      "type": "string"
    }
  }
}

In this case $ref is just a property name, nothing special. So "$ref" has only the meaning of reference when used as a property of schema object.

If $ref were to be defined for JSON this distinction would not be possible.
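
A sketch of that distinction, assuming a hypothetical `collectRefs` walker that knows only a few subschema locations. It treats "$ref" as a reference only when it appears directly in a schema object, and treats the keys under "properties" and "definitions" as plain names.

```javascript
// Collects "$ref" targets only where "$ref" sits directly in a schema object.
// Covers a small, assumed subset of subschema locations.
function collectRefs(schema, found = []) {
  if (typeof schema !== 'object' || schema === null) return found;
  if (typeof schema.$ref === 'string') found.push(schema.$ref);

  // Values of these keywords are maps from arbitrary names to schemas,
  // so the keys (even "$ref") are plain names, not keywords.
  for (const keyword of ['properties', 'definitions']) {
    for (const sub of Object.values(schema[keyword] || {})) {
      collectRefs(sub, found);
    }
  }
  // These keywords hold a schema directly.
  for (const keyword of ['additionalProperties', 'items']) {
    collectRefs(schema[keyword], found);
  }
  return found;
}

const schema = {
  properties: {
    $ref: { type: 'string' }, // a property named "$ref"
    parent: { $ref: '#' }     // an actual reference
  }
};

console.log(collectRefs(schema)); // [ '#' ]
```

A generic JSON-level dereferencer has no such context, which is why it would try to resolve the first "$ref" as well.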

@Relequestual
Member

@ruifortes If you want to write your own separate specification which defines $ref to be whatever you want, then go ahead. That is not something we are interested in doing, for the reasons we have explained.

@json-schema-org json-schema-org locked and limited conversation to collaborators Mar 27, 2017

4 participants