Is the value space of rdf:JSON and xsd:boolean disjoint? #323

Closed
LEW21 opened this issue Jan 14, 2020 · 17 comments

Comments

@LEW21

LEW21 commented Jan 14, 2020

I'd ask about xsd:string and xsd:double too, but I'm not even sure whether the ECMAScript definitions of strings and numbers are equivalent to the XSD ones.

xsd:boolean:

3.3.2.1 Value Space
boolean has the ·value space· of two-valued logic: {true, false}.

ECMAScript boolean:

6.1.3 The Boolean Type
The Boolean type represents a logical entity having two values, called true and false.

RDF:JSON:

The value space is the union of the four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays) from [ECMASCRIPT]. Two JSON values A and B are considered equal if and only if the following is true:

  1. If A and B are both objects, (...)
  2. Otherwise, if A and B are both arrays, (...)
  3. Otherwise, if A and B satisfy the Strict Equality Comparison defined in Section 7.2.15 in [ECMASCRIPT].
  4. Otherwise, A and B are not equal.
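
As a rough illustration of the quoted equality rules, here is a minimal Python sketch. The object and array rules are elided in the quote above, so the member-wise comparison used here is an assumption, and Python's `==` only approximates ECMAScript's Strict Equality Comparison (for example, Python integers are not IEEE doubles). The function name `json_values_equal` is hypothetical.

```python
import json


def json_values_equal(a, b):
    """Approximate the quoted four-rule equality on parsed JSON values."""
    # Rule 1 (assumed detail): objects are equal when they have the same
    # member names and the corresponding member values are equal.
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            json_values_equal(a[k], b[k]) for k in a
        )
    # Rule 2 (assumed detail): arrays are equal when they have the same
    # length and are equal element by element, in order.
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            json_values_equal(x, y) for x, y in zip(a, b)
        )
    # Rule 3: primitives. Python treats True == 1, so booleans are
    # checked first to keep them distinct from numbers.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if a is None and b is None:
        return True
    # Rule 4: anything else is not equal.
    return False


same = json_values_equal(
    json.loads('{"a": [1, true]}'), json.loads('{ "a" : [1.0, true] }')
)
different = json_values_equal(json.loads("true"), json.loads("1"))
```

Under these assumptions, `same` is true even though the two texts differ in whitespace and number formatting, while `different` is false because a boolean and a number never compare equal.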

If I were to guess, I'd guess they are disjoint, because that would be similar to the handling of xsd:hexBinary and xsd:base64Binary. On the other hand, the definitions look equal, and nothing says explicitly that they should be disjoint.

@pchampin
Contributor

From a very pedantic point of view, nothing in XSD's and ECMAScript's respective definitions formally asserts that they are referring to the same thing, even if they both call them "true" and "false". That being said, I would tend to agree with you that the intention is to refer to the same thing, and so that "true"^^xsd:boolean and "true"^^rdf:JSON denote the same value.

I'd guess they are disjoint, because that would be similar to the handling of xsd:hexBinary and xsd:base64Binary

What do you mean by "handling"? JSON-LD processors are not concerned with the semantics of literals, and they are never expected to treat two literals with different datatypes as if they were the same, even if they happen to denote the same value (as "true"^^xsd:boolean and "true"^^rdf:JSON, or "ff"^^xsd:hexBinary and "FF"^^xsd:hexBinary for that matter).
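
The distinction between comparing literals as terms and comparing the values they denote can be sketched in a few lines of Python. The `Literal` type and the `hex_binary_value` helper are hypothetical illustrations, not part of any RDF library.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    """A bare-bones RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


def hex_binary_value(lit):
    """Map an xsd:hexBinary lexical form to the octets it denotes."""
    return bytes.fromhex(lit.lexical)


a = Literal("ff", XSD + "hexBinary")
b = Literal("FF", XSD + "hexBinary")

# Term equality: lexical form and datatype must match exactly.
same_term = a == b
# Value equality: both lexical forms denote the single octet 0xFF.
same_value = hex_binary_value(a) == hex_binary_value(b)
```

Here `same_term` is false and `same_value` is true: a processor that only compares terms never needs to know what the datatype means.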

@LEW21
Author

LEW21 commented Jan 14, 2020

I'd guess they are disjoint, because that would be similar to the handling of xsd:hexBinary and xsd:base64Binary

What do you mean by "handling"? JSON-LD processors are not concerned with the semantics of literals, and they are never expected to treat two literals with different datatypes as if they were the same, even if they happen to denote the same value (as "true"^^xsd:boolean and "true"^^rdf:JSON, or "ff"^^xsd:hexBinary and "FF"^^xsd:hexBinary for that matter).

I mean that xsd:hexBinary and xsd:base64Binary are defined in the XSD spec as disjoint:

For purposes of this specification, the value spaces of primitive datatypes are disjoint, even in cases where the abstractions they represent might be thought of as having values in common.

This has further effects on OWL processing:

According to XML Schema, the value spaces of xsd:hexBinary and xsd:base64Binary are isomorphic copies of the set of all finite sequences of octets — integers between 0 and 255, inclusive. To understand the effect that the disjointness requirement has on the semantics of OWL 2, consider the following example ontology:

  • DataPropertyRange( a:personID xsd:base64Binary ) # The range of the a:personID property is xsd:base64Binary.
  • DataPropertyAssertion( a:personID a:Meg "0203"^^xsd:hexBinary ) # The ID of Meg is the octet sequence consisting of the octets 2 and 3.

The first axiom states that all values of the a:personID property must be in the value space of xsd:base64Binary, but the second axiom provides a value for a:personID that is in the value space of xsd:hexBinary. Since the value spaces of xsd:hexBinary and xsd:base64Binary are disjoint, the above ontology is inconsistent.

So, in practical terms, if "true"^^xsd:boolean and "true"^^rdf:JSON are the same true value, then:

  • one can use "true"^^xsd:boolean in every place where rdf:JSON is expected,
  • even if property :x is defined as an owl:FunctionalProperty, you can assert both _:a :x "true"^^xsd:boolean and _:a :x "true"^^rdf:JSON, and this will be consistent.

@gkellogg
Member

I think it's a bit different than asking about the value space of "1"^^xsd:integer and "1"^^xsd:decimal, as those are clearly numbers. In the case of "1"^^rdf:JSON, it is a JSON value, which may be interpreted as a number, certainly when parsed by a JSON parser. The fact that it could be interpreted so directly is something of a special case.

The point of the rdf:JSON datatype is to represent JSON values, much as rdf:HTML can represent HTML values. Because rdf:HTML represents DOM fragments, such a fragment could be "1". Would this imply that the literal "1"^^rdf:HTML should be considered to be the same value? I think not.

@pchampin
Contributor

This has further effects on OWL processing

Depending on which datatype map you are using, yes. But not all OWL processors are expected to support all possible datatypes.

I agree that this may open a Pandora's box, though. And in fact, JSON is not defined as a serialization of ECMAScript, so the denotation of "true"^^rdf:JSON is not bound to be interpreted as ECMAScript's true. I think this is the path suggested by @gkellogg, and I tend to agree.

@LEW21
Author

LEW21 commented Jan 14, 2020

The point of the rdf:JSON datatype is to represent JSON values, much as rdf:HTML can represent HTML values. Because rdf:HTML represents DOM fragments, such a fragment could be "1". Would this imply that the literal "1"^^rdf:HTML should be considered to be the same value? I think not.

1 in HTML is just a string, so it's a different value than a number.

OTOH, RDF spec says:

RDF applications may use additional equivalence relations, such as that which relates an xsd:string with an rdf:HTML literal corresponding to a single text node of the same string.

Still, equivalence is not identity (for example "-0"^^xsd:float and "+0"^^xsd:float are equal, but not identical, and are therefore treated as different values in OWL), and the rdf:HTML value space is defined in terms of DOM DocumentFragments, so I'm pretty sure 'a'^^rdf:HTML is a different value than 'a'^^xsd:string.
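
The signed-zero case can be checked directly. This is a small sketch using Python's `struct` module to look at the 32-bit IEEE 754 encoding, which is assumed here to be a fair stand-in for xsd:float.

```python
import struct

pos_zero = 0.0
neg_zero = -0.0

# Equality: IEEE 754 comparison treats the two zeros as equal.
zeros_equal = pos_zero == neg_zero

# Identity: the 32-bit encodings differ in the sign bit.
pos_bits = struct.pack(">f", pos_zero)
neg_bits = struct.pack(">f", neg_zero)
zeros_identical = pos_bits == neg_bits
```

`zeros_equal` is true while `zeros_identical` is false, which is the gap between equivalence and identity described above.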

@LEW21
Author

LEW21 commented Jan 14, 2020

I think it's a bit different than asking about the value space of "1"^^xsd:integer and "1"^^xsd:decimal, as those are clearly numbers. In the case of "1"^^rdf:JSON, it is a JSON value, which may be interpreted as a number, certainly when parsed by a JSON parser. The fact that it could be interpreted so directly is something of a special case.

The spec says that the value of "1"^^rdf:JSON is a number (the ECMAScript number), so it's not just something that can be interpreted this way.

@LEW21
Author

LEW21 commented Jan 14, 2020

I agree that this may open a Pandora's box, though. And in fact, JSON is not defined as a serialization of ECMAScript, so the denotation of "true"^^rdf:JSON is not bound to be interpreted as ECMAScript's true. I think this is the path suggested by @gkellogg, and I tend to agree.

Unfortunately, JSON-LD's spec references the JS spec here:

The value space is the union of the four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays) from [ECMASCRIPT].

I'm not a fan of this definition (as the standard JSON does not have any formal ties to ECMAScript), but it's there, and rdf:JSON is bound to both standard JSON (for the syntax) and the ECMAScript interpretation (for the value).

@pchampin
Contributor

Unfortunately, JSON-LD's spec references the JS spec here

Oh my, you are right.

I'm not a fan of this definition (as the standard JSON does not have any formal ties to ECMAScript)

Well, the standard JSON is only concerned about syntax, hence the lexical space. We need to go beyond it to define the value space...

@iherman
Member

iherman commented Jan 17, 2020

My feeling is that what is in the current spec is indeed erroneous, mixing the datatypes...

rdf:HTML defines the value space in terms of the DOM, because the DOM, well, exists. As there is nothing comparable for JSON, the only clean way of doing it is to say that the value space for rdf:JSON is the set of strings that abide by the requirements of the JSON syntax. It looks like a circular definition, but it is not really; we clearly define what the equivalence is for those strings, and that is all that, in my view, RDF can do...

@LEW21
Author

LEW21 commented Jan 17, 2020

My feeling is that what is in the current spec is indeed erroneous, mixing the datatypes...

rdf:HTML defines the value space in terms of the DOM, because the DOM, well, exists. As there is nothing comparable for JSON, the only clean way of doing it is to say that the value space for rdf:JSON is the set of strings that abide by the requirements of the JSON syntax. It looks like a circular definition, but it is not really; we clearly define what the equivalence is for those strings, and that is all that, in my view, RDF can do...

This sounds cool, but then differently formatted yet equivalent JSON documents would be treated as different values, which is the problem that JSON canonicalization is trying to solve. Still, right now JSON canonicalization is just a draft, and other specs, like JWT, don't really care about it: they treat the whole JSON object as a string, formatting and all.
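
To make the formatting problem concrete, here is a minimal Python sketch. The `canonical_text` helper is hypothetical and only illustrates the idea of re-serializing to a single form; it is neither the JSON Canonicalization Scheme nor the canonical mapping of the JSON-LD spec (Python's `sort_keys`, for instance, sorts by code point).

```python
import json


def canonical_text(text):
    """Re-serialize JSON text with sorted keys and no optional whitespace."""
    return json.dumps(
        json.loads(text),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )


doc_a = '{"b": 1, "a": [true, null]}'
doc_b = '{ "a" : [ true , null ],\n  "b" : 1 }'

# Compared as raw strings, the two documents are different values.
equal_as_strings = doc_a == doc_b
# After re-serialization they collapse to the same text.
equal_when_canonical = canonical_text(doc_a) == canonical_text(doc_b)
```

With a string-based value space and no canonicalization, `doc_a` and `doc_b` would be two distinct values; with canonicalization both map to `{"a":[true,null],"b":1}`.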

@iherman
Member

iherman commented Jan 18, 2020

This issue was discussed in a meeting.

  • RESOLVED: Change the definition of the value space of rdf:JSON datatype to remove the reference to the internal datatypes and replace with a string which is the canonical form of the serialization of the JSON
  • RESOLVED: The above resolution is a significant change, and we will need to re-start CR. Re-publication would happen once the current commenters' issues have been addressed.
View the transcript Syntax #323
Rob Sanderson: link: #323
Ivan Herman: This is really pchampin’s area, but there is a problem with how we define the JSON datatype (value space): we convert the JSON text into datatypes that are defined in XSD and used in RDF.
… There are some funny situations that come up, where for example “1”^^rdf:JSON and “1”^^xsd:integer are the same, which is wrong.
… The rdf:HTML datatype uses the DOM, which doesn’t revert to primitive types. We don’t have anything like that in JSON land.
… I think the only correct way to do it is to say that the value space is a JSON string. This seems odd, but really it’s just a JSON string per the spec.
… Then we have a way to compare two values, as defined by JSON. We do indeed have a mixture of data in JSON for which we have no control.
… I think that such a change might be substantial and we’d need to go through a more formal CR update.
Pierre-Antoine Champin: I agree with ivan; I’m not sure we can take your solution, as the JSON RFC may not describe the semantics of equality.
Ivan Herman: We do have in our space a way to compare two JSON literals.
Pierre-Antoine Champin: We had to do this because the RFC doesn’t. The most natural thing was to refer to other types. Hence the confusion.
Ivan Herman: The comparison process goes into converting to the internal representation for comparison. Otherwise, the value space is a string.
Rob Sanderson: The question is whether “1”^^xsd:integer === “1”^^rdf:json ?
Ivan Herman: The way we define the value space is sort of a union of integers, booleans and so forth. That’s wrong.
Rob Sanderson: There is no expectation that the syntax of the JSON is comparable to the semantics of other datatypes. You don’t need to understand that “true” is a valid boolean in JSON to implement the rdf:JSON datatype.
Pierre-Antoine Champin: I didn’t think it was “wrong”, but it opens a Pandora’s box regarding other specs. It could be expected that OWL processors recognize this value, which could be a problem.
… There might be issues with numbers which might make it actually wrong, as numbers are not very clear.
Dave Longley: +1
Rob Sanderson: +1 to Ivan
Dave Longley: +1 rdf:JSON is opaque
Benjamin Young: +1
Ivan Herman: The whole reason we went into this is because we want some portion of the JSON-LD to be opaque for RDF processing; our intention was that, but by turning it into an active datatype, we do something that wasn’t intended.
Gregg Kellogg: I’ll echo ivan’s point. Point was not to make a literal that should be compared with other things, but is simply JSON
Dave Longley: just a way for JSON to travel in RDF
Gregg Kellogg: because of normalization, which is not normative, every processor would create the same literal representation. Should have said that the value space is the string representing that serialization
… and to compare as strings.
Dave Longley: +1 to ivan
Ivan Herman: In an ideal world where there was a canonical JSON spec, we could say that the value space is a canonical version of the JSON serialization.
Dave Longley: +1 to “value space is a canonical version of the JSON serialization”
Ivan Herman: We do define a canonicalization.
… Then we could be a bit more precise based on our C14N description.
Proposed resolution: Change the definition of the value space of rdf:JSON datatype to remove the reference to the internal datatypes and replace with a string which is the canonical form of the serialization of the JSON (Rob Sanderson)
Dave Longley: Seems like the right thing to do to me.
Dave Longley: +1
Pierre-Antoine Champin: +1
Ivan Herman: +1
Harold Solbrig: +1
Gregg Kellogg: +1
Tim Cole: +1
Rob Sanderson: +1
Benjamin Young: +1
Resolution #2: Change the definition of the value space of rdf:JSON datatype to remove the reference to the internal datatypes and replace with a string which is the canonical form of the serialization of the JSON
Rob Sanderson: This is normative text that changes how the datatype is defined/interpreted, I’d have a hard time arguing that it’s an editorial issue.
Pierre-Antoine Champin: As we’re nit-picking, we should not define it as a subset of xsd:string representations, but as a separate value space.
Rob Sanderson: +1
Ivan Herman: I wasn’t suggesting we republish now.
Rob Sanderson: But, we do think it’s significant enough to re-enter CR.
Ivan Herman: What gkellogg said, when kasei has “finished”, we can do a republication.
Proposed resolution: The above resolution is a significant change, and we will need to re-start CR. Re-publication would happen once the current commenters’ issues have been addressed. (Rob Sanderson)
Rob Sanderson: +1
Pierre-Antoine Champin: +1
Ivan Herman: +1
Dave Longley: +1
Rob Sanderson: kasei is using Perl to start with, and will port to a more modern language later.
Benjamin Young: +1
Gregg Kellogg: +1
Tim Cole: +1
Harold Solbrig: +1
Resolution #3: The above resolution is a significant change, and we will need to re-start CR. Re-publication would happen once the current commenters’ issues have been addressed.
Ivan Herman: Is there a minimum period of implementation? Do we have to delay based on that limitation?
Dave Longley: i think you can do a really short CR, especially if changes are not significant
Rob Sanderson: From process 2019:
Rob Sanderson: > Revising a Candidate Recommendation … must specify the deadline for further comments, which must be at least 28 days after publication, and should be longer for complex documents,
Rob Sanderson: https://www.w3.org/2019/Process-20190301/#revised-cr
Ivan Herman: 28 days is not that bad. Our original deadline was end-of-February.
… We’d need to postpone the deadline until mid-March if we miss that.
… We’re still well within our window.
… We could, in theory, republish syntax without API, but probably best until review is finished.
Rob Sanderson: We can probably consolidate our changes into something simpler.
Ivan Herman: Let’s make them clean.

@iherman
Member

iherman commented Jan 18, 2020

@LEW21 see the comment above (or the meeting minutes). The current spec, in effect, does include a definition of canonicalization, so the value space being the set of canonicalized JSON texts seems to be fine.

@LEW21
Author

LEW21 commented Jan 19, 2020

I agree that it's better to use canonicalized JSON instead of the JS object model as the value space here.

However, I think treating the strings as-written might be even better:

1. JCS is a draft

The JSON Canonicalization Scheme (JCS) is still a draft, so it can't be referenced normatively. It's still getting a new version every month. While I'm not proficient in the IETF process, it doesn't look like it's going to become a standard soon.

2. JCS - Serialization of Numbers

JCS relies on the JavaScript implementation of Numbers. This is not based on the JSON standard but is JCS's own decision, as the JSON standard does not enforce, or even recommend, any particular implementation of Numbers. And there are people who use JSON with non-JS-compatible Numbers. It's not a good practice, but as long as it's not disallowed by the JSON spec, it should be supported.
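
A small Python sketch of what "non-JS-compatible Numbers" can mean in practice: an integer that is valid JSON but cannot be represented exactly as an IEEE 754 double, which is the number model JavaScript uses. The variable names are illustrative only.

```python
import json

# 2**53 + 1: syntactically valid JSON, not exactly representable as a double.
text = "9007199254740993"

# Python's json module keeps integers at arbitrary precision.
as_python_int = json.loads(text)

# Forcing the value through a double, as a JavaScript-style number model
# would, rounds it to the nearest representable value.
as_double = float(as_python_int)
survives_double = int(as_double) == as_python_int
```

`survives_double` is false: the value changes when pushed through a double, so a canonical form built on JavaScript number serialization cannot round-trip this text.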

3. JCS - Sorting of Object Properties

JCS specifies that object properties have to be sorted, because JSON says they are unordered. In practice, a growing number of people are using them in an ordered way.

  • While this is undefined behavior in ECMAScript, objects are ordered in the popular web browser JS engines, and this order is preserved when using JSON.parse and JSON.stringify.
  • Since Python 3.6, dicts are ordered; since 3.7 this is part of the Python language specification, and both json.load and json.dump preserve this order.
  • People writing web APIs commonly return their objects with keys sorted in the "most readable way".
  • There are specs, like JSON Schema and OpenAPI, that use JSON objects to store a map of properties, which - while functionally unordered - is interpreted in an ordered way by documentation generators (so that schema authors can sort the properties in the most readable way, and have the documentation follow this order).
  • There are probably also other cases where people need ordered dicts, and serialize/unserialize them in JSON as objects, without even knowing that it's undefined behavior - because it just works, and seems natural.
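
The Python behavior mentioned in the list can be observed with the standard `json` module. This is a minimal sketch with made-up sample data.

```python
import json

text = '{"title": "x", "id": 1, "body": "y"}'

# Parsing keeps the member order as written (dicts are insertion-ordered).
parsed = json.loads(text)
keys_as_written = list(parsed)

# Serializing again preserves that order...
round_tripped = json.dumps(parsed)
# ...unless sorting is requested explicitly, as a canonical form would do.
sorted_form = json.dumps(parsed, sort_keys=True)
```

`round_tripped` is identical to `text`, while `sorted_form` reorders the members to `body`, `id`, `title`, losing the author's chosen order.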

I think that at some point the JSON spec will have to be amended to acknowledge this practice.

4. Compatibility with JWT

JWT was standardized before the JCS work started. It says that:

This JSON object MAY contain whitespace and/or line breaks before or after any JSON values or structural characters, in accordance with Section 2 of RFC 7159 [RFC7159].

So it simply preserves whatever was thrown at it by the user. This "whatever" is then signed. So if somebody wanted to store both the JSON payload and the signature in an RDF database, in separate properties, they would need the payload to remain intact. They would probably like to tag the payload as JSON, but if rdf:JSON depends on canonicalization, that would break in some cases (unfortunately not all, so it's possible they wouldn't even notice it until it reaches production).
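
The breakage can be sketched with a plain HMAC over the payload bytes. This is a simplification: a real JWS signs the base64url-encoded header and payload, and the key, payload, and `verifies` helper below are hypothetical.

```python
import hashlib
import hmac
import json

key = b"hypothetical-shared-secret"

# The payload exactly as the user wrote it, whitespace included.
payload_as_signed = '{ "sub": "meg",\n  "admin": true }'
signature = hmac.new(
    key, payload_as_signed.encode("utf-8"), hashlib.sha256
).digest()

# What a store might hand back if it canonicalizes JSON literals.
recanonicalized = json.dumps(
    json.loads(payload_as_signed), sort_keys=True, separators=(",", ":")
)


def verifies(text):
    """Check the stored signature against the given payload text."""
    expected = hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(signature, expected)


original_ok = verifies(payload_as_signed)
canonical_ok = verifies(recanonicalized)
```

`original_ok` is true and `canonical_ok` is false. A payload that already happened to be in canonical form would still verify, which is why the failure could go unnoticed for a while.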


Still, canonicalization of course has its uses. It's nice to be able to parse the object, then serialize it, and get the same value back. So it might be a good idea to recommend using the canonical form (or at least a whitespace-free form, because whitespace is discarded by all JSON parsers and serializers) in the lexical space. While such a recommendation wouldn't work for, say, numbers, which are commonly written by hand, JSON values are usually produced by software, so the generators could be programmed to produce output that is as canonical as possible. And if somebody needs something that's not supported by JCS, or some other way to canonicalize JSON is standardized in the future, you're safe: everything still works.

@iherman
Member

iherman commented Jan 19, 2020

@LEW21,

The JSON Canonicalization Scheme (JCS) is still a draft, so it can't be referenced normatively. It's still getting a new version every month. While I'm not proficient in the IETF process, it doesn't look like it's going to become a standard soon.

You are absolutely right. It is a draft, who knows where it will go, and we cannot rely on it in the spec. As I said, we were forced to use our own definition, which is in the spec; see the entry on "The canonical mapping". See also the note after it: if (and when...) JCS does become a standard, future versions of this spec may be adapted. But at this moment, this WG has no other choice than to have its own canonicalization rules.

@iherman
Member

iherman commented Jan 24, 2020

This issue was discussed in a meeting.

  • RESOLVED: Update api document to be in line with syntax for json datatype, test descriptions, and “canonicalization” algorithm (modulo key ordering)
View the transcript Boolean comparison issue (JSON Datatype)
link: #323
Rob Sanderson: Last week, we concluded that we should fix 323.
Pierre-Antoine Champin: The value of the JSON value type should not be a structured representation of JS object, but canonical form of JSON representation.
… We have our own canonicalization process, but it was marked as non-normative. I think it should be marked as normative.
Ivan Herman: Yes, I agree.
… You avoided canonicalization term, which is a good idea.
pr: #325
… There is a small part that needs to be changed.
Pierre-Antoine Champin: I did change it.
Ivan Herman: I may have missed something then.
Pierre-Antoine Champin: Currently lexical value should be re-serialized.
Gregg Kellogg: The reason it was non-normative was that JCS was still in draft.
… Ordering object keys by converting them to UTF-16 may be controversial.
Pierre-Antoine Champin: We should update the API doc as a copy of the normalization text in processing document.
Gregg Kellogg: We should also change the test descriptions.
Ivan Herman: Also the API doc currently repeats the canonicalization steps, and we should refer to the proper place.
… We will get a similar situation as with language tags, where we can not fully guarantee roundtripping.
Gregg Kellogg: Yes. Ordering of keys in our case is just lexicographical, while JCS is much more detailed with localization concerns.
Proposed resolution: Update api document to be in line with syntax for json datatype, test descriptions, and “canonicalization” algorithm (Rob Sanderson)
Gregg Kellogg: Not including that in syntax doc would not be sufficient for interoperation.
Gregg Kellogg: +1 (modulo key ordering)
Pierre-Antoine Champin: +1
Gregg Kellogg: modulo key ordering
Dave Longley: +1 modulo gregg’s comments
Rob Sanderson: +1
Ruben Taelman: +1
Harold Solbrig: +1
Adam Soroka: +1
Benjamin Young: +1 modulo gregg’s comments
Ivan Herman: +1
Resolution #4: Update api document to be in line with syntax for json datatype, test descriptions, and “canonicalization” algorithm (modulo key ordering)
Dave Longley: (it is important to match JCS … and hope it sticks in the future)
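
The key-ordering caveat raised in the transcript can be made concrete: sorting member names by Unicode code point and sorting them by UTF-16 code units give different results once a name contains a character outside the Basic Multilingual Plane. The sketch below assumes well-formed strings (no lone surrogates); the `utf16_sort_key` helper is hypothetical.

```python
def utf16_sort_key(name):
    """Sort key equivalent to comparing UTF-16 code units.

    Big-endian encoding makes byte-wise comparison match
    code-unit-wise comparison.
    """
    return name.encode("utf-16-be")


# U+E000 is a single code unit (0xE000); U+10000 is encoded as the
# surrogate pair 0xD800 0xDC00, whose first unit is smaller than 0xE000.
keys = ["\ue000", "\U00010000"]

by_code_point = sorted(keys)
by_utf16_units = sorted(keys, key=utf16_sort_key)
```

`by_code_point` puts U+E000 first and `by_utf16_units` puts U+10000 first, so two canonicalizers that disagree on this detail would produce different strings for the same object.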

@gkellogg
Member

@LEW21 can you please indicate if this satisfies your concern?

@twistos

twistos commented Sep 1, 2021

CO SIE KURWY ZAMKNELISCIE IC WAM TO NIE DA BDZIECIE JEBANI !!!
