Is the value space of rdf:JSON and xsd:boolean disjoint? #323
Comments
From a very pedantic point of view, nothing in the respective definitions of XSD and ECMAScript formally asserts that they refer to the same thing, even if both call the values "true" and "false". That being said, I would tend to agree with you that the intention is to refer to the same thing, and so that
What do you mean by "handling"? JSON-LD processors are not concerned about the semantics of literals, and they are never expected to treat two literals with different datatypes as if they were the same, even if they happen to denote the same value (as |
I mean that xsd:hexBinary and xsd:base64Binary are defined in the XSD spec as disjoint:
This has further effects on OWL processing:
So, in practical terms, if
|
I think it's a bit different than asking about the value space of The point of the rdf:JSON datatype is to represent JSON values, much as rdf:HTML can represent HTML values. Because rdf:HTML represents DOM fragments, such a fragment could be "1", would this imply that the literal |
Depending on which datatype map you are using, yes. But not all OWL processors are expected to support all possible datatypes. I agree that this may open a Pandora's box, though. And in fact, JSON is not defined as a serialization of ECMAScript, so the denotation of |
1 in HTML is just a string, so it's a different value than a number. OTOH, RDF spec says:
Still, equivalence is not identity (for example |
The spec says that the value of |
Unfortunately, JSON-LD's spec references the JS spec here:
I'm not a fan of this definition (as the standard JSON does not have any formal ties to ECMAScript), but it's there, and rdf:JSON is bound to both standard JSON (for the syntax) and the ECMAScript interpretation (for the value). |
Oh my, you are right.
Well, the JSON standard is only concerned with syntax, hence with the lexical space. We need to go beyond it to define the value space... |
My feeling is that what is in the current spec is indeed erroneous, mixing the datatypes... rdf:HTML defines the value space in terms of the DOM, because the DOM, well, exists. As there is nothing comparable for JSON, the only clean way of doing it is to say that the value space of rdf:JSON is the set of strings that abide by the requirements of the JSON syntax. It looks like a circular definition, but it is not really; we clearly define what the equivalence is for those strings, and that is all that, in my view, RDF can do... |
This sounds cool, but this way differently formatted yet equivalent JSON documents would be treated as different values, which is the problem that JSON canonicalization is trying to solve. Still, right now JSON canonicalization is just a draft, and other specs, like JWT, don't really care about it; they just treat the whole JSON object as a string, formatting and all. |
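To make the concern concrete, here is a minimal sketch in Python (the sample documents are invented for illustration). It only shows that two as-written JSON texts can differ as strings while parsing to the same structure; it says nothing about how the spec itself compares values.

```python
import json

# Two lexical forms of the same JSON object: different whitespace and key order.
# Both strings are made-up sample data.
a = '{"name": "Alice", "age": 30}'
b = '{ "age":30,\n  "name":"Alice" }'

# Compared as written (string equality), they are different values...
same_text = (a == b)

# ...but after parsing, they denote the same structure.
same_parsed = (json.loads(a) == json.loads(b))

print(same_text, same_parsed)
```

If the value space were the as-written string, `a` and `b` would be distinct values; a canonical form is one way to make them coincide.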
This issue was discussed in a meeting.
Syntax #323
Rob Sanderson: link: #323
Ivan Herman: This is really pchampin’s area, but there is a problem with how we define the JSON datatype (value space): we convert the JSON text into datatypes that are defined in XSD and used in RDF.
… There are some funny situations that come up, where for example “1”^^rdf:JSON and “1”^^xsd:integer are the same, which is wrong.
… The rdf:HTML datatype uses the DOM, which doesn’t revert to primitive types. We don’t have anything like that in JSON land.
… I think the only correct way to do it is to say that the value space is a JSON string. This seems odd, but really it’s just a JSON string per the spec.
… Then we have a way to compare two values, as defined by JSON. We do indeed have a mixture of data in JSON for which we have no control.
… I think that such a change might be substantial and we’d need to go through a more formal CR update.
Pierre-Antoine Champin: I agree with ivan; I’m not sure we can take your solution, as the JSON RFC may not describe the semantics of equality.
Ivan Herman: We do have in our space a way to compare two JSON literals.
Pierre-Antoine Champin: We had to do this because the RFC doesn’t. The most natural thing was to refer to other types. Hence the confusion.
Ivan Herman: The comparison process goes into converting to the internal representation for comparison. Otherwise, the value space is a string.
Rob Sanderson: The question is whether “1”^^xsd:integer === “1”^^rdf:JSON ?
Ivan Herman: The way we define the value space is sort of a union of integers, booleans and so forth. That’s wrong.
Rob Sanderson: There is no expectation that the syntax of the JSON is comparable to the semantics of other datatypes. You don’t need to understand that “true” is a valid boolean in JSON to implement the rdf:JSON datatype.
Pierre-Antoine Champin: I didn’t think it was “wrong”, but it opens a Pandora’s box regarding other specs. It could be expected that OWL processors recognize this value, which could be a problem.
… There might be issues with numbers which might make it actually wrong, as numbers are not very clear.
Dave Longley: +1
Rob Sanderson: +1 to Ivan
Dave Longley: +1 rdf:JSON is opaque
Benjamin Young: +1
Ivan Herman: The whole reason we went into this is because we want some portion of the JSON-LD to be opaque for RDF processing; our intention was that, but by turning it into an active datatype, we do something that wasn’t intended.
Gregg Kellogg: I’ll echo ivan’s point. Point was not to make a literal that should be compared with other things, but is simply JSON
Dave Longley: just a way for JSON to travel in RDF
Gregg Kellogg: because of normalization, which is not normative, every processor would create the same literal representation. Should have said that the value space is the string representing that serialization
… and to compare as strings.
Dave Longley: +1 to ivan
Ivan Herman: In an ideal world where there was a canonical JSON spec, we could say that the value space is a canonical version of the JSON serialization.
Dave Longley: +1 to “value space is a canonical version of the JSON serialization”
Ivan Herman: We do define a canonicalization.
… Then we could be a bit more precise based on our C14N description.
Proposed resolution: Change the definition of the value space of rdf:JSON datatype to remove the reference to the internal datatypes and replace with a string which is the canonical form of the serialization of the JSON (Rob Sanderson)
Dave Longley: Seems like the right thing to do to me.
Dave Longley: +1
Pierre-Antoine Champin: +1
Ivan Herman: +1
Harold Solbrig: +1
Gregg Kellogg: +1
Tim Cole: +1
Rob Sanderson: +1
Benjamin Young: +1
Resolution #2: Change the definition of the value space of rdf:JSON datatype to remove the reference to the internal datatypes and replace with a string which is the canonical form of the serialization of the JSON
Rob Sanderson: This is normative text that changes how the datatype is defined/interpreted, I’d have a hard time arguing that it’s an editorial issue.
Pierre-Antoine Champin: As we’re nit-picking, we should not define it as a subset of xsd:string representations, but as a separate value space.
Rob Sanderson: +1
Ivan Herman: I wasn’t suggesting we republish now.
Rob Sanderson: But, we do think it’s significant enough to re-enter CR.
Ivan Herman: What gkellogg said, when kasei has “finished”, we can do a republication.
Proposed resolution: Above resolution is a significant change, and we will need to re-start CR. Re-publication would happen when current commenters’ issues have been addressed. (Rob Sanderson)
Rob Sanderson: +1
Pierre-Antoine Champin: +1
Ivan Herman: +1
Dave Longley: +1
Rob Sanderson: kasei is using PERL to start with, and will port to a more modern language later.
Benjamin Young: +1
Gregg Kellogg: +1
Tim Cole: +1
Harold Solbrig: +1
Resolution #3: Above resolution is a significant change, and we will need to re-start CR. Re-publication would happen when current commenters’ issues have been addressed.
Ivan Herman: Is there a minimum period of implementation? Do we have to delay based on that limitation?
Dave Longley: i think you can do a really short CR, especially if changes are not significant
Rob Sanderson: From process 2019:
Rob Sanderson: > Revising a Candidate Recommendation … must specify the deadline for further comments, which must be at least 28 days after publication, and should be longer for complex documents,
Rob Sanderson: https://www.w3.org/2019/Process-20190301/#revised-cr
Ivan Herman: 28 days is not that bad. Our original deadline was end-of-February.
… We’d need to postpone the deadline until mid-March if we miss that.
… We’re still well within our window.
… We could, in theory, republish syntax without API, but probably best to wait until review is finished.
Rob Sanderson: We can probably consolidate our changes into something simpler.
Ivan Herman: Let’s make them clean. |
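The two readings of the value space debated in the minutes can be contrasted with a small Python sketch. Everything here is hypothetical illustration: `value_parsed_model`, `value_opaque_model` and `JsonText` are invented names, and the compact `json.dumps` call is only a stand-in for the spec's canonical mapping, not an implementation of it.

```python
import json
from dataclasses import dataclass

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"
RDF_JSON = "http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON"

# Lexical-to-value mapping of xsd:boolean (its lexical space is true, false, 1, 0).
_BOOLEANS = {"true": True, "1": True, "false": False, "0": False}


@dataclass(frozen=True)
class JsonText:
    """Hypothetical wrapper: a JSON serialization living in its own value space."""
    text: str


def value_parsed_model(lexical: str, datatype: str):
    """'Parsed' model: an rdf:JSON literal denotes the parsed value itself."""
    if datatype == XSD_BOOLEAN:
        return _BOOLEANS[lexical]
    if datatype == RDF_JSON:
        return json.loads(lexical)
    raise ValueError(f"unsupported datatype: {datatype}")


def value_opaque_model(lexical: str, datatype: str):
    """'Opaque' model: an rdf:JSON literal denotes a normalized serialization."""
    if datatype == XSD_BOOLEAN:
        return _BOOLEANS[lexical]
    if datatype == RDF_JSON:
        # Assumption: compact, key-sorted output approximates a canonical form.
        normalized = json.dumps(json.loads(lexical), sort_keys=True,
                                separators=(",", ":"))
        return JsonText(normalized)
    raise ValueError(f"unsupported datatype: {datatype}")


# Under the parsed model the two literals denote the same value...
overlap = value_parsed_model("true", RDF_JSON) == value_parsed_model("true", XSD_BOOLEAN)
# ...under the opaque model the value spaces stay disjoint.
disjoint = value_opaque_model("true", RDF_JSON) != value_opaque_model("true", XSD_BOOLEAN)
print(overlap, disjoint)
```

Wrapping the string in its own type, rather than returning a bare `str`, mirrors the remark in the minutes that the value space should be separate from that of xsd:string.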
@LEW21 see the comment above (or the meeting minutes). The current spec, in effect, does include a definition of canonicalization, so the value space being the set of canonicalized JSON texts seems to be fine. |
I agree that it's better to use canonicalized JSON instead of the JS object model as the value space here. However, I think treating the strings as written might be even better:
1. JCS is a draft
JSON Canonicalization Scheme (JCS) is still a draft, so it can't be referenced normatively. It's still getting a new version every month. While I'm not proficient in the IETF process, it doesn't look like it's going to become a standard soon.
2. JCS - Serialization of Numbers
JCS relies on the JavaScript implementation of Numbers. This is not based on the JSON standard but is their own decision, as the JSON standard does not enforce, or even recommend, any particular implementation of Numbers. And there are people who use JSON with non-JS-compatible Numbers. It's not a good practice, but as long as it's not disallowed by the JSON spec, it should be supported.
3. JCS - Sorting of Object Properties
JCS specifies that object properties have to be sorted, because JSON says they are unordered. In practice, a growing number of people are using them in an ordered way.
I think that at some point the JSON spec will have to be amended to acknowledge this practice.
4. Compatibility with JWT
JWT was standardized before the JCS work started. It says that:
So it simply preserves whatever was thrown at it by the user. This "whatever" is then signed. So if somebody wanted to store both the JSON payload and the signature in an RDF database, in separate properties, they would need the payload to remain intact. They would probably like to tag the payload as JSON, but if rdf:JSON depends on canonicalization, that would break in some cases (unfortunately not all, so they might not even notice until it reaches production). Still, canonicalization of course has its uses. It's nice to be able to parse the object, then serialize it, and have the result be the same value. So it might be a good idea to recommend using the canonical form (or at least a whitespace-free form, because whitespace is discarded by all JSON parsers and serializers) in the lexical space. Such a recommendation wouldn't work for, say, numbers, which are commonly written by hand, but JSON values are usually produced by software, so generators could be programmed to produce output that is as canonical as possible. And if somebody needs something that's not supported by JCS, or some other way to canonicalize JSON is standardized in the future, you're safe: everything still works. |
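The signing concern can be sketched in Python. This is an illustration under stated assumptions: a SHA-256 digest stands in for a real JWT signature, the payload is invented sample data, and Python's `json.dumps` stands in for whatever re-serialization a store might apply.

```python
import hashlib
import json

# Made-up payload, exactly as the producer wrote it (and signed it).
as_written = '{"b": 1.50, "a": 1e2}'
digest_signed = hashlib.sha256(as_written.encode("utf-8")).hexdigest()

# A store that parses and re-serializes the payload changes its bytes:
# keys get sorted, whitespace is dropped, numbers are reformatted.
reserialized = json.dumps(json.loads(as_written), sort_keys=True,
                          separators=(",", ":"))
digest_after = hashlib.sha256(reserialized.encode("utf-8")).hexdigest()

print(reserialized)
print(digest_signed == digest_after)
```

Note that Python writes the number `1e2` back as `100.0`, whereas an ECMAScript serializer would write `100`; the number formatting is itself implementation-dependent, which is the point made in item 2.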
You are absolutely right. It is a draft, who knows where it goes, and we cannot rely on it in the spec. As I said, we were forced to use our own definition, which is in the spec; see the entry on "The canonical mapping". See also the note after it: if (and when...) JCS does become a standard, future versions of this spec may be adapted. But at this moment, this WG has no other choice than to have its own canonicalization rules. |
This issue was discussed in a meeting.
Boolean comparison issue (JSON Datatype)
link: #323
Rob Sanderson: Last week, we concluded that we should fix 323.
Pierre-Antoine Champin: The value of the JSON value type should not be a structured representation of JS object, but canonical form of JSON representation.
… We have our own canonicalization process. But this was marked as non-normative. I think this should be marked as normative.
Ivan Herman: Yes, I agree.
… You avoided the term “canonicalization”, which is a good idea.
pr: #325
… There is a small part that needs to be changed.
Pierre-Antoine Champin: I did change it.
Ivan Herman: I may have missed something then.
Pierre-Antoine Champin: Currently lexical value should be re-serialized.
Gregg Kellogg: Reason it was non-normative was JCS was still in draft.
… Ordering object keys by their UTF-16 representation may be controversial.
Pierre-Antoine Champin: We should update the API doc as a copy of the normalization text in processing document.
Gregg Kellogg: We should also change the test descriptions.
Ivan Herman: Also the API doc currently repeats the canonicalization steps, and we should refer to the proper place.
… We will get a similar situation as with language tags, where we cannot fully guarantee roundtripping.
Gregg Kellogg: Yes. Ordering of keys in our case is just lexicographical, while JCS is much more detailed with localization concerns.
Proposed resolution: Update api document to be in line with syntax for json datatype, test descriptions, and “canonicalization” algorithm (Rob Sanderson)
Gregg Kellogg: Not including that in syntax doc would not be sufficient for interoperation.
Gregg Kellogg: +1 (modulo key ordering)
Pierre-Antoine Champin: +1
Gregg Kellogg: modulo key ordering
Dave Longley: +1 modulo gregg’s comments
Rob Sanderson: +1
Ruben Taelman: +1
Harold Solbrig: +1
Adam Soroka: +1
Benjamin Young: +1 modulo gregg’s comments
Ivan Herman: +1
Resolution #4: Update api document to be in line with syntax for json datatype, test descriptions, and “canonicalization” algorithm (modulo key ordering)
Dave Longley: (it is important to match JCS … and hope it sticks in the future) |
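The key-ordering caveat can be illustrated with a rough Python sketch of a canonical serializer that drops whitespace and sorts object keys by UTF-16 code units. The names `utf16_key` and `canonical` are hypothetical, string escaping and number formatting simply follow Python's `json.dumps`, and nothing here should be read as the spec's actual algorithm or as JCS.

```python
import json


def utf16_key(s: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    # "surrogatepass" tolerates lone surrogates that JSON escapes can produce.
    return s.encode("utf-16-be", errors="surrogatepass")


def canonical(value) -> str:
    """Serialize a parsed JSON value without whitespace, keys in UTF-16 order."""
    if isinstance(value, dict):
        items = sorted(value.items(), key=lambda kv: utf16_key(kv[0]))
        body = ",".join(
            json.dumps(k, ensure_ascii=False) + ":" + canonical(v)
            for k, v in items
        )
        return "{" + body + "}"
    if isinstance(value, list):
        return "[" + ",".join(canonical(v) for v in value) + "]"
    # Strings, numbers, booleans and null: Python's formatting is assumed here.
    return json.dumps(value, ensure_ascii=False)


simple = canonical(json.loads('{ "b": [1, true, null], "a": "x" }'))

# Code-point order and UTF-16 order disagree for characters outside the BMP:
# U+1F600 is stored as the surrogate pair D83D DE00, which sorts before U+FB01.
keys = ["\ufb01", "\U0001F600"]
by_code_point = sorted(keys)
by_utf16 = sorted(keys, key=utf16_key)

print(simple)
print(by_code_point != by_utf16)
```

The last comparison shows one reason the ordering rule is worth pinning down: two implementations that both "sort the keys" can still emit different strings if one sorts by code point and the other by UTF-16 code unit.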
@LEW21 can you please indicate if this satisfies your concern? |
I'd ask about xsd:string and xsd:double too, but I'm not even sure whether their definitions in ECMAScript are equivalent to the XSD ones.
xsd:boolean:
ECMAScript boolean:
rdf:JSON:
If I were to guess, I'd guess they are disjoint, because that would be similar to the handling of xsd:hexBinary and xsd:base64Binary. On the other hand, the definitions look equal, and nothing says explicitly that they should be disjoint.