embedded @context triple lost in toRDF processing #540

Closed
christopher-johnson opened this issue Oct 10, 2017 · 18 comments
Labels
api, spec-design, wont fix (Issue discussed and closed as Won't Fix)

Comments

@christopher-johnson

As shown in this gist, embedded @context triples are lost when processing from JSON-LD to RDF.
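For illustration, a minimal document along these lines (hypothetical names, simplified from the gist) carries a second, embedded @context on a nested node:

{
  "@context": "http://example.org/root-context.json",
  "@id": "http://example.org/doc/1",
  "service": {
    "@context": "http://example.org/service-context.json",
    "@id": "http://example.org/service/1"
  }
}

After toRDF, the output N-Quads describe the document and the service node, but nothing records that the inner @context reference was ever present.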

Is this expected, or rather a consequence of the special characteristics of the @context keyword for JSON-LD processors?

One solution would be to map only the embedded @context(s) to a property IRI that would be recognizable for fromRDF processors to deserialize, similar to @type and rdf:type, or alias a different keyword for this embedded @context functionality that would have a corresponding property IRI.

Possibly relates to #521

@elf-pavlik
Contributor

elf-pavlik commented Oct 10, 2017

JSON-LD @context only becomes relevant when a JSON-LD processor needs to interpret the compacted form (see the relevant video about compaction and expansion). If you look at the playground link you provided, the expanded version also doesn't include the @context information any more; any relevant information found in that @context was already used during expansion. Information in the @context doesn't really belong to 'the data', it only serves as instructions for the JSON-LD processor on how to interpret 'the data' when processing the compacted form. Once we have the expanded form of JSON-LD (or N-Quads), it already contains all 'the data' and doesn't need that @context any more to interpret it.

If you think, for example, about the Turtle serialization of RDF, we can see all the prefixes as something similar: they help the parser interpret the 'compact' notation Turtle offers. Once we go to N-Triples we don't need any of that prefix information any more, since we only have expanded IRIs in the N-Triples serialization.
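As a rough sketch of that (hypothetical terms, not the gist's data), take a compacted document:

{
  "@context": {"name": "http://schema.org/name"},
  "@id": "http://example.org/person/1",
  "name": "Alice"
}

Its expanded form spells the term out as a full IRI and keeps no @context at all:

[
  {
    "@id": "http://example.org/person/1",
    "http://schema.org/name": [{"@value": "Alice"}]
  }
]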
HTH

@christopher-johnson
Author

christopher-johnson commented Oct 10, 2017

@elf-pavlik thank you for the explanation. In this particular API, the IRI of the @context may be interpreted by clients (who are not required to process the document as JSON-LD) as a value used to conditionally assign node types. From what you have indicated, @context should not be expected by a client, since it is a keyword used only for JSON-LD processing, and may be altogether absent from the same JSON-LD document under a different processor serialization.

As a corollary, an application/ld+json response Content-Type header should be the primary mechanism for a client to determine whether a document is JSON-LD before it deserializes it, i.e. the absence of @context in a JSON document does not mean that it is ordinary application/json.
ping @azaroth42

@azaroth42
Contributor

azaroth42 commented Oct 10, 2017

I agree with Elf's answer, but not your conclusions. A reference to the context document is necessary to understand the mapping between the JSON representation and the triples it encodes. There are two ways to link the context, either with the link header described in section 6.8 or with the @context property within the representation. Clearly, with a different non-JSON-LD serialization there will not be a reference to the context. Thus, a client can expect to see a context for JSON-LD, and for the most part in the @context field (either inline or by reference).

The registered media-type is definitely application/ld+json, which also gives us the profile parameter. However application/json is also acceptable, again per section 6.8 on interpreting JSON as JSON-LD. The context is never a triple in the data, it's an artifact of the serialization.

#521 is just to clarify in the text of the spec that scoped contexts are unique per mapped term, not per URI, and thus two different mappings of the same class with a different JSON term can have two different context definitions. This is a possible solution to (IIIF/api#1195).

Scoped contexts also solve the problem of @context not being in the data, and being impossible to re-inject in the middle of a serialized JSON-LD structure. For example, if a IIIF Manifest embeds a IIIF Image service, the Image API's context can't be added way down in the tree. Thus for the next major version of IIIF (IIIF/api#1192), we look forward to using this feature from 1.1 (IIIF/api#1198).
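For reference, a rough sketch of what such a scoped context could look like in 1.1 (hypothetical IRIs, not the actual IIIF contexts): the term definition for service carries its own @context, which a 1.1 processor applies whenever that term is used, so the reference no longer needs to appear in the instance data.

{
  "@context": {
    "@version": 1.1,
    "Manifest": "http://example.org/vocab#Manifest",
    "service": {
      "@id": "http://example.org/vocab#service",
      "@context": "http://example.org/image-service-context.json"
    }
  }
}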

@christopher-johnson
Author

The problem we face is that it is not possible to round-trip the expected serialization format with an embedded @context through toRDF and then back out to JSON-LD. This is not an edge case, and I think it merits further constraints in the specification on how, or whether, a client should read @context at all.

@azaroth42
Contributor

Sure. The same triples could be mapped to different JSON by different contexts. It's not a property of the resource, it's a function of the serialization. For ease of use and to ensure that the JSON can be treated as a document, not only an online transaction, it is embedded within the representation.
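As an illustration (hypothetical context definitions), the same single triple about a book title can compact to different JSON depending on the context supplied:

{
  "@context": {"title": "http://purl.org/dc/terms/title"},
  "@id": "http://example.org/book",
  "title": "Moby Dick"
}

{
  "@context": {"label": "http://purl.org/dc/terms/title"},
  "@id": "http://example.org/book",
  "label": "Moby Dick"
}

Both encode exactly the same triple; only the serialization differs.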

@christopher-johnson
Author

Since application/json is the de facto default Content-Type for your APIs, the response MUST "specify an IRI to a valid JSON-LD document in an HTTP Link Header [RFC5988] using the http://www.w3.org/ns/json-ld#context link relation" [1]. The spec also indicates that only one @context can be provided with this method, so embedded @contexts for services are technically not allowed.

The main focus here is with the service nodes that currently require embedded @context per the specification. An ordinary xhr client can get the @context from the link header in the process of dereferencing the service IRI. And since it is technically required for normal responses, I do not see any benefit in having it in the serialization, especially when clients are incorrectly hard-coding parser logic that depends on it.

[1] https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld

@davidlehn
Member

I'm unclear on what the embedded context is being used for. You say the @context value may be used to conditionally assign node types? Why not explicitly assign types with @type?

If you need to provide custom per-"service" context information the in-progress JSON-LD 1.1 scoped contexts feature might be of use in the future. It would allow you to have a single top-level context document that could provide custom per-@type context information.

@christopher-johnson
Author

@davidlehn see this. I completely agree that @type is the proper choice.

Yet another solid rationale for not including an embedded service @context in a document serialization is that it breaks the encapsulation of the service. For an organization with perhaps many services, this makes document maintenance a hassle if those services change. The discovery of the service API capabilities by the consumer can be done with HTTP, so the application does not need to provide them explicitly.

@elf-pavlik
Contributor

elf-pavlik commented Oct 11, 2017

From what you have indicated, @context should not be expected by a client, since it is a processing keyword used only for JSON-LD flattening, and may be altogether absent in the same JSON-LD document with a different processor serialization.

My apologies, I wrote flattened when I should have written compacted (I have already edited my initial comment). As we see in https://www.w3.org/TR/json-ld/#iana-considerations, content negotiation can include a profile parameter. If the client requests the Expanded Document Form and the server responds with it, the expanded document will NOT include any @context. If the client requests the Compacted Document Form and the server responds with it, the compacted document will include an @context. AFAIK the client has no influence on which JSON-LD context gets used in compaction, so personally I tend not to rely on it: if the rest of the client application expects a particular JSON structure, I always let the client do the compaction (and sometimes framing) and have the client application rely on a JSON-LD context provided to it directly by the application developer (never the payload of an HTTP response).

You understand that the problem we face is that it is not possible to round trip the expected serialization format with an embedded @context toRDF and then back out to JSON-LD.

If you want to get 'back out' JSON-LD compacted with a particular context, you have to provide that particular context to the compaction function. The example in the playground seems to use the same @context for the embedded value of "service", but I assume you would like the possibility of compacting embedded values with a context different from the rest of the document. @davidlehn can correct me if I'm missing something, but I think that for a single compaction one can provide just one JSON-LD context. To compact embedded values with different contexts, I think one would need to compact them separately and then perform the embedding (no idea if JSON-LD Framing can do that).
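As a rough sketch of that call shape (hypothetical IRIs): the compaction input may already be expanded and carry no @context, and the output only gains one because it is supplied separately.

Expanded input:

[{"@id": "http://example.org/svc", "http://example.org/vocab#profile": [{"@id": "http://example.org/level1"}]}]

Context passed to the compaction call:

{"@context": {"profile": {"@id": "http://example.org/vocab#profile", "@type": "@id"}}}

Compacted output:

{
  "@context": {"profile": {"@id": "http://example.org/vocab#profile", "@type": "@id"}},
  "@id": "http://example.org/svc",
  "profile": "http://example.org/level1"
}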

@christopher-johnson
Author

To reiterate, the primary considerations for JSON/JSON-LD API specifiers in defining how consumers should use @context:

  1. A JSON/JSON-LD API SHOULD specify a default Content-Type.
  2. If a response Content-Type is application/json, and the document MAY be interpreted as JSON-LD, then it MUST provide a @context per RFC5988.
  3. If the API supports Content-Type application/ld+json, and specifies a default of application/json, then it MUST mandate that servers Accept requests for application/ld+json.
  4. If a response Content-Type is application/ld+json, then the document MAY include @context ONLY IF the response profile is defined with http://www.w3.org/ns/json-ld#compacted or http://www.w3.org/ns/json-ld#flattened per RFC6906.

The language of 6.8, "Please note that JSON-LD documents served with the application/ld+json media type must have all context information, including references to external contexts, within the body of the document", is, imo, questionable.

@context is provided in JSON-LD output only when an @context is not present in the source document. Framed output will always include @context if it is present in the frame. For fromRDF serializations, a root @context will always be provided, since it cannot exist in the source dataset and must be dereferenced externally. Expanded output will never include @context. So, the bottom line is that whether or not @context can even be included in the response is predetermined by the profile type and the source document itself.

The gist shows that if a compacted document that has an @context is recompacted, the processor does not include the @context again. Reference this test in jsonld-java for the expected behaviour.

@elf-pavlik
Contributor

elf-pavlik commented Oct 11, 2017

The gist shows that if a compacted document that has an @context is recompacted, the processor does not include the @context again.

If you're talking about the gist from your initial comment: when you select the compacted tab at the bottom, you have the choice on the right to paste the value for @context or provide a URL to an external context, and once provided, the compacted form will include that context.

The language of 6.8 "Please note that JSON-LD documents served with the application/ld+json media type must have all context information, including references to external contexts, within the body of the document" is imo, questionable.

The expanded form has no context information, so there is none to include within the body of the document. If a document has the compacted form (or any of its embedded documents have undergone compaction), then it has context information, and for application/ld+json it MUST be included within the body.

As for the API related comments, I think that the more general discussion might fit a better venue like https://www.w3.org/community/hydra/. IMO a JSON based API should define some vnd. media type if it wants to put requirements on the payloads. For JSON-LD based APIs I would always rely on the expanded form and only use compaction to reduce the number of bytes sent over the wire. Each party participating in data exchange should take responsibility for managing any contexts and frames it relies on, always starting from expanded JSON-LD and taking care of all expected compaction and framing. But again, that API-design-specific discussion doesn't seem to belong in the JSON-LD core spec; other JSON-LD based specs like the Hydra API should address it.

@christopher-johnson
Author

christopher-johnson commented Oct 11, 2017

Have you looked at the referenced test? If an @context exists in a document and that document is compacted, then the @context will be removed (at least in the jsonld-java implementation). This is also what occurs in the playground, so it must be part of the JavaScript implementation as well.

I agree that this is not the proper forum for discussing API related issues. Nonetheless, section 6.8 of the core spec is significant to the general point that I need to make about how @context should be understood and clearly specified, especially when JSON is expected to be interpreted as JSON-LD. The spec does not indicate that the expanded form will not include @context, nor does it allow @context to be absent from any document served as application/ld+json, which it can be if reprocessing with a null @context input. IMO, this is an oversight.

One additional aspect that you address is related to performance. Including @context in a serialization imposes a processing burden on a JSON-LD deserializer, as it must also dereference that document. For example, I attach a debug log from a toRDF deserialization of a document with many embedded @contexts. For every one of them, the processor (in this case jsonld-java via Jena RDFParser) makes a separate request.

Thus the principal mechanism, where a service is identified with many duplicate @contexts, is also inefficient. I agree with you that the expanded form is a good way to transmit documents over the wire, since it allows the consumer to implement deserialization more efficiently.

compactionWithManyEmbeddedContextDebug.txt

@elf-pavlik
Contributor

elf-pavlik commented Oct 11, 2017

The spec does not indicate that expanded form will not include @context, nor does it allow for @context to be absent in any application/ld+json served document which it can be if reprocessing with a null @context input.

As you quoted, the spec states in https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld

Please note that JSON-LD documents served with the application/ld+json media type MUST have all context information, including references to external contexts, within the body of the document.

It does NOT state anything like "not allowed for @context to be absent in any application/ld+json". Since expanded JSON-LD does not have any 'context information', it does not need to include any @context.

I attach a debug log from a toRDF deserialization of a document with many embedded @context. For every one, the processor (in this case jsonld-java via Jena RDFParser) makes a separate request.

AFAIK none of the JSON-LD specs puts any restrictions on caching external context documents; I think implementations should follow standard HTTP caching practices.

Have you looked at the referenced test? If an @context exists in a document and that document is compacted, then the @context will be removed (at least in the jsonld-java implementation). This is also what occurs in the playground, so it must be part of the JavaScript implementation as well.

I don't know Java and find it hard to make sense out of that test. Looking at:

This algorithm compacts a JSON-LD document, such that the given context is applied.
[...]
The final output is a JSON object with an @context key, if a non-empty context was given...

I see that the compaction function requires the JSON-LD document and the context to use in compaction as separate arguments. If the provided JSON-LD document had some @context information, it gets used during the expansion step, but only the separate context provided as an argument to the function is taken into account during compaction.

The gist shows that if a compacted document that has an @context is recompacted, the processor does not include the @context again.

Once again, if the input document has some @context it will get used during the expansion step, but the output of compaction will only have an @context if you provide it as a separate argument to that function. In the playground, on the right side, you can find a textarea for the inline context or an input field for the URL of an external context.

@christopher-johnson
Author

christopher-johnson commented Oct 11, 2017

I see that compaction function requires JSON-LD document and the context to use in compaction as separate argument.

If the @context is removed from the original gist source document, and it is compacted without a context input, then a different output is produced, as can be seen in this gist. The context processing algorithm must then assume that in compaction, if a document already has a "local context", it is always merged into an "active context" along with the "remote context", and the document is then reprocessed with the "active context". [1][2] The processors do not throw if the "remote context" is null, so it is technically not required.

It does NOT state anything like "not allowed for @context to be absent in any application/ld+json". Since expanded JSON-LD does not have any 'context information', it does not need to include any @context.

Can an expanded document be served with application/ld+json? If so, then 6.8 is incorrect.

[1] https://github.com/jsonld-java/jsonld-java/blob/master/core/src/main/java/com/github/jsonldjava/core/JsonLdApi.java#L194-L195
[2] https://json-ld.org/spec/latest/json-ld-api/#context-processing-algorithms

@elf-pavlik
Contributor

If the @context is removed from the original gist source document, and it is compacted without a context input, then a different output is produced, as can be seen in this gist.

This happens because the source document lacks an @context and the expansion algorithm dropped most of the keys:
https://json-ld.org/spec/latest/json-ld-api/#algorithm-2

8.2) Set expanded property to the result of using the IRI Expansion algorithm, passing active context, key for value, and true for vocab.
8.3) If expanded property is null or it neither contains a colon (:) nor it is a keyword, drop key by continuing to the next key.

We can see it in the expanded form tab in the playground (just one statement left).
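A quick hypothetical illustration of that dropping: with no context, plain keys cannot be expanded to IRIs, so only @id survives. Given

{
  "@id": "http://example.org/doc/1",
  "label": "A label",
  "seeAlso": "http://example.org/other"
}

expansion yields

[{"@id": "http://example.org/doc/1"}]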

@christopher-johnson
Author

RE: processor performance considerations of embedded @contexts

While HTTP caching will typically help with large payloads, the client-side issue with @context is rather the cost of many request executions over the wire. The solution, if using the jsonld-java processor, is to use a jarcache and to recursively nest the embedded @contexts in a single resource. So, basically, the cached context structure looks like this:

{
  "@context": [{"root_context": "value"},{"embedded_context": "value"}]
}

This dramatically improves compacted JSON-LD deserialization (i.e. using expansion) performance.

@azaroth42
Contributor

To the original issue, that @context statements are lost when parsing to a graph: that is the expected behavior, as contexts are not part of the graph, just of the particular serialization.

I propose to close the issue, wontfix. If there are other issues to be extracted from the resulting discussion above, then they should form new issues.

@akuckartz

I agree with the most recent suggestion by @azaroth42.

(I do not think that this is a real bug. I would close as "can not reproduce".)

@gkellogg added the "wont fix" (Issue discussed and closed as Won't Fix) label Apr 4, 2018
@gkellogg closed this as completed Apr 4, 2018