Manifest files need their own MIME Media Type (because canonicalization) #409
This would create significant problems. The JSON-LD spec requires the […] If we do this, we cut ourselves off from the JSON-LD world.
Schema.org does not currently define an additional processing model beyond JSON-LD, so they don't need a separate media type.
ActivityStreams introduced the […] The Web of Things WG is also defining a separate media type for their JSON format, because they have a required transformation step to make it valid JSON-LD. In both these cases, there is a foundational JSON format that requires a processing step in addition to, or prior to, consuming the content as "pure" JSON-LD. However, both specifications state that using their custom media type comes with the requirement of an implicit JSON-LD […] The Verifiable Claims WG is likely to end up here as well if there are processing requirements beyond the foundation of "pure" JSON-LD processing. Media types are what signal how the contents of a response should be processed. Therefore, because the WPUB specification includes a "processing the manifest" algorithm, which goes beyond simply consuming the JSON as JSON or JSON-LD, the WPUB manifest needs its own media type.
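The argument that a media type is what tells a consumer which processing to run can be sketched as a simple dispatch table. This is a minimal sketch under assumptions. The parser is hand-rolled and does not handle quoted ';' inside parameter values. The PROCESSING table and its step names are hypothetical. application/wpub+json is the type proposed in this issue, not a registered one. Note that the profile parameter never changes which steps run here, which mirrors the concern that a profile alone cannot signal extra processing.

```python
from dataclasses import dataclass, field


@dataclass
class MediaType:
    """A parsed media type: 'type/subtype' plus lowercase-keyed parameters."""
    essence: str
    params: dict = field(default_factory=dict)


def parse_media_type(value: str) -> MediaType:
    """Minimal parser for values like 'application/ld+json; profile="..."'.

    Simplification (assumption): parameter values never contain ';'.
    """
    essence, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if "=" not in raw:
            continue
        key, _, val = raw.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return MediaType(essence.lower(), params)


# Hypothetical mapping from media type to the processing a consumer applies.
PROCESSING = {
    "application/json": ["json-parse"],
    "application/ld+json": ["json-parse", "json-ld"],
    # Proposed in this issue; not a registered media type.
    "application/wpub+json": ["json-parse", "json-ld", "wpub-canonicalize"],
}


def processing_steps(content_type: str) -> list:
    """Pick processing steps from the media type essence only."""
    mt = parse_media_type(content_type)
    return list(PROCESSING.get(mt.essence, ["opaque"]))
```

For example, processing_steps('application/ld+json; profile="https://example.org/wpub"') yields only the JSON and JSON-LD steps. Only the hypothetical wpub type adds canonicalization.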
I believe that both the Activity Streams and the WoT examples are different. What we define as the 'authored manifest' is bona fide JSON-LD. It is also JSON-LD that abides by schema.org (unless there are no suitable terms, in which case we have to add our own). What I meant by […] is that if we use a different media type, our idiom for, e.g., the embedded manifest:
<script id="example_manifest" type="application/ld+json">
{
…
}
</script>
has to use the new media type instead of application/ld+json. The 'canonical manifest' is also bona fide JSON-LD; furthermore, a canonical manifest can also be used as an authored manifest (i.e., any 'canonical manifest' is a valid 'authored manifest'). The canonical manifest is more of an implementation specification tool: a way of specifying exactly what a WPUB processor must do to correctly interpret the data expressed in the authored manifest. It is not really exposed to the outside world, except if the author decides to use a fully expanded canonical manifest as their authored manifest (which is valid). I do not see what a separate media type would bring us. P.S. If the media type spec allowed something like […]
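The embedded-manifest concern can be illustrated with a consumer that picks up embedded data purely by the script's type attribute. This is a sketch, and it assumes that behavior for a schema.org-oriented consumer rather than documenting any real crawler. Under that assumption, a manifest declared with the hypothetical application/wpub+json type is simply never seen.

```python
import json
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collects parsed JSON from <script> elements whose type matches wanted_type."""

    def __init__(self, wanted_type: str):
        super().__init__()
        self.wanted_type = wanted_type
        self._capturing = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            self._capturing = script_type == self.wanted_type
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._capturing = False


def embedded_json_ld(html: str) -> list:
    """Return every embedded application/ld+json block; other script types are skipped."""
    parser = ScriptExtractor("application/ld+json")
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Given a page with one application/ld+json block and one hypothetical application/wpub+json block, only the first one is returned.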
Actually, the canonicalization algorithm defines more precisely what, I would think, schema.org processors do behind the scenes, too. There are a number of terms in schema.org, for example, where the value can be a single value or an array of values, and the processors use an array at the end of the day. One possibility would be to look at whether the canonicalization algorithm could be expressed by some clever tricks in contexts, relying on the output of the JSON-LD expansion and maybe framing algorithms. That may cover most of the canonicalization steps, although some features might not be expressible that way and would therefore be pushed back into the definition of the authored manifest (making it a bit more complex for users). Although we never explored this in full detail, such an approach was rejected in the past: the implementation feedback was that reading systems would not incorporate full-blown JSON-LD processing (which is way more complicated), and that it was better to spell out the portions that are relevant for this specification.
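The "single value or array" point is the simplest kind of canonicalization step to illustrate. The sketch below always turns such values into arrays. The ARRAY_TERMS set is illustrative only and is not the specification's actual list of array-valued terms. It also shows why a canonical manifest remains a valid authored manifest: running the step again changes nothing.

```python
# Illustrative (hypothetical) set of terms whose values become arrays.
ARRAY_TERMS = {"author", "inLanguage", "readingOrder"}


def as_list(value):
    """Wrap a single value in a list; leave lists unchanged."""
    return value if isinstance(value, list) else [value]


def canonicalize_arrays(manifest: dict) -> dict:
    """Return a shallow copy in which every ARRAY_TERMS value is a list."""
    result = dict(manifest)
    for term in ARRAY_TERMS:
        if term in result:
            result[term] = as_list(result[term])
    return result
```

For instance, {"author": "Herman Melville"} becomes {"author": ["Herman Melville"]}, and applying the function a second time returns an equal dictionary.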
It's true they are both "bona fide JSON-LD." However, the issue is that they are not the same JSON-LD, so you potentially end up with two distinct graph structures, and the canonical one has "injected" statements/triples.
The Google Structured Data Testing Tool (at least) does modify the incoming graph and introduces assumptions, not expressed in the JSON-LD, about what was expressed, i.e. everything becomes […] The WPUB canonicalization process does indeed do the same: it takes valid JSON-LD input and turns it into different JSON-LD for internal processing (and possible output). So, while I agree that JSON-LD is a valid media type for either of these "instantiations" of the manifest, I do believe there needs to be some way to signal that additional processing will be (and, I believe, in some cases MUST be) done before it's an actual WPUB manifest. All of that also seems a bit beyond just "profiling"...
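To make the "injected triples" point concrete, here is a toy comparison of the two graphs. Several assumptions apply. The flattener handles only one-level objects and skips "@" keywords. The canonicalization step shown, defaulting a missing inLanguage from the host document's language, is a hypothetical example chosen for illustration and is not quoted from the WPUB draft.

```python
def to_triples(subject: str, node: dict) -> set:
    """Flatten a one-level JSON object into (subject, predicate, object) triples.

    Simplification: keys starting with '@' are skipped and nested objects are not handled.
    """
    triples = set()
    for key, value in node.items():
        if key.startswith("@"):
            continue
        for item in (value if isinstance(value, list) else [value]):
            triples.add((subject, key, item))
    return triples


def canonicalize(manifest: dict, doc_language: str) -> dict:
    """Hypothetical step: default a missing inLanguage from the host document."""
    result = dict(manifest)
    result.setdefault("inLanguage", doc_language)
    return result
```

The set difference between the canonical and the authored triples is exactly the injected statement. That difference is the sense in which the two are "not the same JSON-LD".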
If framing is indeed an option, and if it can somehow be expressed along with the media type in a way that schema.org processors won't choke on or ignore, then we may have found a solution. 😃
An existing processing system is far less complicated than one that hasn't yet been written. 😁 So if a processing/altering/canonicalizing step is still required, we're in the same "requires processing" boat, but now with a new set of processing that must take place (i.e. canonicalization) before that data can be considered fully valid/hydrated. So, again, we end up with two manifests, which (typically) don't match. Lastly, your point about schema.org and SEO is valid. However, since the manifest may not even be in the page, its SEO merits are suspect (at least if it is not embedded); see also #327 (comment). Underlying concerns include: […]
So, the summary then is that if the canonicalization process is required to properly interpret a manifest (i.e. the graph/data model output would otherwise be incorrect or insufficient), then authored manifests (at least) MUST have their own media type, because they require additional, non-JSON-LD processing. Consequently, canonicalization begins to sound like something done by a tool pre-publication and not at consumption/run-time.
@BigBlueHat I would prefer that we try to find some time in Cambridge (at the F2F) to discuss this; I think it would be more fruitful (it is a complex issue). @TzviyaSiegman @wareid @GarthConboy: can we do that? Only one comment on your remark: […]
Note that there is a resolution of #327, which is reflected in section 3.3.3 of the current draft: […]
You are right that this is not a MUST, but it is almost one, which means that the SEO issue must be taken extremely seriously.
An area far from my expertise, but a slot on the F2F agenda SGTM... maybe I'll get smart.
This issue was discussed in a meeting.
Manifest files need their own MIME Media Type
Wendy Reid: #409: to be discussed today or tomorrow.
Benjamin Young: https://http.cat/409
Benjamin Young: Create a MIME type for manifest files … have operational set of actions … convert from authored manifest to canonical manifest … user needs … beyond JSON.parse … beyond graph representation … 2 expressed formats … operationally different … so if people implement the canonicalization process … we need a new media type … wpub+json or some such … as the Activity Streams people did … beyond JSON-LD … needed their own media type … we should do the same for both authored and canonical
Ivan Herman: This is the issue about which we say “specification purity less important than good of community” … the authored manifest: if not using the ld+json media type … then it will be ignored by schema.org processors … killing its raison d’être … should not touch MT … could add profile … for whatever reason … we could decide to give a different MT to the canonical manifest … but CM can be used as AM … same format … same data … so should not be a different MT … strictly speaking, CM and AM have different RDF representations … but that is specification purity … backfire on practicality … schema.org processors say something is a URI or string … we accept the lack of purity … we should not touch
Benjamin Young: profile does not solve the issue … for schema.org, it is ignored … JSON-LD.js going into Chrome Lighthouse … so they use JSON-LD going forward … if we don’t go through some process … they are equivalent in doc, but not really … so a pub has different states of meaning … authored vs. consumed … if Wiley takes Moby Dick as authored, get one result … through canonicalization, it has a different meaning … could do what schema.org does … but how does an implementor know?
Tzviya Siegman: Is there a way to end the stalemate?
Ivan Herman: Say the authored manifest must use schema.org CreativeWork … or a subtype thereof … could define a separate type and demand that it is added to the manifest … we signal it is not just a creative work … also a web pub … needs canonicalization to get web pub features … the type is an array of types … AB, VBs … this works and answers concerns
Laurent Le Meur: I fear it would be an abuse of the mechanism of context types in schema.org … used to indicate properties within a structure
Ivan Herman: It’s an RDF type … no more
Tim Cole: schema.org defines an additionalType property … can be used for this … make sure schema.org understands … could do an extension … as long as not primary
Ivan Herman: A subtype of CreativeWork?
Tim Cole: An extension … creative type by inheritance … external vocabulary
Ivan Herman: It is a schema.org syntactic hack
Benjamin Young: This came from canonicalization … not to express more … VC has a processing model … an intended use for data models … equivalent to using a JSON-LD parser … but we have two types: AM and CM … a consumer does not know what you have … you are left wondering … it may be a question of who runs canonicalization … publisher does not want messy author thing … we want a canonicalized thing … developer won’t know … will consume messy thing wrong
Ivan Herman: Your solution works in an ideal world … too high for publishers … need to lower the bar … (except Wiley) … there are self-publishers, etc. … we want a simple manifest … requiring a canonical manifest is not realistic
Wendy Reid: I don’t hear the conclusion … can the participants work it out?
Benjamin Young: Developers can be smaller than Wiley … but the technology does not say when to use CM … signal what processing to do … today, nothing distinguishes them … no clarity about process … different from the Structured Data Testing Tool … need to signal when to execute
George Kerscher: Does a wpub check resolve this problem?
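Ivan's proposal, in which the authored manifest stays a schema.org CreativeWork (or subtype) but also carries an extra type signalling that WPUB canonicalization is needed, might be checked roughly like this. Everything named here is an assumption. WPUB_MARKER_TYPE is a placeholder URL, since the minutes do not name the type. CREATIVE_WORK_TYPES is a small illustrative subset rather than the full schema.org hierarchy. Accepting both "type" and "@type" is also a guess about how the manifest aliases the keyword.

```python
# Hypothetical marker type; the minutes do not name one.
WPUB_MARKER_TYPE = "https://example.org/ns#WebPublication"
# Illustrative subset of CreativeWork and its subtypes, not the full hierarchy.
CREATIVE_WORK_TYPES = {"CreativeWork", "Book", "Article"}


def types_of(manifest: dict) -> list:
    """Read the declared types, accepting a single value or an array."""
    declared = manifest.get("type", manifest.get("@type", []))
    return declared if isinstance(declared, list) else [declared]


def needs_canonicalization(manifest: dict) -> bool:
    """True when the manifest is a (listed) CreativeWork AND carries the marker type."""
    types = types_of(manifest)
    is_creative_work = any(t in CREATIVE_WORK_TYPES for t in types)
    return is_creative_work and WPUB_MARKER_TYPE in types
```

A schema.org consumer would still see a Book, while a WPUB processor keys off the extra entry in the type array. Laurent's objection is whether this stretches the intended use of schema.org types.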
Benjamin Young: “The tools will save us”
Matt Garrish: You always run the canonicalization … but maybe there is nothing to do to the AM if everything is already there … don’t bypass
Ivan Herman: Can clarify the doc to say “when the reading system turns the AM into abstracted WebIDL, in that process it canonicalizes the manifest and converts to JS classes” … if the AM is complete, then canonicalization is the empty set
Benjamin Young: The way you know to run that is link rel … SEO bots will get something different … the AM output … which will differ and may not be found
Romain Deltour: To George’s question … EPUB checking is very different … on the web, content is not validated … we don’t require valid content … so a future web pub checker cannot be used this way … just a lint … user agents won’t request content
George Kerscher: You can require consistency
Romain Deltour: But you can have content fail; OK for the web
Wendy Reid: Do not see consensus … need to work the issue + referee
Benjamin Young: Ivan and Matt have pointed out that canonicalization only targets wpub processors … they are looking for the rel relationship and abstracting it … so we are OK … SEO bots and post-processors will be confused, but that’s OK … we can close the issue … do not need a media type
Wendy Reid: Can you formalize that proposal … (Issue #44 is for tomorrow)
Proposed resolution: the rel="publication" discovery mechanism will be what signals the need for canonicalization/processing (Benjamin Young)
Ivan Herman: +1
Wendy Reid: +1
Tim Cole: +1
Matt Garrish: +1
Marisa DeMeglio: +1
David Stroup: +1
Tzviya Siegman: +1
Deborah Kaplan: +1
Benjamin Young: +1
Nellie McKesson: +1
Dave Cramer: just make it stop!
Rachel Comerford: +1
Romain Deltour: +1
Resolution #1: the rel="publication" discovery mechanism will be what signals the need for canonicalization/processing
Wendy Reid: So resolved
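The resolution moves the signal from the media type to discovery: a manifest reached through rel="publication" gets the WPUB processing. A minimal sketch of the discovery half is shown below. It is an illustration, not the specification's algorithm. It takes the first matching link, treats rel as a space-separated token list, and leaves out fetching and canonicalizing the manifest.

```python
from html.parser import HTMLParser


class PublicationLinkFinder(HTMLParser):
    """Records the href of the first <link> whose rel tokens include 'publication'.

    Self-closing <link/> tags also arrive here, because HTMLParser's default
    handle_startendtag calls handle_starttag.
    """

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "publication" in rel_tokens:
            self.href = attributes.get("href")


def discover_manifest(html: str):
    """Return (manifest_url, run_canonicalization) for a page."""
    finder = PublicationLinkFinder()
    finder.feed(html)
    finder.close()
    # Per the resolution, discovery via rel="publication" is what signals
    # that canonicalization/processing must run; no new media type is involved.
    return finder.href, finder.href is not None
```

If the discovered authored manifest is already complete, the canonicalization that follows simply has nothing to add, as Matt and Ivan noted.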
There is additional processing (beyond JSON, beyond JSON-LD) required for generating a canonical manifest. Consequently, a new MIME media type is needed to signal that this unique processing is required.
Additionally, the processing steps required for canonicalization alter the meaning of the original JSON-LD. The result is a different graph containing statements not present in the authored manifest, so a simple JSON-LD "profile" would be insufficient (as profiles can't alter the encoded document).
So, something like application/wpub+json would be best, as it signals the foundational format (JSON) as well as the short name of the W3C spec that describes the required processing steps.
The WPUB spec could also declare the default @context value for the media type, as done by ActivityStreams in their IANA submission. This also has the benefit of opening the door to profiling the manifest format more clearly via the profile parameter, with a value pointing to an extended context or new definition document.
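On the consuming side, the two ideas in this proposal might look roughly like this: a default @context implied by the media type, and an optional profile parameter. It is a sketch under stated assumptions. DEFAULT_CONTEXT holds placeholder example.org URLs, not a real WPUB context. The media type comparison ignores all parameters. The proposed type itself is not registered.

```python
# Placeholder default context for the proposed (unregistered) media type.
DEFAULT_CONTEXT = ["https://schema.org", "https://example.org/wpub-context"]
WPUB_TYPE = "application/wpub+json"


def with_implied_context(manifest: dict, media_type: str) -> dict:
    """If served as the proposed wpub type without an @context, add the default one."""
    essence = media_type.split(";")[0].strip().lower()
    if essence != WPUB_TYPE or "@context" in manifest:
        return manifest
    return {"@context": list(DEFAULT_CONTEXT), **manifest}


def content_type_with_profile(profile_url: str) -> str:
    """Build a Content-Type value that profiles the format via the profile parameter."""
    return f'{WPUB_TYPE}; profile="{profile_url}"'
```

A manifest served as application/ld+json is left untouched. The same manifest served with the proposed type gains the implied context, much like the ActivityStreams arrangement mentioned above.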