Content-Type MUST be used on all PUT/POST/PATCH requests? #70

kjetilk · 2019-09-25T14:03:03Z

5.2.3.6 LDP servers SHOULD use the Content-Type request header to determine the request representation's format when the request has an entity body.

I'm not sure how the SHOULD has been justified there, since I struggle to see any other valid way, so it seems we should have a MUST there.

pmcb55 · 2019-09-26T10:15:42Z

Couldn't the server use content sniffing to 'guess' the content type? I'm not sure if that counts as a 'valid way', but it seems to have been common enough in the past I believe...

RubenVerborgh · 2019-09-26T10:33:35Z

Couldn't the server use content sniffing to 'guess' the content type?

Security issue; let's not do that. Agree on MUST.

csarven · 2019-09-26T10:39:17Z

[Saw this on my phone.. literally rushed to get my laptop to respond.. Ruben beat me to it by a second.]

https://tools.ietf.org/html/rfc7231#section-3.1.1.5 is fairly clear about potential issues around content sniffing.

csarven · 2019-09-26T10:41:54Z

MUST raises the bar, so to speak, from SHOULD, but it will remove potential issues and any ambiguity. That's a quality on its own and what to expect from clients. So, +1 to MUST.

pmcb55 · 2019-09-26T11:07:26Z

Yeah, +1 to MUST from me too (I was only trying to point out that content sniffing was perhaps another 'valid way', I certainly wasn't trying to suggest that we ever rely on it, absolutely not :) !)

csarven · 2019-09-26T12:42:59Z

For the sake of documentation and to offer an answer for why SHOULD and to eventually resolve this issue:

The server may not want to completely rely on what a client claims in the request. It gives a way out I suppose to do some verification. For authenticated agents, this would be less of a concern.

acoburn · 2019-09-26T12:54:52Z

The SHOULD-level requirement in LDP is used in the context of clients sending POST requests -- requests with an entity body -- to a server. If this is increased to MUST, what are the implications if a client does not send a Content-Type header?

For example, if a client attempts to create an LDP-NR and does not send a Content-Type, does that mean that the request must be rejected? If the request is accepted, does that mean that subsequent responses for that resource cannot use, for example, Content-Type: application/octet-stream?

kjetilk · 2019-09-26T13:22:34Z

For example, if a client attempts to create an LDP-NR and does not send a Content-Type, does that mean that the request must be rejected?

Yes, that has been my understanding. There is basically no legitimate way to tell what media type there is without the header, so a 400 response results. We should have structured error message bodies for those cases though.
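A minimal sketch of that strict reading, assuming a server-side check that runs before any body handling. The function name `check_content_type` and the use of `application/problem+json` (RFC 7807) for the structured error body are illustrative assumptions, not something agreed in this thread. An empty body is let through here, which is also an assumption.

```python
import json

# Methods this issue is about; all names in this sketch are hypothetical.
METHODS_WITH_BODY = {"PUT", "POST", "PATCH"}


def check_content_type(method, headers, body):
    """Return None if the request may proceed, else a (status, headers, body) triple."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    has_body = body is not None and len(body) > 0
    declared = normalised.get("content-type", "").strip()
    if method.upper() in METHODS_WITH_BODY and has_body and not declared:
        # Assumption: RFC 7807 problem details as the structured error body.
        problem = {
            "title": "Missing Content-Type",
            "status": 400,
            "detail": "Requests with an entity body must include a Content-Type header.",
        }
        return 400, {"Content-Type": "application/problem+json"}, json.dumps(problem)
    return None
```

A caller would run this check first and return the triple as the response whenever it is not `None`.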

pmcb55 · 2019-09-26T21:45:01Z

So @csarven - could you elaborate a little on what you mean below: (as I don't understand what you mean at all):

The server may not want to completely rely on what a client claims in the request. It gives a way out I suppose to do some verification.

And @acoburn, when you say:

The SHOULD-level requirement in LDP is used in the context of clients sending POST requests -- requests with an entity body -- to a server.

...I think that's clear from @kjetilk's original comment that created this issue. The question is, does anyone know a good justification for the LDP guys deciding on the Content-Type header being a SHOULD here instead of a MUST, right?

'Cos I agree with @kjetilk that 'the implications if a client does not send a Content-Type header' should be simply a 400 response. The only other option I see for processing an incoming request without a Content-Type header is content sniffing, which everyone (so far) agrees would be a really, really bad idea.

So again, are we missing something here that the LDP spec editors realised, and that justified their making the Content-Type header a SHOULD instead of a MUST?

csarven · 2019-09-27T09:59:18Z

If this is increased to MUST, what are the implications if a client does not send a Content-Type header?

Noting here that changing requirements around Content-Type will need to be factored in TSE (The Solid Ecosystem), especially the HTTP section.

csarven · 2019-09-27T10:00:41Z

So @csarven - could you elaborate a little on what you mean below: (as I don't understand what you mean at all):

The server may not want to completely rely on what a client claims in the request. It gives a way out I suppose to do some verification.

(I'm only speculating a possible reason because I don't have a citation.)

We can consider the public-write or append case where a client may end up using an incorrect Content-Type value - not matching the payload. Noting here that if a server wants to ensure that a resource is eventually served accurately, it may end up content-sniffing (whether or not Content-Type has a value that is valid for the payload). This is neither encouraged nor part of the agreement for Content-Type use.

"[..] does anyone know a good justification [..]"

@TallTed @bblfish may want to chime in.

bblfish · 2019-09-27T10:09:58Z

Protocols have to say what must happen for things to work correctly. One can work one's way around bad clients, but then one is taking risks, which inevitably will be taken advantage of at some point by folks trying to break security.

I believe this is related to deontic logics, and at some point I'll see if I can find a good explanation of this in those terms. Until then I'd go with the above recipe.

kjetilk · 2019-09-27T11:35:23Z

We can consider the public-write or append case where a client may end up using an incorrect Content-Type value - not matching the payload.

Yup, but that's where we slap the $ from https://www.w3.org/DesignIssues/HTTPFilenameMapping.html on it, I think.

csarven · 2019-09-27T11:50:35Z

slap the $

Implementation detail... useful if there is a Content-Type.

RubenVerborgh · 2019-09-27T13:38:48Z

Implementation detail..

Not for all backends.

@timbl has indicated that he wants https://www.w3.org/DesignIssues/HTTPFilenameMapping.html to evolve into a specification that makes Solid also interoperable on a filesystem level.

csarven · 2019-09-27T13:58:39Z

I was saying that in context of TSE. As for it being part of "file-based-solid-spec", okay. Perhaps I shouldn't have used "..." because the "if" is important. I'm not sure how HTTPFilenameMapping is expected to work for the no Content-Type case, unless of course there is some fallback to .ttl and/or check Link header for rel RDFSource or something (but still wouldn't know if the payload is actually Turtle or something else). And, that brings things back to non-turtle-rdf.ttl being conneg'd for Turtle.. So, I think Content-Type is quite critical (MUST) for file-based-solid-spec.

TallTed · 2019-09-27T20:00:25Z

Forgive me for not tracking down all citations; I'm on a wet-string connection at the moment.

As noted above, the cited quote of 5.2.3.6 from LDP is within the 5.2.3 HTTP POST section (so, not a global LDP rule).

Note also, from 6.2 HTTP 1.1 --

6.2.6 When the Content-Type request header is absent from a request, LDP servers might infer the content type by inspecting the entity body contents ([RFC7231] section 3.1.1.5).

I believe that the RFCs and W3 specs upon which LDP was built -- and which inheritance should be persisted in Solid -- state that POST (and PUT) without a Content-Type at least MAY (and my personal feeling is SHOULD) be accepted and treated as if submitted with Content-Type: application/octet-stream. I don't understand the apparently perceived issue with so doing.

The server doesn't parse, transform, or otherwise manipulate the payload of such submissions. Turtle content is left as Turtle; it is not parsed and loaded to a back-end RDF store. Clients requesting Content-Type: text/turtle from that resource don't get that, even though the content of the file is structured as Turtle. Clients requesting Content-Type: application/ld+json don't get a JSON-LD transformation of the Turtle. Clients including */* or application/octet-stream in their GET request Accept: (or similar) list will get the raw resource, with Content-Type: application/octet-stream, and what they choose to do with that is their own lookout.

What am I missing?
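The behaviour described in this comment could be sketched roughly as follows. This is only an illustration: the in-memory `resources` dict, the function names, the simplified Accept matching (q-values are ignored), and the 406 response for a non-matching Accept header are all assumptions and not taken from LDP or any Solid server.

```python
DEFAULT_TYPE = "application/octet-stream"


def store(resources, path, body, content_type=None):
    """Keep the bytes untouched; record the declared type or the octet-stream fallback."""
    resources[path] = (content_type or DEFAULT_TYPE, body)


def accepts(accept_header, media_type):
    """Very small Accept matcher: ignores q-values and other parameters."""
    main_type = media_type.split("/")[0]
    for item in accept_header.split(","):
        candidate = item.split(";")[0].strip().lower()
        if candidate in ("*/*", media_type, main_type + "/*"):
            return True
    return False


def get(resources, path, accept_header="*/*"):
    """Return (status, content_type, body); no parsing or conversion is attempted."""
    stored_type, body = resources[path]
    if accepts(accept_header, stored_type):
        return 200, stored_type, body
    # Assumption: a request that only accepts other types gets 406 Not Acceptable.
    return 406, None, None
```

With this sketch, a Turtle document stored without a Content-Type is only retrievable by clients that accept `*/*` or `application/octet-stream`.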

acoburn · 2019-09-27T20:12:27Z

FWIW, the Trellis server does, effectively, what @TallTed describes:

A POST with an LDP-NR link header, an entity body and no content-type header will store the resource with Content-Type: application/octet-stream. Subsequent requests for that LDP-NR return a resource with a Content-Type: application/octet-stream response header.

A POST with no LDP link header, an entity body and no content-type header is accepted as an LDP-NR and will respond with a Content-Type: application/octet-stream header.

A POST with an LDP-RS link header, an entity body and no content-type header is rejected immediately.

Nowhere in that process is there any content sniffing. LDP-NRs are always treated (internally) as opaque byte arrays, and LDP-RSs must be parseable (if an entity body is present), based on the provided content-type.

csarven · 2019-09-28T09:36:06Z

@TallTed Thanks for the feedback and giving the thread a bit more juice :)

I don't understand the apparently perceived issue with so doing.

AFAICT, the perceived issue wasn't particularly about application/octet-stream but given that it is a MAY and the alternative option that's mentioned in https://tools.ietf.org/html/rfc7231#section-3.1.1.5 :

or examine the data to determine its type

and the paragraph that discusses misconfigurations and potential risks.

What am I missing?

Nothing. The discussion just went in the direction to eliminate ambiguity by forcing clients to always indicate their intentions.

I now see that my comment in #70 (comment) wasn't adequate either because it was only in context of eliminating ambiguity as much as possible based on earlier discussion (re: "content sniffing"). So with that as a premise, MAY for application/octet-stream probably didn't matter. We need to revisit.

csarven · 2019-09-28T09:36:35Z

@acoburn thanks for sharing!

The first two cases leading to LDP-NR/application/octet-stream are already covered by LDP/RFC and so can be left as is (as @TallTed and you already indicated).

As for:

A POST with an LDP-RS link header, an entity body and no content-type header is rejected immediately.

Is the reasoning based on https://www.w3.org/TR/ldp/#ldpc-post-createrdf :

If any requested interaction model cannot be honored, the server MUST fail the request.

Does Trellis consider Content-Type as a requirement to check if the interaction model can be honoured? Or is this separate?

We can acknowledge that RDF payload can make its way into a server but can only be reused via */* or application/octet-stream. Perhaps more specifically, I'd like to understand what are the use cases for @TallTed's remark about clients:

what they choose to do with that is their own lookout.

Put differently, why should RDF payload without indicating an interaction model through Link header or Content-Type be accepted, whereas the same payload with the interaction model specified (LDP-RS in Link header) is rejected:

A POST with an LDP-RS link header, an entity body and no content-type header is rejected immediately.

What am I missing?

acoburn · 2019-09-28T11:51:50Z

A POST with an LDP-RS link header, an entity body and no content-type header is rejected immediately.

Actually I was incorrect about this. A POST with an LDP-RS link header and no content type is accepted as text/turtle.

Within Trellis, LDP-NRs and LDP-RSs follow entirely different code paths and are stored in different subsystems. LDP-NRs are just opaque byte streams while LDP-RSs need to be parsed and validated before being persisted: hence, there is a need to know the content type of the HTTP entity body.

Thus far, I have treated these "what to do if the client isn't completely explicit" heuristics as implementation decisions, but if the Solid specification chooses to weigh in on or clarify these behaviors, it would be pretty simple to make adjustments, if adjustments need to be made.

RDF payload without indicating an interaction model through Link header or Content-Type be accepted

In this case, the reason the request is accepted has to do with the fact that it is accepted as an LDP-NR. This also raises a slightly different issue: if a client does not supply an LDP interaction model, what interaction model (if the request is accepted) should be assigned to the resource? Here, Trellis makes a decision based on the Content-Type header, choosing either an LDP-NR or an LDP-RS (if there is no Content-Type header, as stated above, an LDP-NR is chosen).
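The heuristics described in this comment can be sketched as a small decision function. This is an illustration of the behaviour as stated here, not Trellis code: the function name `resolve`, its parameters, and the exact set of RDF media types are assumptions.

```python
LDP_RS = "http://www.w3.org/ns/ldp#RDFSource"
LDP_NR = "http://www.w3.org/ns/ldp#NonRDFSource"

# Assumption: the media types that count as RDF for this decision.
RDF_TYPES = {"text/turtle", "application/ld+json", "application/n-triples"}


def resolve(link_type, content_type):
    """Return (interaction_model, effective_media_type) for a POST with a body.

    link_type is the LDP type IRI taken from the Link header, or None if absent.
    content_type is the raw Content-Type header value, or None if absent.
    """
    media_type = content_type.split(";")[0].strip().lower() if content_type else None
    if link_type is None:
        # No interaction model given: decide from Content-Type alone,
        # falling back to LDP-NR when there is no Content-Type either.
        link_type = LDP_RS if media_type in RDF_TYPES else LDP_NR
    if media_type is None:
        # No Content-Type: Turtle for RDF sources, opaque bytes otherwise.
        media_type = "text/turtle" if link_type == LDP_RS else "application/octet-stream"
    return link_type, media_type
```

For example, `resolve(None, None)` yields an LDP-NR stored as `application/octet-stream`, while `resolve(LDP_RS, None)` yields an LDP-RS that will be read as Turtle.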

TallTed · 2019-09-30T15:13:14Z

@acoburn --

Actually I was incorrect about this. A POST with an LDP-RS link header and no content type is accepted [by Trellis] as text/turtle.

That doesn't seem quite the right action. An LDP-RS link header does give you a strong hint of the client's intent, but it would be equally valid attached to a JSON-LD payload as a Turtle payload -- and indeed, a payload in any other RDF serialization (though only JSON-LD and Turtle are required to be supported by LDP servers). Minimally, it seems that Trellis should test whether the payload is Turtle or JSON-LD (or neither), and take appropriate next steps...

acoburn · 2019-09-30T15:25:51Z

@TallTed the reasoning behind assuming that an LDP-RS is Turtle (rather than any other format) stems from an inversion of this statement:

4.3.2.1 LDP servers must respond with a Turtle representation of the requested LDP-RS when the request includes an Accept header specifying text/turtle, unless HTTP content negotiation requires a different outcome.

In other words, treat representations as Turtle unless there is a reason to think differently. I would also be very cautious about any sort of content sniffing, as has already been pointed out above.

TallTed · 2019-09-30T18:14:50Z

@acoburn - Speaking as a participant in the LDP WG that produced that spec, I don't think any of us expected any of its statements to be inverted in such a way.

Further, I'm not understanding how telling Servers "you must deliver Turtle representation of LDP-RS when Turtle is requested (i.e., when request includes Accept: text/turtle), unless conneg indicates another representation of the LDP-RS is preferred" turns into telling Servers "every LDP-RS submitted by a Client must be (treated as) Turtle unless you're told otherwise by the Client", nor into telling Clients "every LDP-RS you submit must be (or will be treated as) Turtle unless you tell the Server otherwise", especially given that we explicitly required Servers to accept both JSON-LD (5.2.3.14) and TTL (5.2.3.5) if they accepted POST at all (5.2.3).

Still further, we didn't recommend content sniffing, though we said, in a non-normative section, that "When the Content-Type request header is absent from a request, LDP servers might infer the content type by inspecting the entity body contents ([RFC7231] section 3.1.1.5)." Note also that RFC7231 says in the third paragraph of 3.1.1.5 "If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type."

In other words -- the expectation is that you would either examine the data, or assume that it is application/octet-stream -- not that you would assume that it is text/turtle.

In sum -- it is dangerous to pluck any single statement from any spec, and more so to use your own interpretation of the inverse of such statement.

csarven · 2019-09-30T19:34:45Z

A POST with an LDP-RS link header and no content type is accepted as text/turtle.

If the payload is in JSON-LD, what does Trellis do as it accepts the request? What happens when the created resource is requested i) without an Accept header, ii) with Accept: application/ld+json ?

acoburn · 2019-09-30T23:20:38Z

@TallTed my position on this is that the LDP specification is silent on this matter. As a consequence, a server's behavior is an implementation decision. If another specification were to define what ought to be done in this case, I would be happy to follow that, and I believe that is the basic purpose of this issue. I really have no opinion on what ought to be done in this case.

@csarven if the POST payload is JSON-LD, then the client-submitted request is parsed as JSON-LD (provided that the request includes a Content-Type header). Provided that the request succeeds, then subsequent GET requests (i) without an Accept header would return text/turtle while (ii) with an Accept: application/ld+json would return application/ld+json.

csarven · 2019-09-30T23:34:09Z

@acoburn

(provided that the request includes a Content-Type header)

is a different case. I'm trying to understand this better:

A POST with an LDP-RS link header and no content type is accepted as text/turtle.

If the request has an LDP-RS Link header and message body in JSON-LD, what does "accepted as text/turtle" entail? Can you expand on that process?

acoburn · 2019-09-30T23:57:52Z

Internally, for LDP-RSs, a create/update request involves a parsing stage before the data is persisted. That is, the byte stream (java.io.InputStream) that is an incoming RDF serialization (in this case, a JSON-LD document) is translated into Java objects, specifically an org.apache.commons.rdf.api.Dataset with some additional metadata.

That parsing stage must succeed before the RDF is persisted into the storage layer. In fact, between parsing and storage, there is also a validation step. For instance, LDP imposes certain rules, and if those rules are violated, the request will be rejected. (Shape validation of various sorts happens at this point)

But as for the parsing stage, the parser needs to be told exactly how to parse the InputStream: is it text/turtle or application/n-triples or application/ld+json? Is it an RDF serialization that is not supported (e.g. application/rdf+xml or application/trig)? If that parsing stage fails, the request fails.

So, when I write:

accepted as text/turtle

I mean that, in the absence of an explicit Content-Type from the client, the RDF parser uses a Turtle-based reader. So in this case, if a Turtle-based reader is used (because the client didn't supply a Content-Type) but the HTTP entity body is actually something else (e.g. JSON-LD), then the request will simply fail with a parsing error. A log entry will be generated and the server will respond with a 400 Bad Request.
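As a rough sketch of the parsing stage described above (in Python rather than Java, and not Trellis code): the parser is chosen from the Content-Type, defaults to Turtle when the header is absent, and a parse failure becomes a 400. The two parser functions are deliberately crude stand-ins so the example runs; a real server would call an actual RDF library. The 415 status for unsupported serializations is an assumption.

```python
import json


def parse_turtle_stub(data):
    """Stand-in for a real Turtle parser; only here so the sketch runs."""
    text = data.decode("utf-8").strip()
    # An empty document is valid Turtle; a document opening with "{" is not.
    if text.startswith("{"):
        raise ValueError("not Turtle")
    return text


def parse_jsonld_stub(data):
    """Stand-in for a real JSON-LD parser: checks JSON syntax only."""
    return json.loads(data.decode("utf-8"))


PARSERS = {
    "text/turtle": parse_turtle_stub,
    "application/ld+json": parse_jsonld_stub,
}


def create_rdf_source(body, content_type=None):
    """Return an HTTP status code for a create request on an LDP-RS."""
    # No Content-Type: fall back to the Turtle reader, as described above.
    media_type = (content_type or "text/turtle").split(";")[0].strip().lower()
    parser = PARSERS.get(media_type)
    if parser is None:
        return 415  # assumption: unsupported serialization
    try:
        parser(body)
    except ValueError:
        return 400  # parsing failed, so the request fails
    return 201
```

Under these assumptions, a JSON-LD body sent without a Content-Type is handed to the Turtle reader and fails with 400, while an empty body passes.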

acoburn · 2019-10-01T00:20:51Z

One thing to add as more of a pragmatic point is that, by having the RDF parser default to Turtle, that means that an empty InputStream can be treated as valid RDF, since an empty file is also a valid Turtle document. So a request with no body can be treated as a valid incoming RDF document.

pmcb55 · 2019-10-01T07:18:53Z

Great discussion, but I remain a MUST-er (i.e. Content-Type MUST be used). That means a POST with an LDP-RS link header and no content type, received by a Solid server, is rejected immediately (intuitively, I just don't like the implicit assumption of Turtle, as it feels (as with all implicit assumptions) 'dangerous' somehow).

@acoburn's point on being able to handle empty streams is interesting though, as it does seem a little strange to MUST provide a Content-Type if I know I'm passing an empty body. But I still think just a non-normative note to suggest clients provide Content-Type: text/turtle in that (presumably edge-) case is fine too...

So @TallTed, given your experience on the LDP spec, do you personally think a MUST is justified here (i.e. in the Solid spec), or would we be losing something if we mandate that (apart from the obvious 'burden' it places on the client to be explicit in what it wants to actually do!)?

kjetilk · 2019-10-07T21:11:18Z

Indeed... I mean, we should always strive for requiring as little as possible, and I have been advocating for that just a little above any Web server would be a good idea, but it seems to me that the authn sets requirements in both ends.

dmitrizagidulin · 2019-10-07T21:23:03Z

@TallTed Ah, got it. Yeah, definitely needs an auth client. (And I don't think that's against the spirit of the LDP spec, since, ahem, the LDP spec left authentication as out of scope :) )

kjetilk · 2019-10-30T14:12:45Z

Proposal following the F2F meeting of 2019-10-30, with @csarven, @timbl and @kjetilk present:

We found it advantageous to avoid a lack of clarity in content types, since a fallback to defaults like application/octet-stream would mean that apps cannot determine the content type and therefore cannot present a suitable UX for it.

Users of basic UAs (e.g. curl) should be prevented from skipping the content type, because that may cause subsequent problems for apps using the data.

The cost of requiring clients to submit a content type is thus much lower than the cost of requiring servers and clients to deal with the consequences of wrong or useless content types.

This points towards a strict interpretation, i.e. MUST.

TallTed · 2019-10-30T14:37:30Z

Does this mean that the uploads must include content types that are known to the solid server? Or just ANY content type (which might well include application/octet-stream)?

How does the server test whether the content type a client specifies for a resource is appropriate to its content? If the server does not test such, how does this requirement prevent "wrong or useless content types"?

I do not see how simply forcing a content type to be present achieves the stated goals -- i.e., guaranteeing a suitable UX for that content, or that apps will know how to work with such content types.

kjetilk · 2019-10-30T15:08:37Z

Does this mean that the uploads must include content types that are known to the solid server?

No.

It can't prevent a bad client from adding bad data. If the client has no idea about the content type, it will fall back to application/octet-stream, but then the client needs to understand that other clients will not be able to use it for anything. I didn't write "guarantee", I wrote "advantageous" :-)

It is more about making the chance of breakage smaller, not preventing breakage entirely.
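On the client side, this could look something like the sketch below: the client always sends a Content-Type, using an explicitly declared type if it has one, a guess from the file extension otherwise, and application/octet-stream as the last resort. The extension table and the function names are hypothetical and only cover a few RDF serializations.

```python
import os

# Assumption: a small, illustrative extension-to-media-type table.
KNOWN_TYPES = {
    ".ttl": "text/turtle",
    ".jsonld": "application/ld+json",
    ".nt": "application/n-triples",
}


def content_type_for(path, declared=None):
    """Pick the Content-Type to send: declared type, then extension, then fallback."""
    if declared:
        return declared
    extension = os.path.splitext(path)[1].lower()
    # Unknown or missing extension: be explicit that the payload is opaque bytes.
    return KNOWN_TYPES.get(extension, "application/octet-stream")


def build_headers(path, declared=None):
    """Request headers for an upload; Content-Type is never omitted."""
    return {"Content-Type": content_type_for(path, declared)}
```

A client that falls back to application/octet-stream here still satisfies a MUST on the header, which is the case discussed above.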

TallTed · 2019-10-30T15:53:13Z

"Other clients will not be able to use it for anything" overstates the impact. "Other clients" may inspect the payload of .ttl typed as application/octet-stream, discover that it's actually JSON-LD, and take appropriate action.

"Making the chance of breakage smaller" sounds to me much more like a SHOULD rule, than a MUST rule, as the latter will tend to lead to expectations that will not necessarily be satisfied.

This decision is not mine in the end, but I hope that the documentation around it will be clear about actual effect, as opposed to wishful effect.

Mitzi-Laszlo assigned kjetilk and unassigned csarven Oct 11, 2019

csarven mentioned this issue Oct 21, 2019

POST with empty body fails nodeSolidServer/node-solid-server#1316

Closed

RubenVerborgh changed the title ~~Content-Type MUST be used?~~ Content-Type MUST be used on all PUT/POST/PATCH requests? Oct 29, 2019

RubenVerborgh added status: Ready for Decision doc: Protocol labels Oct 29, 2019

RubenVerborgh unassigned kjetilk Oct 29, 2019

kjetilk removed the status: Ready for Decision label Oct 30, 2019

kjetilk mentioned this issue Nov 20, 2019

creation of file without extension (foo/file) returns foo/file should be foo/file$.bin nodeSolidServer/node-solid-server#1364

Closed

kjetilk mentioned this issue Dec 16, 2019

Clarify the heuristics to determine the interaction model if none is specified #128

Closed

Mitzi-Laszlo modified the milestones: December 19th, February 19th Jan 14, 2020

csarven modified the milestones: February 19th, ~First Public Working Draft Jan 24, 2020

csarven closed this as completed Aug 3, 2020

csarven mentioned this issue Nov 25, 2020

Content-Type - insisting on it as a MUST will create a barrier to adoption #211

Closed

csarven moved this to Done in Specification Sep 25, 2022

csarven added this to Specification Sep 25, 2022

csarven mentioned this issue May 27, 2024

Clarify requests including content; container creation; omit slug targetting auxiliary #660

Merged
