Content-Type MUST be used on all PUT/POST/PATCH requests? #70
Couldn't the server use content sniffing to 'guess' the content type? I'm not sure if that counts as a 'valid way', but it seems to have been common enough in the past I believe... |
Security issue; let's not do that. Agree on MUST. |
[Saw this on my phone.. literally rushed to get my laptop to respond.. Ruben beat me to it by a second.] https://tools.ietf.org/html/rfc7231#section-3.1.1.5 is fairly clear about potential issues around content sniffing. |
MUST raises the bar, so to speak, from SHOULD, but it will remove potential issues and any ambiguity. That's a quality of its own and what to expect from clients. So, +1 to MUST. |
Yeah, +1 to MUST from me too (I was only trying to point out that content sniffing was perhaps another 'valid way', I certainly wasn't trying to suggest that we ever rely on it, absolutely not :) !) |
For the sake of documentation and to offer an answer for why SHOULD and to eventually resolve this issue: The server may not want to completely rely on what a client claims in the request. It gives a way out I suppose to do some verification. For authenticated agents, this would be less of a concern. |
The For example, if a client attempts to create an LDP-NR and does not send a |
Yes, that has been my understanding. There is basically no legitimate way to tell what media type there is without the header, so a |
So @csarven - could you elaborate a little on what you mean below: (as I don't understand what you mean at all):
And @acoburn, when you say:
...I think that's clear from @kjetilk's original comment that created this issue. The question is, does anyone know a good justification for the LDP guys deciding on the Content-Type header being a SHOULD here instead of a MUST, right? 'Cos I agree with @kjetilk that 'the implications if a client does not send a Content-Type header' should be simply a So again, are we missing something here that the LDP spec editors realised, and that justified their making the Content-Type header a SHOULD instead of a MUST? |
Noting here that changing requirements around Content-Type will need to be factored in TSE (The Solid Ecosystem), especially the HTTP section. |
(I'm only speculating about a possible reason because I don't have a citation.) We can consider the public-write or append case, where a client may end up using an incorrect Content-Type value that does not match the payload. Note that if a server wants to ensure that a resource is eventually served accurately, it may end up content-sniffing (whether or not Content-Type has a value, and whether or not that value is valid for the payload). This is neither encouraged nor part of the agreement for Content-Type use.
|
Protocols have to say what must happen for things to work correctly. One can work one's way around bad clients, but then one is taking risks, which inevitably will be taken advantage of at some point by folks trying to break security. I believe this is related to deontic logics, and at some point I'll see if I can find a good explanation of this in those terms. Until then I'd go with the above recipe. |
Yup, but that's where we slap the |
Implementation detail... useful if there is a Content-Type. |
Not for all backends. @timbl has indicated that he wants https://www.w3.org/DesignIssues/HTTPFilenameMapping.html to evolve into a specification that makes Solid also interoperable on a filesystem level. |
I was saying that in context of TSE. As for it being part of "file-based-solid-spec", okay. Perhaps I shouldn't have used "..." because the "if" is important. I'm not sure how HTTPFilenameMapping is expected to work for the no Content-Type case, unless of course there is some fallback to .ttl and/or check Link header for rel RDFSource or something (but still wouldn't know if the payload is actually Turtle or something else). And, that brings things back to |
Forgive me for not tracking down all citations; I'm on a wet-string connection at the moment. As noted above, the cited quote of 5.2.3.6 from LDP is within the 5.2.3 HTTP POST section (so, not a global LDP rule). Note also, from 6.2 HTTP 1.1 --
I believe that the RFCs and W3 specs upon which LDP was built -- and which inheritance should be persisted in Solid -- state that The server doesn't parse, transform, or otherwise manipulate the payload of such submissions. Turtle content is left as Turtle; it is not parsed and loaded to a back-end RDF store. Clients requesting What am I missing? |
FWIW, the Trellis server does, effectively, what @TallTed describes: A POST with an LDP-NR link header, an entity body and no content-type header will store the resource with A POST with no LDP link header, an entity body and no content-type header is accepted as an LDP-NR and will respond with a A POST with an LDP-RS link header, an entity body and no content-type header is rejected immediately. Nowhere in that process is there any content sniffing. LDP-NRs are always treated (internally) as opaque byte arrays, and LDP-RSs must be parseable (if an entity body is present), based on the provided content-type. |
@TallTed Thanks for the feedback and giving the thread a bit more juice :)
AFAICT, the perceived issue wasn't particularly about
and the paragraph that discusses misconfigurations and potential risks.
Nothing. The discussion just went in the direction to eliminate ambiguity by forcing clients to always indicate their intentions. I now see that my comment in #70 (comment) wasn't adequate either because it was only in context of eliminating ambiguity as much as possible based on earlier discussion (re: "content sniffing"). So with that as a premise, MAY for |
@acoburn thanks for sharing! The first two cases, leading to LDP-NR/application/octet-stream, are already covered by LDP/RFC and so can be left as is (as @TallTed and you already indicated). As for:
Is the reasoning based on https://www.w3.org/TR/ldp/#ldpc-post-createrdf :
Does Trellis consider Content-Type as a requirement to check if the interaction model can be honoured? Or is this separate? We can acknowledge that RDF payload can make its way into a server but can only be reused via
Put differently, why should an RDF payload that indicates no interaction model through a Link header or Content-Type be accepted, whereas one that specifies the interaction model (LDP-RS in the Link header) is rejected:
What am I missing? |
Actually I was incorrect about this. A POST with an LDP-RS link header and no content type is accepted as text/turtle. Within Trellis, LDP-NRs and LDP-RSs follow entirely different code paths and are stored in different subsystems. LDP-NRs are just opaque byte streams while LDP-RSs need to be parsed and validated before being persisted: hence, there is a need to know the content type of the HTTP entity body. Thus far, I have treated these "what to do if the client isn't completely explicit" heuristics as implementation decisions, but if the Solid specification chooses to weigh in on or clarify these behaviors, it would be pretty simple to make adjustments, if adjustments need to be made.
In this case, the reason the request is accepted has to do with the fact that it is accepted as an LDP-NR. This also raises a slightly different issue: if a client does not supply an LDP interaction model, what interaction model (if the request is accepted) should be assigned to the resource? Here, Trellis makes a decision based on the Content-Type header, choosing either an LDP-NR or an LDP-RS (if there is no Content-Type header, as stated above, an LDP-NR is chosen). |
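The heuristics described in the comments above can be sketched roughly as follows. This is a minimal illustration of the decision table as stated in this thread, not Trellis code: the function name `resolve_interaction_model` and the `RDF_MEDIA_TYPES` set are hypothetical, and the rule "an RDF media type implies LDP-RS when no Link header is given" is an assumption about how the Content-Type drives the choice.

```python
from typing import Optional, Tuple

LDP_RS = "http://www.w3.org/ns/ldp#RDFSource"
LDP_NR = "http://www.w3.org/ns/ldp#NonRDFSource"

# Assumption: the media types treated as RDF for this sketch.
RDF_MEDIA_TYPES = {"text/turtle", "application/ld+json"}


def resolve_interaction_model(
    link_type: Optional[str], content_type: Optional[str]
) -> Tuple[str, str]:
    """Return (interaction model, media type) for a POST with an entity body."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (
        content_type.split(";", 1)[0].strip().lower() if content_type else None
    )

    if link_type == LDP_NR:
        # Explicit LDP-NR: opaque bytes, generic type when none is declared.
        return LDP_NR, media_type or "application/octet-stream"
    if link_type == LDP_RS:
        # Explicit LDP-RS: Turtle is assumed when no type is declared.
        return LDP_RS, media_type or "text/turtle"

    # No interaction model supplied: decide from the Content-Type alone.
    if media_type in RDF_MEDIA_TYPES:
        return LDP_RS, media_type
    return LDP_NR, media_type or "application/octet-stream"
```

Note that nothing here inspects the entity body; every branch depends only on the two headers, which is what keeps the approach free of content sniffing.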
@acoburn --
That doesn't seem quite the right action. An LDP-RS link header does give you a strong hint of the client's intent, but it would be equally valid attached to a JSON-LD payload as a Turtle payload -- and indeed, a payload in any other RDF serialization (though only JSON-LD and Turtle are required to be supported by LDP servers). Minimally, it seems that Trellis should test whether the payload is Turtle or JSON-LD (or neither), and take appropriate next steps... |
@TallTed the reasoning behind assuming that an LDP-RS is Turtle (rather than any other format) stems from an inversion of this statement:
In other words, treat representations as Turtle unless there is a reason to think differently. I would also be very cautious about any sort of content sniffing, as has already been pointed out above. |
@acoburn - Speaking as a participant in the LDP WG that produced that spec, I don't think any of us expected any of its statements to be inverted in such a way. Further, I'm not understanding how telling Servers "you must deliver Turtle representation of LDP-RS when Turtle is requested (i.e., when request includes Still further, we didn't recommend content sniffing, though we said, in a non-normative section, that "When the Content-Type request header is absent from a request, LDP servers might infer the content type by inspecting the entity body contents ([RFC7231] section 3.1.1.5)." Note also that RFC7231 says in the third paragraph of 3.1.1.5 "If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type." In other words -- the expectation is that you would either examine the data, or assume that it is In sum -- it is dangerous to pluck any single statement from any spec, and more so to use your own interpretation of the inverse of such statement. |
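Of the two options RFC 7231 section 3.1.1.5 allows when the header is absent, the first (assume "application/octet-stream") is simple to sketch. The helper name `effective_media_type` is hypothetical, and the second option (examining the data) is deliberately left out.

```python
from typing import Mapping

DEFAULT_MEDIA_TYPE = "application/octet-stream"


def effective_media_type(headers: Mapping[str, str]) -> str:
    """Return the declared media type, or the generic default if none is declared."""
    for name, value in headers.items():
        # Header field names are case-insensitive.
        if name.lower() == "content-type" and value.strip():
            # Drop parameters (e.g. charset) and normalise case.
            return value.split(";", 1)[0].strip().lower()
    # Header absent or empty: assume opaque bytes, without examining the body.
    return DEFAULT_MEDIA_TYPE
```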
If the payload is in JSON-LD, what does Trellis do as it accepts the request? What happens when the created resource is requested i) without an |
@TallTed my position on this is that the LDP specification is silent on this matter. As a consequence, a server's behavior is an implementation decision. If another specification were to define what ought to be done in this case, I would be happy to follow that, and I believe that is the basic purpose of this issue. I really have no opinion on what ought to be done in this case. @csarven if the |
is a different case. I'm trying to understand this better:
If the request has an LDP-RS Link header and message body in JSON-LD, what does "accepted as |
Internally, for LDP-RSs, a create/update request involves a parsing stage before the data is persisted. That is, the byte stream ( That parsing stage must succeed before the RDF is persisted into the storage layer. In fact, between parsing and storage, there is also a validation step. For instance, LDP imposes certain rules, and if those rules are violated, the request will be rejected. (Shape validation of various sorts happens at this point) But as for the parsing stage, the parser needs to be told exactly how to parse the So, when I write:
I mean that, in the absence of an explicit |
One thing to add as more of a pragmatic point is that, by having the RDF parser default to Turtle, that means that an empty |
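The parse, validate, persist ordering described above, with Turtle as the parser default, might look roughly like this. It is a sketch under stated assumptions rather than Trellis's implementation: `store_rdf_source` is a hypothetical name, and the parsers, validator and persistence function are injected callables standing in for a real RDF library and storage layer.

```python
from typing import Callable, List, Mapping, Optional

Triples = List[tuple]


def store_rdf_source(
    body: bytes,
    content_type: Optional[str],
    parsers: Mapping[str, Callable[[bytes], Triples]],
    validate: Callable[[Triples], None],
    persist: Callable[[Triples], None],
    default_syntax: str = "text/turtle",
) -> str:
    """Parse, validate, then persist; return the syntax that was used."""
    # The parser must be told which syntax to use; fall back to the default.
    syntax = (content_type or default_syntax).split(";", 1)[0].strip().lower()
    parser = parsers.get(syntax)
    if parser is None:
        raise ValueError(f"no RDF parser registered for {syntax!r}")

    triples = parser(body)  # raises on malformed input, so nothing is stored
    validate(triples)       # raises if server-side constraints are violated
    persist(triples)        # reached only when both earlier stages succeed
    return syntax
```

Because persistence is the last step, a payload that fails to parse under the chosen syntax (for example JSON-LD handed to a Turtle parser) is rejected before anything reaches storage.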
Great discussion, but I remain a MUST-er (i.e. Content-Type MUST be used). That means a POST with an LDP-RS link header and no content type is rejected immediately by a Solid server (intuitively, I just don't like the implicit assumption of Turtle, as it feels (as with all implicit assumptions) 'dangerous' somehow). @acoburn's point on being able to handle empty streams is interesting though, as it does seem a little strange to MUST provide a Content-Type if I know I'm passing an empty body. But I still think just a non-normative note to suggest clients provide So @TallTed, given your experience on the LDP spec, do you personally think a MUST is justified here (i.e. in the Solid spec), or would we be losing something if we mandate that (apart from the obvious 'burden' it places on the client to be explicit in what it wants to actually do!)? |
Indeed... I mean, we should always strive for requiring as little as possible, and I have been advocating that requiring just a little more than any Web server does would be a good idea, but it seems to me that authn sets requirements on both ends. |
@TallTed Ah, got it. Yeah, definitely needs an auth client. (And I don't think that's against the spirit of the LDP spec, since, ahem, the LDP spec left authentication as out of scope :) ) |
Proposal following the F2F meeting of 2019-10-30, with @csarven, @timbl and @kjetilk present: We found it is advantageous to avoid lack of clarity in content types, since a fallback to defaults like Users of basic UAs (e.g. The cost of requiring clients to submit a content type is thus much lower than the cost of requiring servers and clients to deal with the consequences of wrong or useless content types. This points towards a strict interpretation, i.e. |
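A strict reading of this proposal could be sketched as a simple request check. This is an illustration only: the function name `check_content_type` and the `allow_untyped_empty_body` switch are hypothetical (the switch models the empty-body question raised earlier in the thread), and the sketch returns a decision rather than an HTTP status code because this thread does not settle on one.

```python
from typing import Mapping, Tuple

METHODS_REQUIRING_TYPE = {"PUT", "POST", "PATCH"}


def check_content_type(
    method: str,
    headers: Mapping[str, str],
    body_length: int,
    allow_untyped_empty_body: bool = False,
) -> Tuple[bool, str]:
    """Return (accepted, reason) for a request under the strict MUST reading."""
    if method.upper() not in METHODS_REQUIRING_TYPE:
        return True, "method not covered by the requirement"

    has_type = any(
        name.lower() == "content-type" and value.strip()
        for name, value in headers.items()
    )
    if has_type:
        # Presence only: whether the declared type matches the payload
        # is a separate question that this check does not answer.
        return True, "Content-Type present"
    if body_length == 0 and allow_untyped_empty_body:
        return True, "empty body exempted"
    return False, "Content-Type missing"
```

The check tests only for presence, so it reduces ambiguity about the client's stated intent without guaranteeing that the declared type is accurate.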
Does this mean that the uploads must include content types that are known to the solid server? Or just ANY content type (which might well include How does the server test whether the content type a client specifies for a resource is appropriate to its content? If the server does not test such, how does this requirement prevent "wrong or useless content types"? I do not see how simply forcing a content type to be present achieves the stated goals -- i.e., guaranteeing a suitable UX for that content, or that apps will know how to work with such content types. |
No. It can't prevent a bad client from adding bad data: if the client has no idea about the content-type, it will fall back to It is more about making the chance of breakage smaller, not preventing breakage entirely. |
"Other clients will not be able to use it for anything" overstates the impact. "Other clients" may inspect the payload of "Making the chance of breakage smaller" sounds to me much more like a SHOULD rule, than a MUST rule, as the latter will tend to lead to expectations that will not necessarily be satisfied. This decision is not mine in the end, but I hope that the documentation around it will be clear about actual effect, as opposed to wishful effect. |
The LDP spec says
I'm not sure how the SHOULD has been justified there, since I struggle to see any other valid way, so it seems we should have a MUST there.