sec-metadata #280
So some initial reactions, which you shouldn't take too seriously because I haven't put that much thought into them yet:
Hey @dbaron, thanks for the feedback! I'd actually appreciate y'all spending a little more time on this in the somewhat near future, as it's something that turned out to be trivial to implement, and that doesn't seem controversial from the (few) conversations I've had with other vendors. It's feasible that we could ship it fairly soon, and your feedback would be really helpful in ensuring that we do so in the right way.

I'll also note that I started sketching out a more detailed spec at https://mikewest.github.io/sec-metadata/, which might relieve some of your interop concerns below. :)

To the line-by-line:
I agree! Google's security folks are enthusiastic about this mechanism, and I'd like to let them loose on it in the wild.
I think that's accurate. The goal here is to give developers on the server enough context to make more informed decisions before doing all the work of putting together a response and delivering it to the client. Rather than relying on client-side rules (e.g. MIME type checks a la CORB), we can perform checks on the server that enable a priori decision-making, thereby avoiding things like the timing attacks which are otherwise pervasive.
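Concretely, a request in the current draft syntax would carry something like the following (this example reuses the serialization discussed further down this thread; the field names are still in flux):

Sec-Metadata: cause=user-activated, destination=nested-document, site=cross-origin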
This proposal has some deployability advantages insofar as it doesn't require rearchitecting, renaming, and so on. All the server needs to do is read a header and make decisions. In many cases, those decisions can even be abstracted out of the application code itself, and into frontend servers which might be maintained and hardened by an entirely different team. I think that's very much worth exploring. I'd also note that things like …
I agree that the size is a possible concern. That said, if it's something we ship, I'm pretty sure we can minimize the impact by tweaking the H/2 compression dictionary. We're just uploading ~4 enums, after all. The values remain static over time, and should be quite compressible.
We're actually already exposing the most complex bit of information via Service Workers (see the …).
Hey, @annevk!
I'd still like to get the "…"
The interesting distinction in …
I don't quite understand: are you saying that instead of …? Also: I don't see "potential destination" used anywhere in Fetch. Where is it used?
Huh. That was dumb. I'll fix it.
What bits of data are you uncomfortable sharing?
My point isn't that compression means that we can magically add everything ever to HTTP requests. I'm claiming that compression reduces the apparent cost of the header to something that seems pretty manageable, and (IMO) worth the tradeoff.
Breaking insofar as folks might be looking at the …
Turns out, @mnot removed bare identifiers as valid dictionary values in https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-05#appendix-A.1 (https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-05#section-4.4). It's strings or nothing.
The goal is indeed to allow the server to make more granular decisions about when to service a request. I'm claiming that as a substantial advantage. :)
/cc @arturjanc, since we're talking about him indirectly. :)
In that case don't bother with potential destination.
Basically, I see …
(Also, there's no guarantee that a cookie actually is ….)
To answer some of @annevk and @dbaron's questions above, I figured I'd elaborate on the benefits we expect to get out of Sec-Metadata.

When it comes to offering protections against cross-origin information leaks, it seems important to provide the server with context about the request which allows it to make a security decision upfront, before any server-side processing has occurred. For example, mechanisms using a response header, such as Cross-Origin Resource Policy, leave the application susceptible to CSRF (because any side effects of handling a request on the server will remain even if the browser doesn't expose the response to its initiating context) and to timing attacks (because the amount of time needed to generate a response and return it to the client doesn't change, and can be leaked via the usual side channels). A model where a server has a chance to reject an unexpected or unwanted request before taking any action is a powerful primitive which addresses cross-origin infoleaks in a more robust way.

The main advantage of …
Based on our experience, the request header model is significantly easier for developers to deploy than either the …
The main trade-off on the Sec-Metadata side is that its design requires writing server-side code to inspect the request header and make security decisions. Developers often like to define security restrictions as static response headers in their server config -- they wouldn't be able to do this here. At the same time, the middleware to inspect ….

My guess is that this model is the right trade-off for handling the problem of cross-origin information leakage in moderately complex applications: it gives developers enough information to make meaningful security decisions, without requiring extensive new machinery in the web platform.
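As a sketch of what such middleware might look like -- WSGI here, with an invented policy, against the single-header draft syntax used elsewhere in this thread; the parsing details and the policy itself are purely illustrative, not anything from the spec:

```python
# Illustrative sketch: assumes the draft single-header syntax
# ("Sec-Metadata: destination=..., site=..., mode=...") and a made-up
# policy. Real field names and values are defined by the spec.
def parse_sec_metadata(value):
    # "site=cross-origin, mode=no-cors" -> {"site": "cross-origin", ...}
    pairs = (item.strip().split("=", 1) for item in value.split(",") if "=" in item)
    return {key: val.strip('"') for key, val in pairs}

class SecMetadataMiddleware:
    """Reject unwanted cross-site requests before any application code runs."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        meta = parse_sec_metadata(environ.get("HTTP_SEC_METADATA", ""))
        # Hypothetical policy: API endpoints only serve same-origin requests.
        if environ.get("PATH_INFO", "").startswith("/api/") \
                and meta.get("site") == "cross-origin":
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Cross-site request refused.\n"]
        return self.app(environ, start_response)
```

The important property is that the rejection happens before any application logic runs, which is what defeats the CSRF and timing variants described above.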
Thank you, Artur (@arturjanc), for this nice explanation, and notably for including an explanatory reference to whatwg/fetch#687 as well.
Would this header be restricted to secure contexts?
It looks like we're going to add bare identifiers back into Structured Headers, FWIW.
Yes. I had this in the explainer, but not in the spec: fixed in w3c/webappsec-fetch-metadata@70f9c34, thanks!
Ah! That would drop some quotes from the serialization, which would be nice!

For completeness, and to follow up on a conversation earlier today: if we care more about the header size than usability or readability, we can treat this header as containing a boolean (…).

That's a thing we could do. I'm not sure it's a good idea. @travisleithead seemed to be in favor of the more verbose description.
FWIW, the longer-term pseudo-plan for Structured Headers is that there will eventually be a much more wire-/parse-efficient format in a future version of HTTP (or an extension). Perhaps not that efficient, but better. Even without that, remember that you've got H2/QUIC header compression, and these values are pretty stable (right?), so size isn't the absolute first concern.
Regarding naming, @ylafon offhandedly mentioned …
w3c/webappsec-fetch-metadata#2 has the extent of the thought folks have put into the name. :)
Any more feedback from y'all on this mechanism? I think we're aligning on a reasonable design here, and I'd like to get it moving forward. Perhaps we can discuss it at TPAC if y'all have questions?
I am looking for a declaration that it’s the best thing since sliced bread, and that each of y’all will ship it immediately, perhaps even before I can. That seems unlikely...

If y’all have no opinions (and, therefore, no objections), that’s fine by me. We can close this out with a big, green “Meh.” stamp from the TAG. :)
So. "Meh." stamp of approval and close this review out, @torgo? Or do y'all have thoughts you'd like to share? |
Sliced bread is overrated and I like the way this spec is going.
🍞! Thanks, @ylafon. |
Hello @mikewest. Sorry for the delays. We discussed briefly on our call today, and the two issues that came up were HTTP header bloat (is this worth sending on every request?) and whether the mechanism would mostly benefit industrial-scale web properties.
I think if we can address these issues then we're happy to close this one off. 🍞
As @mnot always says, HTTP header compression is a silver bullet panacea that cures all ills. Also, we discussed this above. See #280 (comment) and #280 (comment).
As @slightlyoff noted in the minutes, software providers know their software, and can ship rules themselves at the application layer, which will automagically protect their clients. Imagine WordPress locking down non-navigational requests to their API endpoints, for instance.

At the network layer, https://bugs.chromium.org/p/chromium/issues/detail?id=861678 is an exciting trip through the world of Web Application Firewalls, showing that they didn't like our initial pass at ….

My expectation is that Google-like companies will farm the work of tuning …
@mikewest you are a bad man. As always, this is going to be a judgement call. Given the take-up of other WebAppSec mechanisms, I would be concerned if this were included in every request; if it's not going to be used in the vast majority of cases, why send it? Possible mitigations: …
Regarding the header's size, we've made a few tweaks to the format in the last few weeks (dropped …). A navigation request might look like:

Sec-Metadata: cause=user-activated, destination=nested-document, site=cross-origin

The longest subresource request would be 62 characters:

Sec-Metadata: destination=serviceworker, site=cross-origin, mode=same-origin

Many requests will be shorter (e.g. …).
If we cared only about size, a terser encoding could map each value to an integer:

Sec-Metadata: forced=1, destination=10, site=0

This isn't terribly legible, and basically requires a lookup table for …. We've got a boolean, a 20-value enum, a 3-value enum, and a 5-value enum. Let's give ourselves a whole byte for …
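By way of illustration, the bit-packing could look something like the sketch below. The exact layout didn't survive in this thread, so the field order and widths here are assumptions, and the output won't reproduce the *AADq* example byte-for-byte:

```python
import base64

# Illustrative enum tables; the real value lists live in the Fetch spec.
SITES = ["same-origin", "same-site", "cross-origin"]                 # 2 bits
MODES = ["navigate", "same-origin", "no-cors", "cors", "websocket"]  # 3 bits
DESTINATIONS = ["document", "image", "script", "serviceworker"]      # really ~20 values

def pack(forced: bool, destination: str, site: str, mode: str) -> str:
    byte0 = DESTINATIONS.index(destination)  # a whole byte: room to grow
    byte1 = (int(forced) << 7) | (SITES.index(site) << 5) | (MODES.index(mode) << 2)
    blob = bytes((byte0, byte1))
    # Structured Headers spell binary content as base64 between asterisks.
    return "*" + base64.b64encode(blob).decode("ascii") + "*"

print(pack(False, "serviceworker", "cross-origin", "same-origin"))  # *A0Q=*
```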
Which encodes as a binary structured header value in 6 characters as:

Sec-Metadata: *AADq*

That's a direction we could go. Is it better? I'm not sure. Let's look at @mnot's suggestions:
With the exception of shifting …. What do y'all think?
Please, please, please don't do that. The whole point of using structured headers is to avoid creating new parsers and formats -- along with the bugs and interop problems that come with them. Furthermore, if you go with a binary encoding, you're locking yourself into that set of values; if you want to add any, you'll need to mint new headers.

IOW, don't over-optimise for header size. Making these less wordy (e.g., …).

And, if you don't go binary, you can split into multiple headers, which does give you better compression out of the dynamic table. That's because each complete header value is stored in the dynamic table; if there are many permutations of a single header, it can blow out the total dynamic header table size and cause them to be pushed out.

In your proposed approach, there are ~600 permutations of values, whereas if you split it out, there are 30 possible values. Even with the binary encoding (which again please don't do!), that's 30000 bytes of space in the dynamic table (assuming all permutations are seen on a connection) (see here for how to calculate the size of a header for purposes of the dynamic table).

If we use this style:
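(Illustrative header names -- the point is one simple token per header:)

Sec-Site: cross-origin
Sec-Mode: no-cors
Sec-Dest: serviceworker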
I count it as a maximum of 1509 bytes in the dynamic table (again, assuming that all of the directives are seen). Much less impact, easy to parse, and extensible to boot.

The default dynamic table size for the headers in one direction on a connection is 4,096 bytes. While many browsers increase that on the response side, it's not clear whether it's safe to assume that on the request side, as servers generally have more stringent per-connection memory constraints.

In actual use, the number of permutations seen on a connection will often be lower. However, from what I can tell from your use case, there will still be a fair amount of variance, no? Also, think about things like proxies that are mixing traffic from multiple clients onto one upstream connection.

The upshot here is that HTTP header compression is heavily optimised for header values that don't have a lot of variance on a connection. Don't fight it :)
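If you want to check that arithmetic: RFC 7541 §4.1 defines the size of a dynamic-table entry as the length of the name plus the length of the value plus 32 octets. A back-of-the-envelope version, with stand-in header names and abbreviated value lists:

```python
# Per RFC 7541 §4.1: a dynamic-table entry costs len(name) + len(value) + 32.
def entry_size(name: str, value: str) -> int:
    return len(name) + len(value) + 32

# Stand-in split-header scheme; the real value lists are longer.
headers = {
    "sec-site": ["same-origin", "same-site", "cross-origin"],
    "sec-mode": ["navigate", "same-origin", "no-cors", "cors", "websocket"],
    "sec-dest": ["document", "image", "script", "serviceworker"],  # really ~20
}

# One table entry per distinct (name, value) pair actually seen on the wire.
total = sum(entry_size(name, value)
            for name, values in headers.items()
            for value in values)
print(total)
```

Compare that with the single-header approach, where every distinct permutation of the combined value occupies its own entry.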
Thanks, @mnot!
I generally agree; the proposal was somewhat ad absurdum in nature. As @arturjanc put it, "We should likely be making it easier for developers to use security mechanisms. Requiring application-level decompression is awkward and a barrier to entry." I would prefer human-readable headers when possible.
This sounds like good advice.
This sounds like a reasonable argument in favor of splitting the header into a million little shards. Thank you for the primer on header compression (whose constraints I think I still don't really understand); I appreciate the review.
Hi! First of all, I like the whole idea of Sec-Metadata (is this the final name, finally?). It solves problems, while also offering new ways to make exciting security and privacy measurements (passively, possibly actively -- never mind that until I write a grant proposal for it).

I vehemently favor the legible version. I do not believe there is any particular bloat involved (and, as @mikewest and @mnot appear to say, compression being the standard now makes it a non-issue). Please do not write any binary-encoded tables of the kind above: they make the eyes hurt, and -- thinking of the poor developers, security engineers, and pentesters, among others -- I do not see a valid case against legibility. In other words, do not make it an IAB-style* consent framework.

In short, I'd say all is fine and in place. There is also no question, from my personal perspective, that Sec-Metadata is a good thing, and its value is tangible to multiple stakeholders.

[*] What do you consent to here? 00000100 11100001 00000101 00010000 00001100 10001110
Thanks for the comments, @mnot and @lknik!

To the point about splitting the values across multiple headers: I looked into the data we're collecting from clients with Chrome's experimental web platform features flag enabled, based on a total of ~155M requests to 200 Google services. In this data set, we have 81 permutations of the four main values ….

From a developer ergonomics point of view, my guess is that having the information in a single header would be slightly easier to reason about -- servers will generally need to look at several of the values to make a security decision, and developers may need to do the same when debugging any rejected requests. That said, it's not a hill I'd want to die on; if the performance benefits of the split-header approach are substantial and request size is our main remaining concern, I think it would still be workable.
@arturjanc Likewise, I'm not dead-set on splitting them out; just trying to illustrate that the binary approach isn't actually giving us what we might think. I agree it's likely to be top-heavy; isn't everything on the Web a Zipf curve, after all?

WRT developer ergonomics -- I don't know that accessing several single header values is more onerous than parsing one complex one, especially if both approaches use structured headers -- but of course YMMV.

From a compression efficiency perspective, the effect I illustrate will become more pronounced when you add more fields; it'd be interesting to see what Chrome's numbers are once you add …
For ….

Given that there don't seem to be extremely strong opinions for either approach -- except for the universal dislike of the bitfield -- my guess is that we should let @mikewest make the call and then blame everything on him once things inevitably blow up ;-)
Now that's a plan I can get behind!
FWIW, I like the separate-header approach, as each header would contain a simple token which is extremely easy to code against. Given that even browsers don't interoperate on header parsing for anything more complex than a token (I've even found differences in tokens, due to whitespace handling), it seems better to err on the side of simplicity for server operators.
Hi! Quick note on @torgo's comment, as an additional vote:

Re: HTTP header bloat, I defer to all the HTTP experts on this thread :). I do have a strong preference for human-readable formats and simplicity; but if we are going down the route of a single header, let's not make people write a new parser in every language they use. I would rather do JSON.

Re: whether this is only for industrial-scale web parties: I believe this header will be useful and important to everyone. Whether or not they adopt it is a question of how much they invest in security and what other priorities they have (no point protecting against this if you have an XSS vuln every day). For example, Dropbox would love to adopt this. While we are reasonably popular, we aren't as popular as Google :)

In terms of comparison to previous web standards, I suspect this will be a lot easier for security teams to adopt than CSP. Additionally, I will note that this header isn't just about "defense in depth". There is a whole class of side-channel attacks, demonstrated many a time in previous research, that are impossible to prevent right now on the web platform. This header will at least make it possible to defend against them. If 2018 has taught us something, it is that we had better start protecting against side-channel attacks before they become trivial :)

Finally, I believe one use case for this would also be protecting internal webapps from attack. While these apps won't show up in any popularity contests, they are pretty sensitive, and the impact of protecting them is huge.
Discussed on call 28-Nov. We agreed to close this based on the feedback provided. Also noted @mnot's blog post, which is related to the HTTP header bloat issue. Thanks all.
Split the header into a million little pieces based on this conversation: https://mikewest.github.io/sec-metadata/. Thanks, all.
Good morning, friendly TAG!
I'm requesting a (p)review of:
sec-metadata
Further details (optional):
You should also know that you're probably my favorite web architecture review body. Top 10, certainly.
We'd prefer the TAG provide feedback as (please select one): …