Accept-Encoding header parsing and interpretation #8104
Comments
@bdraco I think handling the 1st task here for case-sensitivity is quite easy to do before #8063, so I'll submit that. However, the 3rd task, parsing it completely and correctly per the RFC, is less of a priority to backport IMO. And implementing it raises a larger question for me... Although it's relatively easy in python, shouldn't that parsing be done by @Dreamsorcerer is this a question for you?
The 3rd task would be partially accomplished by #7679, which is drafted for the 4.0 milestone.
For the 2nd task regarding the behavior of the
Thoughts?
I think we need to do some git archeology to figure out why
My impression is that llhttp is just passing us header values. Meaning that it has parsed it as HTTP, but not converted anything based on the particular header names, which is probably out-of-scope. So, such changes would likely go into However, is there actually any merit in implementing this quality check? I've seen the same syntax for locales, and I think the general consensus is that this was over-engineered and basically pointless. The quality numbers fail to achieve anything more than defining an order of preference (i.e. the actual numbers are never used for anything, they're just sorted on). Therefore, browsers don't provide any way for users to customise these quality values, they just pick evenly spaced values according to order of user preference. Because of all this, browsers will also always order them with the highest quality number at the start, so in reality, you can just read the values left-to-right while ignoring the quality numbers and you'll get the same result. Parsing the quality numbers would therefore just be a waste of CPU-time.
It was a boolean before #403 added gzip, and the code before that made it clear that it was actually a "force" to ignore
I think the force parameter is pretty clear in that it forces a given compression. I'm not clear why you'd want to do this, so I'm not clear that turning it into a preference makes sense without understanding the use case (can we find any projects using the parameter?). Clearly it's not correct behaviour to use compression that the client claims not to support, but it also seems weird to try and use a fixed compression method in the first place...
OK, there seems to be an important difference between Accept-Encoding and Accept-Language. The former can disallow encodings with q=0, so maybe we need to at least check for that. Edit: OK, not a difference, but it's explicitly mentioned under Accept-Encoding, whereas it'd be very weird to try and exclude languages.
Okay, makes sense 👍🏻
That's certainly correct - the numbers mean nothing but a sort order. I think the reason it's not just a list in preference order is that then it becomes impossible to communicate equal preference for some or all items. That's why the default quality if not specified is 1. To a lesser extent it's also a way to have negations with a wildcard, e.g. "compress;q=0, *" says I'll take any encoding except compress. Although even the spec admits this pattern has little to no practical value. FWIW, Apache does use the qualities in its content negotiation.
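To make the quality-value discussion concrete, here is a minimal sketch of weight-aware selection. The function names `parse_accept_encoding` and `choose_coding` are hypothetical and are not part of aiohttp. The sketch assumes a missing weight means q=1, that q=0 excludes a coding, and that `*` supplies the weight for codings not listed. It does not validate the 0 to 1 range, and it does not implement the special rules for `identity` or for an absent header.

```python
from typing import Dict, Iterable, Optional


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Map each lower-cased coding in the header to its quality value."""
    qualities: Dict[str, float] = {}
    for item in header.split(","):
        coding, *params = item.split(";")
        coding = coding.strip().lower()
        if not coding:
            continue
        quality = 1.0  # a coding listed without a weight defaults to q=1
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    # Assumption of this sketch: a malformed weight is
                    # treated as "not acceptable".
                    quality = 0.0
        qualities[coding] = quality
    return qualities


def choose_coding(header: str, server_order: Iterable[str]) -> Optional[str]:
    """Pick the acceptable coding with the highest q; ties go to server order."""
    qualities = parse_accept_encoding(header)
    wildcard = qualities.get("*", 0.0)  # weight for codings not listed explicitly
    best, best_q = None, 0.0
    for coding in server_order:
        q = qualities.get(coding, wildcard)
        if q > best_q:  # strict comparison keeps the earlier server preference on ties
            best, best_q = coding, q
    return best
```

With this sketch, `choose_coding("compress;q=0, *", ["compress", "gzip"])` skips `compress` and falls through to `gzip` via the wildcard, while equal weights leave the decision to the server's own ordering.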
True they are not user-facing for sure (nor would they have any reason to be?), and the values are just picked to set the browser's preference.
This is definitely not true. The spec is clear that values without a quality default to 1 and browsers take advantage of that. For example, both Chrome and Firefox send What they send for the
I was referring to the real world behaviour of Accept-Language, and wondering if the same applies. Clearly that's not the case, so you are probably right that we should use the quality value here.
I completely agree - neither makes sense to me. I think there is a use case for changing the server's default encoding order though. For example, a user might benchmark and find that sending large files over long distances is likely faster with gzip than deflate (i.e. the larger encoding time is offset by the faster transfer time). So they might want the order to be
If I were to redesign the current API, it might look something like this to allow a set of encodings to be preferred first but then fall back to others supported:
    def __init__(self, ...) -> None:
        ...
        self._ordered_codings = deque(ContentCoding, len(ContentCoding))

    def enable_compression(self, *prefer: ContentCoding) -> None:
        """Enables response compression with preferred encodings."""
        self._compression = True
        # Move the preferred codings to the front, keeping the rest in order
        for encoding in prefer:
            self._ordered_codings.remove(encoding)
        self._ordered_codings.extend(prefer)
        self._ordered_codings.rotate(len(prefer))

    async def _start_compression(self, request: "BaseRequest") -> None:
        # Encoding comparisons should be case-insensitive
        # https://www.rfc-editor.org/rfc/rfc9110#section-8.4.1
        accept_encoding = request.headers.get(hdrs.ACCEPT_ENCODING, "").lower()
        for coding in self._ordered_codings:
            if coding.value in accept_encoding:
                await self._do_start_compression(coding)
                return
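The deque reordering in that proposal can be checked in isolation. The following is a standalone sketch: the `ContentCoding` enum here is a stand-in with an assumed member order, and `reorder` is a hypothetical helper that exists only to exercise the remove / extend / rotate steps outside of a response class.

```python
from collections import deque
from enum import Enum


class ContentCoding(Enum):
    # Stand-in enum for illustration; member order is the assumed default preference.
    deflate = "deflate"
    gzip = "gzip"
    identity = "identity"


def reorder(*prefer: ContentCoding) -> list:
    """Return coding names with `prefer` moved to the front, the rest kept in order."""
    ordered = deque(ContentCoding, len(ContentCoding))
    for encoding in prefer:
        ordered.remove(encoding)     # raises ValueError if a coding is passed twice
    ordered.extend(prefer)           # preferred codings are appended on the right
    ordered.rotate(len(prefer))      # then rotated around to the left end
    return [coding.value for coding in ordered]
```

One thing the sketch makes visible is that passing the same coding twice would fail on the second `remove()` call, so a real implementation would probably want to de-duplicate `prefer` first.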
Yeah, I'd be open to something like that. I wonder if it's worth adding this as a new feature (and potentially removing
Here's my attempt at a search for There are some results using the old boolean, some oddly specifying "identity" 😕, some just specifying "gzip" that clearly would come to no harm if turned into a preference (e.g. ignore the Home Assistant results), and even one result that is doing the job Passing "identity" is concerning because currently that would just do nothing, but under my proposal, it would be skipped because it's not going to be in the
That gives me no results for some reason. I did create a script previously to search dependents, though it can be slow to run (it searches 1 repo every ~7 seconds due to rate limit): https://gist.github.com/Dreamsorcerer/70285fac0a11c3d9c26b577f7dd989a7
Might be interesting to see if any of them mention any rationale in the commit they were added...
My counter-proposal is to just add a new
The markdown sanitizer keeps removing all the escapes from the regex parts. The search term should look like this with 20 results for me:
I didn't check, but one of the false arguments did have a comment mentioning it was a workaround for the double compression issue recently fixed by @bdraco.
Not as elegant, but safer of course. Also, consider that either way this becomes a "breaking" change. For example, the Home Assistant use is definitely not expecting a force, so a change would be required to specify
Is your feature request related to a problem?
Forgive me - parts of this might be considered bug fix and parts feature request, but they are highly related and intertwined in the same code. These are things I noticed while working on #8063.
1. `enable_compression()` will compare `Accept-Encoding` case-insensitively, but retrieving static compressed files does not (it should be case-insensitive per RFC 9110).
2. When the `force` parameter is used in `enable_compression()`, it does not even consider `Accept-Encoding`. I fail to see any real use case for this and feel it should be treated like a preference instead (i.e. change the default preference from `deflate` to something else). If Brotli were implemented, it may very well be more performant than `deflate`, so preferring it might be desirable, but then the server no longer supports legacy browsers without Brotli support.
3. There is no parsing of the `Accept-Encoding` field per RFC 9110. There is simply an `in` operator test of the string. For example:
Describe the solution you'd like
1. `Accept-Encoding` should be treated case-insensitively when returning static compressed files.
2. The `force` parameter to `enable_compression()` should really act like a preference, not an override (i.e. it should not be forced if the encoding is not supported per the `Accept-Encoding` request header).
Describe alternatives you've considered
None for the first 2, but admittedly the parsing issue is more or less a moot point since most clients always send the header with an explicit list and no quality factors (i.e. "gzip, deflate, br").
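As a rough illustration of why a plain substring test can give the wrong answer, the sketch below contrasts it with a token-level check. Both helper names are hypothetical; `naive_accepts` only approximates the `in` operator test described in this issue and is not copied from aiohttp.

```python
def naive_accepts(header: str, coding: str) -> bool:
    """Substring test, case-folded, similar in spirit to an `in` check."""
    return coding in header.lower()


def token_accepts(header: str, coding: str) -> bool:
    """Compare whole tokens and honour an explicit q=0 exclusion."""
    for item in header.lower().split(","):
        token, _, params = item.partition(";")
        if token.strip() != coding:
            continue
        name, _, value = params.partition("=")
        if name.strip() == "q":
            try:
                return float(value) > 0
            except ValueError:
                # Assumption of this sketch: a malformed weight means "not acceptable".
                return False
        return True  # listed without a weight, so acceptable
    return False


# A client that explicitly refuses gzip still passes the substring test.
header = "gzip;q=0, br"
print(naive_accepts(header, "gzip"))  # True
print(token_accepts(header, "gzip"))  # False
```

For the common "gzip, deflate, br" style of header both checks agree, which is consistent with the point above that the difference rarely matters in practice.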
Related component
Server
Additional context
n/a