Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncated Body::len() returned when using isahc on Apple M1 #265

Closed
ElliotSis opened this issue Dec 6, 2020 · 6 comments · Fixed by #267
Closed

Truncated Body::len() returned when using isahc on Apple M1 #265

ElliotSis opened this issue Dec 6, 2020 · 6 comments · Fixed by #267
Labels
bug Something is borken

Comments

@ElliotSis
Copy link

See http-rs/surf#270 for more context.

While the documentation stipulates that Body::len() should only be used as hint, it sounds surprising that the length is truncated only when using Apple M1, would it be worth investigating or is this WAI?

Also as per the confusion in http-rs/surf#270, would it make sense to rename Body::len() to something like Body::size_hint() to avoid further confusion?

@sagebind
Copy link
Owner

sagebind commented Dec 6, 2020

Thanks for filing this issue! This is definitely worth investigating in, as this should not be the case.

As for the documentation note on Body::len(), the reason for this is because it is populated almost always directly from the Content-Length response header, which depending on the HTTP version, may be present. Even if it is present, Isahc can make no guarantees about its value since it comes from the server, and sometimes servers that are implemented poorly send wrong values. (Isahc's designed to be generous in handling server quirks, but strict about the requests it produces.)

Put another way, you should use Body::len() to help inform how you read the response, and treat it as equivalent to Content-Length, but you should not use it to allocate an exact buffer size for the response without doing any other validation, as that's a good way to introduce buffer overrun vulnerabilities. 😉

That said, if Body::len() returns Some(x), where x is actually an incorrect length, then that should only be if the server is returning a bad length. Do we know what the raw response headers look like? I don't have an M1 machine to test on. The raw response headers should be written to the TRACE log level under the isahc::wire log target. (If using env_logger, you could enable these with RUST_LOG=isahc::wire=trace.)

@sagebind
Copy link
Owner

sagebind commented Dec 6, 2020

Ah, I have a suspicion as to what might be happening here -- I bet the response is gzipped, in which case Content-Length tells you nothing about the response body if automatic gzip decompression is enabled. I'll look into this later today when I have a chance.

@sagebind sagebind added the bug Something is borken label Dec 6, 2020
@ElliotSis
Copy link
Author

Thanks for the quick reply Stephen!

Do we know what the raw response headers look like?

When using Safari, I get the right Content-Length when looking at the developer console.
Following your instructions, I can also confirm that the truncated Content-Length is passed through in the logs (5825).

Here are the logs (happy to provide more if needed):

[…] 
[2020-12-06T14:25:48Z TRACE isahc::agent] poll_messages
[2020-12-06T14:25:48Z TRACE isahc::agent] -> poll_messages
[2020-12-06T14:25:48Z TRACE isahc::agent] <- poll_messages
[2020-12-06T14:25:48Z TRACE isahc::agent] -- poll_messages
[2020-12-06T14:25:48Z TRACE isahc::agent] dispatch
[2020-12-06T14:25:48Z TRACE isahc::agent] -> dispatch
[2020-12-06T14:25:48Z TRACE isahc::agent] <- dispatch
[2020-12-06T14:25:48Z TRACE isahc::agent] -- dispatch
[2020-12-06T14:25:48Z TRACE isahc::agent] poll_messages
[2020-12-06T14:25:48Z TRACE isahc::agent] -> poll_messages
[2020-12-06T14:25:48Z TRACE isahc::agent] <- poll_messages
[2020-12-06T14:25:48Z TRACE isahc::agent] -- poll_messages
[2020-12-06T14:25:48Z TRACE isahc::agent] dispatch
[2020-12-06T14:25:48Z TRACE isahc::agent] -> dispatch
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << HTTP/2 200 \r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << date: Sun, 06 Dec 2020 14:25:48 GMT\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << expires: -1\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << cache-control: private, max-age=0\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << content-type: text/html; charset=ISO-8859-1\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << p3p: CP=\"This is not a P3P policy! See g.co/p3phelp for more info.\"\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << content-encoding: gzip\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << server: gws\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << content-length: 5825\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << x-xss-protection: 0\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[2020-12-06T14:25:48Z TRACE isahc::handler] << x-frame-options: SAMEORIGIN\r\n 
[2020-12-06T14:25:48Z TRACE isahc::handler] <- handler
[2020-12-06T14:25:48Z TRACE isahc::handler] header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> header
[2020-12-06T14:25:48Z TRACE isahc::handler] <- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -- header
[2020-12-06T14:25:48Z TRACE isahc::handler] -> handler
[…]

Ah, I have a suspicion as to what might be happening here -- I bet the response is gzipped, in which case Content-Length tells you nothing about the response body if automatic gzip decompression is enabled.

Cool!
Looks like you have a good idea as to what’s going on now, thanks for looking into this!

@sagebind
Copy link
Owner

sagebind commented Dec 8, 2020

I have a fix for this that I will make available soon, both for the master branch and for the 0.9 series. The fix indeed is to simply not include any length in the body whenever decompression is being performed on your behalf, which seems obvious in retrospect. 😓

No one has probably noticed this before because it is likely uncommon for the server to return a Content-Length when the response is compressed, since the only way the server could know the length is if the compressed response was pre-compressed ahead of time. Google's CDN probably gzips the response and caches it, then serves it from the cache with a Content-Length based on the cached, compressed data.

sagebind added a commit that referenced this issue Dec 8, 2020
When automatic decompression is enabled and a server returns a `Content-Length` response header for compressed content, `Body::len` was naively returning the unmodified value of `Content-Length` as the body length. This is incorrect because the `Content-Length` in this situation indicates the _compressed_ size of the body, not the uncompressed size.

Since `Body::len` is supposed to be reporting the size of the body that it is returning if known, returning the uncompressed size is confusing and buggy, and can (and does) cause issues for downstream users.

This fixes the issue by simply returning `None` as the body length whenever we are decompressing the response body on the user's behalf, since it is impossible for us to know the uncompressed size ahead of time.

Fixes #265.
@ElliotSis
Copy link
Author

Nice, thanks for the quick fix Stephen!

Google's CDN probably gzips the response and caches it, then serves it from the cache with a Content-Length based on the cached, compressed data.

What's odd is that I wasn't able to repro this issue on x86 (linux), while using the same program built with the same version of rustc. I was also able to reproduce this on a variety of websites, I couldn't find a URL that didn't reproduce the issue (I didn't try many though, happy to try with more). I wonder if the same issue is reproducible on macOS x86.

sagebind added a commit that referenced this issue Dec 9, 2020
When automatic decompression is enabled and a server returns a `Content-Length` response header for compressed content, `Body::len` was naively returning the unmodified value of `Content-Length` as the body length. This is incorrect because the `Content-Length` in this situation indicates the _compressed_ size of the body, not the uncompressed size.

Since `Body::len` is supposed to be reporting the size of the body that it is returning if known, returning the uncompressed size is confusing and buggy, and can (and does) cause issues for downstream users.

This fixes the issue by simply returning `None` as the body length whenever we are decompressing the response body on the user's behalf, since it is impossible for us to know the uncompressed size ahead of time.

See #265.

Backported from commit 5ae995f.
@sagebind
Copy link
Owner

sagebind commented Dec 9, 2020

This should now be fixed in the 0.9.14 release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is borken
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants