Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

breaking aws s3 usage with requests 2.32.0 #6711

Open
kristianelliott80 opened this issue May 21, 2024 · 13 comments
Open

breaking aws s3 usage with requests 2.32.0 #6711

kristianelliott80 opened this issue May 21, 2024 · 13 comments

Comments

@kristianelliott80
Copy link

requests 2.32.0 introduced a change #6644 that strips double /.

This has introduced an issue where generated presigned urls for s3 keys that start with a / can no longer be used. requests now strips that second / meaning that the key is modified and the signature does not match resulting in 403 errors. We can adjust to remove the leading / in our keys but this may be affecting other users or use cases

Expected Result

To be able to use presigned_urls for s3 keys with leading "/"

Actual Result

URL that was passed was modified resulting in a 403 error response from aws

Reproduction Steps

import requests
import boto3
s3 = boto3.client('s3')
bucket = "bucket"
key = "/key_with_leading_slash.txt"
presigned_url = s3.generate_presigned_url("get_object", Params={'Bucket': bucket, 'Key': key})
requests.get(presigned_url)

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "5.1.0"
  },
  "charset_normalizer": {
    "version": "3.1.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.9.6"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.32.1"
  },
  "system_ssl": {
    "version": "101010bf"
  },
  "urllib3": {
    "version": "1.26.15"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": false
}
@crabhi
Copy link

crabhi commented Jun 26, 2024

I can second this. It's not only AWS. We just hit this problem with a completely unrelated web service that also behaves differently if the double leading slash is normalized to one.

@sigmavirus24
Copy link
Contributor

I believe the s3 use case is being addressed by AWS in their SDK.

As for a service that expects a URI whose path is not normalized and returns different behavior, that's not compliant with any RFC or security best practices. I'm not losing sleep over that not working

@crabhi
Copy link

crabhi commented Jun 26, 2024

Thanks for your response! Why do you think double slash is not compliant with any RFC? I agree it's unusual but that service is not under our control.

Reading RFC3986, section 3.3, it seems to me the case of double leading slash is legal. It doesn't contradict anything in the text and it conforms to the ABNF below: using path-abempty (because authority is present, also used in ABNF at the beginning of Sec. 3). The first segment would be zero-length which is allowed in path-abempty as opposed to path-absolute.

@crabhi
Copy link

crabhi commented Jun 26, 2024

Another case in point - curl handles leading double slash just fine. Source: my testing with mitmproxy.

@sigmavirus24
Copy link
Contributor

It is absolutely a valid URI. It's also semantically equivalent to the normalized version and must be treated that way. GET //example is equivalent to GET /example

@crabhi
Copy link

crabhi commented Jun 26, 2024

What defines the normalization? In that RFC, I can find only about "dot-segment removal". This is a genuine question - I'm but a user. I have never had to implement URL parsing, so there may be another RFC I don't know about that defines this normalization.

The path //example is equivalent to /example in a filesystem. But is it necessarily true for an URI/URL?

@SpoonMeiser
Copy link

I believe the s3 use case is being addressed by AWS in their SDK.

@sigmavirus24 where have you seen this? Is there an issue or something you can point us towards?

@sigmavirus24
Copy link
Contributor

I don't follow the issues there but I had a conversation with one of the python SDK maintainers. I won't tag them here though so that they don't get harangued as this community is wont to do

@SpoonMeiser
Copy link

I couldn't find an issue for it, so I created one: boto/boto3#4181

@sigmavirus24 if you have another conversation with your maintainer friend, you could point him at this issue, and those of us that don't care whether the requests behaviour is technically right or not and only want it to work with S3, can watch to see when that issue gets resolved.

@crabhi
Copy link

crabhi commented Jun 28, 2024

For what it's worth, I don't care that much about RFC legalese. The problem is I also see a web service (completely unrelated to S3, not even a storage) that doesn't work correctly when queried from requests.

If we can agree the previous behaviour was a bug WRT the colon handling in the first path segment but the current fix introduced another bug, I can try to contribute a patch to urllib3 to fix the colon issue there, so that #6644 can be reverted.

@sigmavirus24
Copy link
Contributor

I don't agree that stripping the redundant and superfluous slash is a bug. I also think that urllib3 can be improved in general around this but doing so in a backwards compatible fashion that doesn't recreate this bug but in a lower part of the stack isn't going to improve things

@crabhi
Copy link

crabhi commented Jul 1, 2024

I tried a few other clients - Python stdlib, Curl, Go stdlib, Elixir Req, Groovy stdlib (probably same as Java) - and I couldn't find another one that strips the slashes. Also, I've tested a few servers (Python http.server, Nginx and Apache) and neither of them "normalizes" the slashes.

https://gist.github.com/crabhi/080d746e3eb4fc53380bc8291cdd0f7d

@sigmavirus24 what leads you to believe the slash is "redundant and superfluous"? I see this is not an issue that would affect many people, so I'm willing to contribute a fix that would work in line with other clients. Other options I have are no joy - staying on 2.31, or maintaining a fork, or using Python stdlib for this particular request.

@SpoonMeiser
Copy link

For what it's worth, the issue I raised against boto got rejected, so it appears the python SDK developers are not planning to address this in the SDK.

Instead, the suggestion is for when using requests, to use a prepared request and escape the first slash. I have not tested this myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants