Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set .metadata suffix on URL path #2123

Merged
merged 3 commits into from
Mar 4, 2024
Merged

Set .metadata suffix on URL path #2123

merged 3 commits into from
Mar 4, 2024

Conversation

charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Mar 1, 2024

Summary

Ensures that we don't add the .metadata suffix after the fragment, if it exists.

@charliermarsh charliermarsh added bug Something isn't working compatibility Compatibility with a specification or another tool labels Mar 1, 2024
@charliermarsh
Copy link
Member Author

@hmc-cs-mdrissi -- are you interested in testing this branch?

@charliermarsh
Copy link
Member Author

@pradyunsg -- sorry to bother, it was pointed out to me that pip strips fragments when making HTTP requests. Do you know if this is encoded in any spec?

@hmc-cs-mdrissi
Copy link

hmc-cs-mdrissi commented Mar 2, 2024

Yes I am happy to. First time building rust codebase and fairly smooth experience.

It sadly failed though,

Verbose Logs
     Running `target/debug/uv --verbose pip install '--index-url=https://AUTH@index/python/virtual/' internal-lib`
 uv::requirements::from_source source=internal-lib
    0.001651s DEBUG uv_interpreter::python_environment Found a virtualenv through VIRTUAL_ENV at: /Users/pa-loaner/Snapchat/Dev/.venvs/bento_uv2
    0.002148s DEBUG uv_interpreter::interpreter Cached interpreter info for Python 3.9.16, skipping probing: /Users/pa-loaner/Snapchat/Dev/.venvs/bento_uv2/bin/python
    0.002237s DEBUG uv::commands::pip_install Using Python 3.9.16 environment at /Users/pa-loaner/Snapchat/Dev/.venvs/bento_uv2/bin/python
    0.010349s DEBUG uv_client::registry_client Using registry request timeout of 300s
 uv_client::flat_index::from_entries
 uv_resolver::resolver::solve
      0.306961s   0ms DEBUG uv_resolver::resolver Solving with target Python version 3.9.16
   uv_resolver::resolver::choose_version package=root
   uv_resolver::resolver::get_dependencies package=root, version=0a0.dev0
        0.307338s   0ms DEBUG uv_resolver::resolver Adding direct dependency: internal-lib*
   uv_resolver::resolver::choose_version package=internal-lib
     uv_resolver::resolver::package_wait package_name=internal-lib
 uv_resolver::resolver::process_request request=Versions internal-lib
   uv_client::registry_client::simple_api package=internal-lib
     uv_client::cached_client::get_cacheable
       uv_client::cached_client::read_and_parse_cache file=/Users/pa-loaner/Library/Caches/uv/simple-v3/28d1ce4468d5ce6b/internal-lib.rkyv
 uv_resolver::resolver::process_request request=Prefetch internal-lib *
 uv_client::cached_client::from_path_sync path="/Users/pa-loaner/Library/Caches/uv/simple-v3/28d1ce4468d5ce6b/internal-lib.rkyv"
          0.325625s  17ms DEBUG uv_client::cached_client Found fresh response for: https://index/python/virtual/internal-lib/
   uv_resolver::version_map::from_metadata
   uv_distribution::distribution_database::get_or_build_wheel_metadata dist=internal-lib==0.1.602402
     uv_client::registry_client::wheel_metadata built_dist=internal-lib==0.1.602402
       uv_client::cached_client::get_serde
         uv_client::cached_client::get_cacheable
           uv_client::cached_client::read_and_parse_cache file=/Users/pa-loaner/Library/Caches/uv/wheels-v0/index/28d1ce4468d5ce6b/internal-lib/internal-lib-0.1.602402-py3-none-any.msgpack
        0.618904s 311ms DEBUG uv_resolver::resolver Searching for a compatible version of internal-lib (*)
 uv_client::cached_client::from_path_sync path="/Users/pa-loaner/Library/Caches/uv/wheels-v0/index/28d1ce4468d5ce6b/internal-lib/internal-lib-0.1.602402-py3-none-any.msgpack"
        0.619005s 311ms DEBUG uv_resolver::resolver Selecting: internal-lib==0.1.602402 (internal-lib-0.1.602402-py3-none-any.whl)
   uv_resolver::resolver::get_dependencies package=internal-lib, version=0.1.602402
     uv_resolver::resolver::distributions_wait package_id=internal-lib-0.1.602402
              0.619291s   0ms DEBUG uv_client::cached_client No cache entry for: index/python/virtual/internal-lib/0.1.602402/internal-lib-0.1.602402-py3-none-any.whl
           uv_client::cached_client::fresh_request url="index/python/virtual/internal-lib/0.1.602402/internal-lib-0.1.602402-py3-none-any.whl"
error: Failed to download: internal-lib==0.1.602402
  Caused by: HTTP status client error (404 Not Found) for url (index/python/virtual/internal-lib/0.1.602402/internal-lib-0.1.602402-py3-none-any.whl)

cargo run -- --verbose pip install --index-url=$INDEX_URL internal-lib being exact command I used, while pip install --index-url=$INDEX_URL works fine. Unsure what remaining differences if any exist in request uv sends vs pip sends especially as uv is able to get index page successfully so I know authentication works for some requests.

@charliermarsh
Copy link
Member Author

Thank you! That's a bummer. So it's an identical URL to what pip is requesting, right? My only remaining guess then is that we're losing the auth somewhere, and the server is responding with a 404.

@hmc-cs-mdrissi
Copy link

Yes the url is identical to pip based on breakpointing. Auth headers is my best guess and the headers being used for some requests (index fetch vs wheel fetch) are inconsistent. I see pip has PipSession object that manages auth and the wheel fetch does store the needed authentication tokens.

@hmc-cs-mdrissi
Copy link

hmc-cs-mdrissi commented Mar 2, 2024

If we could log the headers in error message used along with url I can compare exact headers pip uses vs uv.

edit: Here's pip's headers

{'User-Agent': 'pip/24.1.dev0 {"ci":null,"cpu":"x86_64","distro":{"name":"macOS","version":"14.3"},"implementation":{"name":"CPython","version":"3.9.16"},"installer":{"name":"pip","version":"24.1.dev0"},"openssl_version":"OpenSSL 1.1.1t  7 Feb 2023","python":"3.9.16","rustc_version":"1.76.0","setuptools_version":"69.0.3","system":{"name":"Darwin","release":"23.3.0"}}', 'Accept-Encoding': 'identity', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'Basic TENBdjE6ZXlKaGJHY2lPaUpGVXpJMU5pSXN...'}

I'm assuming authorization being key piece. The authorization header is same for both pip's wheel request and index request.

edit 2: Error message is also same as with fragment. So it's possible that fragment being present was a red herring for my index server and only auth headers is the issue. May still be worth for pip compatibility excluding the fragment.

@charliermarsh
Copy link
Member Author

Are the two URLs (for index fetch vs. wheel fetch) on different domains, or do they use different schemes, or anything like that?

@hmc-cs-mdrissi
Copy link

hmc-cs-mdrissi commented Mar 2, 2024

The first url is,

https://registry.company-dev.net/python/virtual/lib-name/

The second url is,

https://registry.company-dev.net/python/virtual/lib-name/0.1.602402/lib-name-0.1.602402-py3-none-any.whl

So the url is exact same. I may take a look tomorrow at uv closer, just not confident as it'll be my first time reading rust code and figuring out how to use a rust debugger.

edit: The index url if it helps looks like,

https://LCAv1:SECURITY_TOKEN@registry.company-dev.net/python/virtual/

@charliermarsh
Copy link
Member Author

Okay, thanks. And how do you typically provide the credentials? (We have this safe_copy_url_auth method that copies the credentials from one URL to the next if they are on the same "realm", which is why I asked about the domains.)

@hmc-cs-mdrissi
Copy link

The credentials are contained within the index-url. Not sure you saw the edit I made to show the index url.

@charliermarsh
Copy link
Member Author

Okay cool, yeah, I would expect this to work. I can push a branch that adds more logging for the headers and such tomorrow if you'd like. Would love to get this working for you since if you're running into this issue, I'm sure you won't be the only one.

@hmc-cs-mdrissi
Copy link

hmc-cs-mdrissi commented Mar 2, 2024

A few other things is I am able to successfully curl the wheel fetch url with a simple command like,

curl -v -H "authorization: basic `echo -n "LCAv1:AUTH_TOKEN" | base64` https://registry.company-dev.net/python/virtual/lib-name/0.1.602402/lib-name-0.1.602402-py3-none-any.whl

So I think the only important thing is header being used and that index fetching successfully finding latest version feels like a sign that uv has the right authorization header for first fetch. The pip authorization header looks same as one in that curl command.

edit: More logging would be very helpful and will definitely try it.

@charliermarsh
Copy link
Member Author

@hmc-cs-mdrissi - What kind of logging would be most useful? Like, all headers?

@hmc-cs-mdrissi
Copy link

Yes headers for each request sent. Most important two requests being index fetch + wheel fetch.

Alternatively if you could point me to the code where these requests are sent I could try to do some print debugging

@pradyunsg
Copy link

Do you know if this is encoded in any spec?

I just saw this.

I don't think it's written in any packaging spec but I'm pretty sure the hash fragment has to be dropped when querying the server since that's how URLs work? The fragment part is usually meant to reference something within the resource, eg anchors on HTML pages, and isn't sent to the server.

From https://url.spec.whatwg.org/#concept-url-fragment:

A URL’s fragment is either null or an ASCII string that can be used for further processing on the resource the URL’s other components identify.

I don't think I've ever seen someone send an HTTP request with # fragment in the path segment, and I'm pretty sure it'll fail for most servers.

I'm actually surprised if PyPI actually responds to a malformed request like that, if that is indeed what y'all are doing.

@charliermarsh
Copy link
Member Author

Honestly very possible that reqwest is dropping the fragments already.

@pradyunsg
Copy link

Ah, yea, that makes sense! It would be a bit of a surprise if it wasn't. 😅

@charliermarsh
Copy link
Member Author

Merging a minimal version of this that just fixes one erroneous site.

@charliermarsh charliermarsh changed the title Drop hash fragment when downloading from registry Set .metadata suffix on URL path Mar 4, 2024
@charliermarsh charliermarsh enabled auto-merge (squash) March 4, 2024 20:47
@charliermarsh
Copy link
Member Author

@hmc-cs-mdrissi -- Merging but I don't expect this to fix the problem here -- let's continue in the linked issue.

@charliermarsh charliermarsh merged commit 14d968a into main Mar 4, 2024
7 checks passed
@charliermarsh charliermarsh deleted the charlie/fragment branch March 4, 2024 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working compatibility Compatibility with a specification or another tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants