-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lazier lazy_wheel #11481
base: main
Are you sure you want to change the base?
lazier lazy_wheel #11481
Conversation
# The 416 response message contains a Content-Range indicating an | ||
# unsatisfied range (that is a '*') followed by a '/' and the current | ||
# length of the resource. E.g. Content-Range: bytes */12777 | ||
content_length = int(e.response.headers["content-range"].rsplit("/", 1)[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want some error handling here (and other places where we read the header) in case the server sends something invalid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors here probably devolve to "range request not supported" and should trigger a non-lazy retry.
# lowercase content-range to support s3 | ||
self._length = int(tail.headers["content-range"].partition("/")[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought headers
is a special case-insensitive mapping anyway, does lower-casing actually matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used a requests -> boto3 adapter that is case sensitive, so for conda we can point this at s3 if wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is an S3 issue? I’d very much prefer this weird context to not get into pip. It’s OK we use lower-cased keys here (because it does not really matter for us), but the comment confuses things.
def __exit__(self, *exc: Any) -> None: | ||
self._file.__exit__(*exc) | ||
def __exit__(self, *exc: Any) -> bool | None: | ||
print(self._request_count, "requests to fetch metadata from", self._url[107:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray debugging code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. A test makes sure _request_count is <= the maximum of 3. It might be possible to construct a pathological wheel where the .dist-info files were all over, that would rather have > 3 requests...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which test are you referring to here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use log.debug()
like we do elsewhere in this change?
headers = base_headers.copy() | ||
headers["Range"] = f"bytes={start}-{end}" | ||
log.debug("%s", headers["Range"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should make the log message more useful.
infolist = zf.infolist() | ||
for info in infolist: | ||
# should be (wheel filename without extension etc) + (.dist-info/) | ||
if ".dist-info/" in info.filename: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there’s a better detection logic somewhere (probably in wheel installation code) for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pypa/wheel wheelfile.py it derives the .dist-info dirname from the parsed wheel filename. https://github.com/pypa/wheel/blob/main/src/wheel/wheelfile.py#L49, there probably is similar code in pip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is similar code existing in the code base, we should extract the common logic.
Anybody try this? |
Not in pip, but I do keep meaning to extract the code and try it in my metadata downlaoder utility. Too many jobs, too little time, though 😉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!!
@dholth after fixing a typo and trying this at https://github.com/cosmicexplorer/pip/tree/lazier-wheel, I'm finding this error when trying this against pypi: > python3.8 -m pip -vvv install --report test.json --dry-run --ignore-installed --use-feature=fast-deps 'numpy>=1.19.5' 'keras==2.4.3' 'mtcnn' 'pillow>=7.0.0' 'bleach>=2.1.0' 'tensorflow-gpu==2.5.3'
...
Collecting keras==2.4.3
Obtaining dependency information from keras 2.4.3
bytes=-10240
Looking up "https://files.pythonhosted.org/packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl" in the cache
Request header has "no-cache", cache bypassed
Starting new HTTPS connection (1): files.pythonhosted.org:443
https://files.pythonhosted.org:443 "GET /packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl HTTP/1.1" 501 462
ERROR: Could not install packages due to an OSError.
Traceback (most recent call last):
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/commands/install.py", line 377, in run
requirement_set = resolver.resolve(
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
result = self._result = resolver.resolve(
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
self._add_to_criteria(self.state.criteria, r, parent=None)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
if not criterion.candidates:
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
return bool(self._sequence)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
return any(self)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
return (c for c in iterator if id(c) not in self._incompatible_ids)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
candidate = func()
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
self._link_candidate_cache[link] = LinkCandidate(
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
super().__init__(
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
self.dist = self._prepare()
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
dist = self._prepare_distribution()
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/operations/prepare.py", line 532, in prepare_linked_requirement
metadata_dist = self._fetch_metadata_only(req)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/operations/prepare.py", line 385, in _fetch_metadata_only
) or self._fetch_metadata_using_lazy_wheel(req.link)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/operations/prepare.py", line 452, in _fetch_metadata_using_lazy_wheel
return dist_from_wheel_url(name, url, self._session)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/network/lazy_wheel.py", line 36, in dist_from_wheel_url
with LazyZipOverHTTP(url, session) as zf:
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/network/lazy_wheel.py", line 67, in __init__
tail = self._stream_response(start="", end=CONTENT_CHUNK_SIZE)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/network/lazy_wheel.py", line 210, in _stream_response
response.raise_for_status()
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
pip._vendor.requests.exceptions.HTTPError: 501 Server Error: Unsupported client range for url: https://files.pythonhosted.org/packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl Apparently > curl -L -H 'Range: bytes=-10240' 'https://files.pythonhosted.org/packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl'
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>501 Unsupported client range</title>
</head>
<body>
<h1>Error 501 Unsupported client range</h1>
<p>Unsupported client range</p>
<h3>Error 54113</h3>
<p>Details: cache-ewr18138-EWR 1691348077 3195631842</p>
<hr>
<p>Varnish cache server</p>
</body>
</html> Is |
Negative end-of-file ranges are standard http. Real shame if proxy rejects
them.
…On Sun, Aug 6, 2023, 3:02 PM Danny McClanahan ***@***.***> wrote:
@dholth <https://github.com/dholth> after fixing a typo and trying this
at https://github.com/cosmicexplorer/pip/tree/lazier-wheel, I'm finding
this error when trying this against pypi:
> python3.8 -m pip -vvv install --report test.json --dry-run --ignore-installed --use-feature=fast-deps 'numpy>=1.19.5' 'keras==2.4.3' 'mtcnn' 'pillow>=7.0.0' 'bleach>=2.1.0' 'tensorflow-gpu==2.5.3'
...
Collecting keras==2.4.3
Obtaining dependency information from keras 2.4.3
bytes=-10240
Looking up "https://files.pythonhosted.org/packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl" in the cache
Request header has "no-cache", cache bypassed
Starting new HTTPS connection (1): files.pythonhosted.org:443
https://files.pythonhosted.org:443 "GET /packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl HTTP/1.1" 501 462
ERROR: Could not install packages due to an OSError.
Traceback (most recent call last):
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/commands/install.py", line 377, in run
requirement_set = resolver.resolve(
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
result = self._result = resolver.resolve(
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
self._add_to_criteria(self.state.criteria, r, parent=None)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
if not criterion.candidates:
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
return bool(self._sequence)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
return any(self)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
return (c for c in iterator if id(c) not in self._incompatible_ids)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
candidate = func()
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
self._link_candidate_cache[link] = LinkCandidate(
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
super().__init__(
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
self.dist = self._prepare()
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
dist = self._prepare_distribution()
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/operations/prepare.py", line 532, in prepare_linked_requirement
metadata_dist = self._fetch_metadata_only(req)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/operations/prepare.py", line 385, in _fetch_metadata_only
) or self._fetch_metadata_using_lazy_wheel(req.link)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/operations/prepare.py", line 452, in _fetch_metadata_using_lazy_wheel
return dist_from_wheel_url(name, url, self._session)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/network/lazy_wheel.py", line 36, in dist_from_wheel_url
with LazyZipOverHTTP(url, session) as zf:
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/network/lazy_wheel.py", line 67, in __init__
tail = self._stream_response(start="", end=CONTENT_CHUNK_SIZE)
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_internal/network/lazy_wheel.py", line 210, in _stream_response
response.raise_for_status()
File "/home/cosmicexplorer/tools/pip/dist/pip-23.3.dev0-py3-none-any.whl/pip/_vendor/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
pip._vendor.requests.exceptions.HTTPError: 501 Server Error: Unsupported client range for url: https://files.pythonhosted.org/packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl
Apparently files.pythonhosted.org does not support Range: -10240 without
knowing the file's complete size:
> curl -L -H 'Range: bytes=-10240' 'https://files.pythonhosted.org/packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl'
<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html>
<head>
<title>501 Unsupported client range</title>
</head>
<body>
<h1>Error 501 Unsupported client range</h1>
<p>Unsupported client range</p>
<h3>Error 54113</h3>
<p>Details: cache-ewr18138-EWR 1691348077 3195631842</p>
<hr>
<p>Varnish cache server</p>
</body></html>
Is Range: bytes=-N also a feature specific to S3, or is that an HTTP
standard that Varnish should be respecting?
—
Reply to this email directly, view it on GitHub
<#11481 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABSZEUJNSLXBJG677TTFHDXT7S4LANCNFSM6AAAAAAQ3TJA4E>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@dholth It looks like |
Ah, thanks so much for clarifying this; that is indeed a shame, although for pypi itself we now have PEP 658 metadata being generated at |
Pypi used to support http range from end of file. IIUC the two requests minimum "with head request" defeats the speedup for high latency but my own Internet is low latency. Maybe after the redirect, fastly can accept negative range? Full range request can even support multiple ranges in one request! This is too fancy for me. S3 supports a maximum of one range per request. |
Thanks so much for that context! The merge conflicts on this PR are extremely trivial, so I'm going to try adding a fallback to |
This version of lazy_wheel is returning to pip after a detour in https://pypi.org/project/conda-package-streaming/
It avoids the
HEAD
request. Inconda-package-streaming
we also explicitly prefetch the file we are interested in (by finding its bytes range from ZipInfo) and guarantee 2 or a maximum of 3 requests.Reducing the # of requests makes a big difference when latency is high.