
The fast-deps feature is not a fast way to obtain dependencies #8670

Open · McSinyx opened this issue Jul 31, 2020 · 20 comments · May be fixed by #12208
Labels: state: needs discussion

Comments

McSinyx (Contributor) commented Jul 31, 2020

Originally posted on the Python Discourse:

Despite its name, at the time of writing, in most cases (where the wheels are small) it does not make the pip install/pip download process any faster. The only case where it might be an optimization is when pip runs into many dependency conflicts and has to perform a series of backtracks.

Moreover, unlike the normal wheel download, the lazy implementation performs multiple requests. On unstable networks like mine, this makes it a lot slower than downloading the same amount of data in one single response (citation needed; I know this is generally believed, but I'd love to read an article explaining the reason in detail). The first step I have in mind to tackle this is to refuse to use range requests when a wheel is smaller than a certain size (a few multiples of the chunk size, perhaps?). More experiments will be needed to optimize this further, but it is the first thing I can think of. I'd really appreciate any suggestions, even mere ideas of what to explore.
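A minimal sketch of that size check, with a hypothetical threshold of a few chunks (should_use_lazy_wheel and SMALL_WHEEL_THRESHOLD are not pip names; only CONTENT_CHUNK_SIZE is):

from pip._vendor.requests.models import CONTENT_CHUNK_SIZE

# Hypothetical cut-off: below a few chunks' worth of data, the extra round
# trips of range requests likely cost more than downloading the whole wheel.
SMALL_WHEEL_THRESHOLD = 4 * CONTENT_CHUNK_SIZE

def should_use_lazy_wheel(session, url):
    """Decide whether range requests are worth it for this wheel (sketch)."""
    head = session.head(url)
    head.raise_for_status()
    size = int(head.headers.get("Content-Length", 0))
    supports_ranges = head.headers.get("Accept-Ranges") == "bytes"
    return supports_ranges and size > SMALL_WHEEL_THRESHOLD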

If possible, please assign this issue to me so I can better keep track of my GSoC project.

The triage-new-issues bot added the "S: needs triage" label Jul 31, 2020
McSinyx changed the title from "The fast-deps feature may take longer to" to "The fast-deps feature is not a fast way to obtain dependencies" Jul 31, 2020
McSinyx (Contributor, Author) commented Aug 2, 2020

By applying this patch

diff --git a/src/pip/_internal/network/lazy_wheel.py b/src/pip/_internal/network/lazy_wheel.py
index c2371bf5..c9244bb5 100644
--- a/src/pip/_internal/network/lazy_wheel.py
+++ b/src/pip/_internal/network/lazy_wheel.py
@@ -60,6 +60,7 @@ class LazyZipOverHTTP(object):
 
     def __init__(self, url, session, chunk_size=CONTENT_CHUNK_SIZE):
         # type: (str, PipSession, int) -> None
+        self._count = 0
         head = session.head(url, headers=HEADERS)
         raise_for_status(head)
         assert head.status_code == 200
@@ -158,6 +159,7 @@ class LazyZipOverHTTP(object):
 
     def __exit__(self, *exc):
         # type: (*Any) -> Optional[bool]
+        print(self._count, 'requests to fetch metadata from', self._url[107:])
         return self._file.__exit__(*exc)
 
     @contextmanager
@@ -192,6 +194,7 @@ class LazyZipOverHTTP(object):
     def _stream_response(self, start, end, base_headers=HEADERS):
         # type: (int, int, Dict[str, str]) -> Response
         """Return HTTP response to a range request from start to end."""
+        self._count += 1
         headers = {'Range': 'bytes={}-{}'.format(start, end)}
         headers.update(base_headers)
         return self._session.get(self._url, headers=headers, stream=True)

I obtained the number of requests to fetch the metadata from each wheel:

$ pip install tensorflow --no-cache-dir | grep 'requests to fetch metadata from' | sort -k7
10 requests to fetch metadata from astunparse-1.6.3-py2.py3-none-any.whl
17 requests to fetch metadata from cachetools-4.1.1-py3-none-any.whl
1 requests to fetch metadata from gast-0.3.3-py2.py3-none-any.whl
4 requests to fetch metadata from google_auth-1.20.0-py2.py3-none-any.whl
1 requests to fetch metadata from google_auth_oauthlib-0.4.1-py2.py3-none-any.whl
4 requests to fetch metadata from google_pasta-0.2.0-py3-none-any.whl
13 requests to fetch metadata from grpcio-1.30.0-cp38-cp38-manylinux2010_x86_64.whl
14 requests to fetch metadata from h5py-2.10.0-cp38-cp38-manylinux1_x86_64.whl
16 requests to fetch metadata from Keras_Preprocessing-1.1.2-py2.py3-none-any.whl
4 requests to fetch metadata from Markdown-3.2.2-py3-none-any.whl
23 requests to fetch metadata from numpy-1.18.5-cp38-cp38-manylinux1_x86_64.whl
17 requests to fetch metadata from oauthlib-3.1.0-py2.py3-none-any.whl
10 requests to fetch metadata from opt_einsum-3.3.0-py3-none-any.whl
16 requests to fetch metadata from protobuf-3.12.4-cp38-cp38-manylinux1_x86_64.whl
7 requests to fetch metadata from pyasn1-0.4.8-py2.py3-none-any.whl
20 requests to fetch metadata from pyasn1_modules-0.2.8-py2.py3-none-any.whl
7 requests to fetch metadata from requests_oauthlib-1.3.0-py2.py3-none-any.whl
1 requests to fetch metadata from rsa-4.6-py3-none-any.whl
14 requests to fetch metadata from scipy-1.4.1-cp38-cp38-manylinux1_x86_64.whl
1 requests to fetch metadata from six-1.15.0-py2.py3-none-any.whl
20 requests to fetch metadata from tensorboard-2.3.0-py3-none-any.whl
17 requests to fetch metadata from tensorboard_plugin_wit-1.7.0-py3-none-any.whl
20 requests to fetch metadata from tensorflow-2.3.0-cp38-cp38-manylinux2010_x86_64.whl
14 requests to fetch metadata from tensorflow_estimator-2.3.0-py2.py3-none-any.whl
1 requests to fetch metadata from Werkzeug-1.0.1-py2.py3-none-any.whl
1 requests to fetch metadata from wheel-0.34.2-py2.py3-none-any.whl

I don't yet know what to do with this information, but it might help in deciding the multiplier (of the chunk size) under which to avoid using range requests.

Edit: attempting to download at least a full chunk each time seems to be better:

diff --git a/src/pip/_internal/network/lazy_wheel.py b/src/pip/_internal/network/lazy_wheel.py
index c2371bf5..d5967057 100644
--- a/src/pip/_internal/network/lazy_wheel.py
+++ b/src/pip/_internal/network/lazy_wheel.py
@@ -60,6 +60,7 @@ class LazyZipOverHTTP(object):
 
     def __init__(self, url, session, chunk_size=CONTENT_CHUNK_SIZE):
         # type: (str, PipSession, int) -> None
+        self._count = 0
         head = session.head(url, headers=HEADERS)
         raise_for_status(head)
         assert head.status_code == 200
@@ -109,8 +110,10 @@ class LazyZipOverHTTP(object):
         all bytes until EOF are returned.  Fewer than
         size bytes may be returned if EOF is reached.
         """
+        download_size = max(size, self._chunk_size)
         start, length = self.tell(), self._length
-        stop = start + size if 0 <= size <= length-start else length
+        stop = length if size < 0 else min(start+download_size, length)
+        start = max(0, stop-download_size)
         self._download(start, stop-1)
         return self._file.read(size)
 
@@ -158,6 +161,7 @@ class LazyZipOverHTTP(object):
 
     def __exit__(self, *exc):
         # type: (*Any) -> Optional[bool]
+        print(self._count, 'requests to fetch metadata from', self._url[107:])
         return self._file.__exit__(*exc)
 
     @contextmanager
@@ -192,6 +196,7 @@ class LazyZipOverHTTP(object):
     def _stream_response(self, start, end, base_headers=HEADERS):
         # type: (int, int, Dict[str, str]) -> Response
         """Return HTTP response to a range request from start to end."""
+        self._count += 1
         headers = {'Range': 'bytes={}-{}'.format(start, end)}
         headers.update(base_headers)
         return self._session.get(self._url, headers=headers, stream=True)

$ pip install tensorflow --no-cache-dir | grep 'requests to fetch metadata from' | sort -k7
2 requests to fetch metadata from astunparse-1.6.3-py2.py3-none-any.whl
2 requests to fetch metadata from cachetools-4.1.1-py3-none-any.whl
1 requests to fetch metadata from gast-0.3.3-py2.py3-none-any.whl
3 requests to fetch metadata from google_auth-1.20.0-py2.py3-none-any.whl
2 requests to fetch metadata from google_auth_oauthlib-0.4.1-py2.py3-none-any.whl
2 requests to fetch metadata from google_pasta-0.2.0-py3-none-any.whl
2 requests to fetch metadata from grpcio-1.30.0-cp38-cp38-manylinux2010_x86_64.whl
6 requests to fetch metadata from h5py-2.10.0-cp38-cp38-manylinux1_x86_64.whl
2 requests to fetch metadata from Keras_Preprocessing-1.1.2-py2.py3-none-any.whl
2 requests to fetch metadata from Markdown-3.2.2-py3-none-any.whl
19 requests to fetch metadata from numpy-1.18.5-cp38-cp38-manylinux1_x86_64.whl
3 requests to fetch metadata from oauthlib-3.1.0-py2.py3-none-any.whl
2 requests to fetch metadata from opt_einsum-3.3.0-py3-none-any.whl
12 requests to fetch metadata from protobuf-3.12.4-cp38-cp38-manylinux1_x86_64.whl
2 requests to fetch metadata from pyasn1-0.4.8-py2.py3-none-any.whl
4 requests to fetch metadata from pyasn1_modules-0.2.8-py2.py3-none-any.whl
2 requests to fetch metadata from requests_oauthlib-1.3.0-py2.py3-none-any.whl
2 requests to fetch metadata from rsa-4.6-py3-none-any.whl
15 requests to fetch metadata from scipy-1.4.1-cp38-cp38-manylinux1_x86_64.whl
2 requests to fetch metadata from six-1.15.0-py2.py3-none-any.whl
16 requests to fetch metadata from tensorboard-2.3.0-py3-none-any.whl
2 requests to fetch metadata from tensorboard_plugin_wit-1.7.0-py3-none-any.whl
11 requests to fetch metadata from tensorflow-2.3.0-cp38-cp38-manylinux2010_x86_64.whl
5 requests to fetch metadata from tensorflow_estimator-2.3.0-py2.py3-none-any.whl
2 requests to fetch metadata from Werkzeug-1.0.1-py2.py3-none-any.whl
2 requests to fetch metadata from wheel-0.34.2-py2.py3-none-any.whl

With an exit() right after resolution completes, the patched version takes only 20s while the current implementation takes 30s. I'm filing a PR for this.

$ git diff
diff --git a/src/pip/_internal/resolution/resolvelib/resolver.py b/src/pip/_internal/resolution/resolvelib/resolver.py
index 43ea2486..aad532df 100644
--- a/src/pip/_internal/resolution/resolvelib/resolver.py
+++ b/src/pip/_internal/resolution/resolvelib/resolver.py
@@ -125,6 +125,7 @@ class Resolver(BaseResolver):
             error = self.factory.get_installation_error(e)
             six.raise_from(error, e)
 
+        exit()
         req_set = RequirementSet(check_supported_wheels=check_supported_wheels)
         for candidate in self._result.mapping.values():
             ireq = candidate.get_install_requirement()

McSinyx (Contributor, Author) commented Aug 3, 2020

Noting that the (redesigned) JSON API and the simple API (pypi/warehouse#8254) may provide a more efficient way to fetch metadata, with more deterministic performance.
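For illustration, the JSON API endpoint below exists today; using it for resolution inside pip is the open question here:

import requests

def requires_dist(name, version):
    """Fetch a project's dependency list from PyPI's JSON API (sketch)."""
    resp = requests.get("https://pypi.org/pypi/{}/{}/json".format(name, version))
    resp.raise_for_status()
    return resp.json()["info"]["requires_dist"]

print(requires_dist("requests", "2.28.1"))
# e.g. ['charset-normalizer (<3,>=2)', 'idna (<4,>=2.5)', ...]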

ofek (Contributor) commented Aug 13, 2020

Closed by #8681?

pradyunsg (Member) commented:

Should be!

McSinyx (Contributor, Author) commented Aug 13, 2020

I'm not exactly sure about closing this: whilst GH-8681 does make fast-deps faster, I don't have any benchmark to prove (at least to myself) that it's always or mostly faster than downloading the whole wheel, especially since most pure Python packages are rather small. I'd prefer to keep this one open and close it myself when I'm more certain about the performance.

pradyunsg reopened this Aug 30, 2020

pradyunsg (Member) commented:

I'd prefer to keep this one open and close it myself when I'm more certain about the performance.

Here ya go. :)

McSinyx (Contributor, Author) commented Aug 30, 2020

Yay! I've been running some benchmarks lately, and funnily enough dependency resolution with fast-deps is not much faster than without it, making the overall process (even with parallel download) much slower 😞

pradyunsg (Member) commented Aug 30, 2020

How does it perform in cases with backtracking, though? Try pip install pyrax==1.9.8 --log log.txt (and wait for a long time). Also, consider sharing the log.txt here once that is done. :)

pradyunsg (Member) commented:

the overall process (even with parallel download) much slower 😞

How much slower?

McSinyx (Contributor, Author) commented Aug 30, 2020

Please check out the benchmark here. I couldn't get pyrax==1.9.8 to finish (it got stuck after downloading PyYAML 3.12, with or without fast-deps), though.

dholth (Member) commented Apr 20, 2022

@McSinyx are you still interested in this?

I modified lazy_wheel in pip 22.0.3 to print range requests. For a particular zip I wanted one file, but I got a suspicious number of range requests.

fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=184320-190434'}
fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=180195-184319'}
fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=74-10313'}
fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=10314-10343'}
fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=10344-10392'}

In lazy_wheel.py, read() has a max() preventing it from downloading more than CHUNK_SIZE at once. It seems like CHUNK_SIZE should be the minimum size?

https://github.com/pypa/pip/blob/main/src/pip/_internal/network/lazy_wheel.py#L95

I changed it to min():

    def read(self, size: int = -1) -> bytes:
        """Read up to size bytes from the object and return them.

        As a convenience, if size is unspecified or -1,
        all bytes until EOF are returned.  Fewer than
        size bytes may be returned if EOF is reached.
        """
        download_size = min(size, self._chunk_size) # was max()
        start, length = self.tell(), self._length
        stop = length if size < 0 else min(start + download_size, length)
        start = max(0, stop - download_size)
        self._download(start, stop - 1)
        return self._file.read(size)

This gets us down to four range requests:

fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=184320-190434'}
fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=74-103'}
fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=104-152'}
fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=153-8117'}

As a further optimization, I read all the bytes between the header offset of the file I want and the next header offset, instead of letting ZipFile make one small read for the file header and a second read for the file contents. This is even possible with a second ZipFile object on the same LazyZipOverHTTP. Note this code is broken if the file we want is the last file:

# zf is a ZipFile opened over a LazyZipOverHTTP stream; "info-" is the
# prefix of the member we want. Find that member and the one after it.
info, after = next(
    (inf, n)
    for (inf, n) in zip(zf.infolist(), zf.infolist()[1:])
    if inf.filename.startswith("info-")
)
# One read spanning the local header and the compressed data.
zf.fp.seek(info.header_offset)
zf.fp.read(after.header_offset - info.header_offset)
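A variant that also handles the last member (a sketch; it leans on ZipFile.start_dir, an undocumented CPython attribute holding the central directory offset, as the upper bound when there is no next header):

def member_span(zf, prefix):
    """Byte range [start, stop) covering the local header and data of the
    first member whose name starts with prefix (sketch, assumptions above)."""
    infos = sorted(zf.infolist(), key=lambda info: info.header_offset)
    for info, following in zip(infos, infos[1:] + [None]):
        if info.filename.startswith(prefix):
            stop = following.header_offset if following else zf.start_dir
            return info.header_offset, stop
    raise KeyError(prefix)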

This gets us down to the optimal two requests. (We should emit at most two Range requests for the footer, depending on whether the footer is smaller than the chunk size, and then one per compressed file we want.)

fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=184320-190434'}
fetch range {'Accept-Encoding': 'identity', 'Range': 'bytes=74-8117'}

dholth (Member) commented Apr 20, 2022

The feature also makes an unnecessary HEAD request, when it could get the total Content-Length from the first Range request by preemptively fetching a certain number of bytes from the end of the file.

dholth (Member) commented Apr 21, 2022

Unfortunately download_size = min(size, self._chunk_size) (was max()) breaks the download for some reason (the downloaded file differs from the original).

McSinyx (Contributor, Author) commented Apr 21, 2022

I think that checks out, because at least size bytes need to be downloaded for a read, and min() yields the chunk size whenever it is smaller than size. That does not explain why no byte within 8118-10392 or 180195-184319 is asked for after patching, though. I am rather curious which segments are read.
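To make the failure mode concrete, with hypothetical numbers (chunk size as in requests):

size, chunk_size = 20_000, 10_240      # caller asks for more than one chunk
download_size = min(size, chunk_size)  # = 10_240, less than the caller asked for
# The following self._file.read(20_000) then returns ~9_760 bytes that were
# never fetched from the server, so the reconstructed wheel differs from
# the original; max() instead guarantees download_size >= size.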

dholth (Member) commented Apr 21, 2022

I'm reusing this code to look at .conda zip files. The zip directory is at the end, and the metadata I want is a single file near the beginning of the file, so it is expected that no other data is fetched.

It looks like this code limits Range requests to no more than CHUNK_SIZE bytes? Since metadata tends to be small, it doesn't usually matter, but it would be nice if there were no upper limit.

dholth (Member) commented Apr 21, 2022

Code to eliminate the HEAD request:

headers["Range"] = f"bytes=-{CONTENT_CHUNK_SIZE}"

# if CONTENT_CHUNK_SIZE is bigger than the file:
# In [8]: response.headers["Content-Range"]
# Out[8]: 'bytes 0-3133374/3133375'

tail = session.get(url, headers=headers, stream=True)
# e.g. {'accept-ranges': 'bytes', 'content-length': '10240',
# 'content-range': 'bytes 12824-23063/23064', 'last-modified': 'Sat, 16
# Apr 2022 13:03:02 GMT', 'date': 'Thu, 21 Apr 2022 11:34:04 GMT'}

if tail.status_code != 206:
    raise HTTPRangeRequestUnsupported("range request is not supported")

self._session, self._url, self._chunk_size = session, url, chunk_size
self._length = int(tail.headers["Content-Range"].partition("/")[-1])
self._file = NamedTemporaryFile()
self.truncate(self._length)

# length is also in Content-Length and Content-Range header
with self._stay():
    self.seek(self._length - len(tail.content))
    self._file.write(tail.content)
self._left: List[int] = [self._length - len(tail.content)]
self._right: List[int] = [self._length - 1]

dholth (Member) commented Apr 21, 2022

CONTENT_CHUNK_SIZE comes from requests; it happens to be 10 KiB, which is a great guess for fetching wheel .zip directories in a single request.
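Concretely, as vendored in pip:

from pip._vendor.requests.models import CONTENT_CHUNK_SIZE

print(CONTENT_CHUNK_SIZE)  # 10 * 1024 = 10240 bytes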

pradyunsg added the "state: needs discussion" label and removed the "S: needs triage" label Jun 21, 2022
dholth (Member) commented Oct 3, 2022

Lazier wheel is here. It should be compatible but I haven't tested it yet. It avoids the HEAD request.

If you find that the last 10k of the file isn't a good enough heuristic, you can also look up the byte range of the desired METADATA file in the zip index, and guarantee 2 or 3 requests maximum.
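A sketch of that lookup, assuming lazy is a seekable file-like object over the remote wheel whose reads turn into range requests (as with LazyZipOverHTTP), and reusing the span logic sketched earlier (including the undocumented ZipFile.start_dir):

from zipfile import ZipFile

def prefetch_metadata(lazy):
    """Read exactly the bytes backing *.dist-info/METADATA (sketch)."""
    zf = ZipFile(lazy)  # parsing the index only touches the already-fetched tail
    infos = sorted(zf.infolist(), key=lambda info: info.header_offset)
    for info, following in zip(infos, infos[1:] + [None]):
        if info.filename.endswith(".dist-info/METADATA"):
            stop = following.header_offset if following else zf.start_dir
            lazy.seek(info.header_offset)
            lazy.read(stop - info.header_offset)  # a single Range request
            return zf.read(info.filename)         # now served from local cache
    raise LookupError("no METADATA member found")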

#11447 #11481

dholth (Member) commented Oct 4, 2022

I had a chance to test it. I had to handle 416 errors for wheels that were smaller than the chunk size, and I added an eager fetch of the entire .dist-info section of the wheel (but we could fetch only METADATA and WHEEL; this would help if they sit in a single convenient range).
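A minimal sketch of that 416 fallback (hypothetical shape; the actual change lives in the linked PRs):

# If the suffix Range overshoots a wheel smaller than the chunk size, some
# servers answer 416 Range Not Satisfiable; fall back to a plain GET.
headers = dict(HEADERS, Range="bytes=-{}".format(CONTENT_CHUNK_SIZE))
tail = session.get(url, headers=headers, stream=True)
if tail.status_code == 416:
    tail = session.get(url, headers=HEADERS, stream=True)  # whole (small) file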

This reduces the number of requests.

% python -m pip install --no-cache-dir tensorflow --use-feature=fast-deps --dry-run
WARNING: pip is using lazily downloaded wheels using HTTP range requests to obtain dependency information. This experimental feature is enabled through --use-feature=fast-deps and it is not ready for production.
Collecting tensorflow
  Obtaining dependency information from tensorflow 2.10.0
prefetch dist-info 239935539-240305560
3 requests to fetch metadata from tensorflow-2.10.0-cp39-cp39-macosx_10_14_x86_64.whl
Collecting google-pasta>=0.1.1
  Obtaining dependency information from google-pasta 0.2.0
prefetch dist-info 49328-55329
1 requests to fetch metadata from google_pasta-0.2.0-py3-none-any.whl
Collecting astunparse>=1.6.0
  Obtaining dependency information from astunparse 1.6.3
prefetch dist-info 7309-11951
1 requests to fetch metadata from astunparse-1.6.3-py2.py3-none-any.whl
Collecting wrapt>=1.11.0
  Obtaining dependency information from wrapt 1.14.1
prefetch dist-info 30540-34428
1 requests to fetch metadata from wrapt-1.14.1-cp39-cp39-macosx_10_9_x86_64.whl
Collecting keras-preprocessing>=1.1.1
  Obtaining dependency information from keras-preprocessing 1.1.2
prefetch dist-info 38292-41164
1 requests to fetch metadata from Keras_Preprocessing-1.1.2-py2.py3-none-any.whl
Collecting keras<2.11,>=2.10.0
  Obtaining dependency information from keras 2.10.0
prefetch dist-info 1606097-1634571
3 requests to fetch metadata from keras-2.10.0-py2.py3-none-any.whl
Collecting tensorboard<2.11,>=2.10
  Obtaining dependency information from tensorboard 2.10.1
prefetch dist-info 5827781-5847436
3 requests to fetch metadata from tensorboard-2.10.1-py3-none-any.whl
Collecting packaging
  Obtaining dependency information from packaging 21.3
prefetch dist-info 28518-39361
2 requests to fetch metadata from packaging-21.3-py3-none-any.whl
Collecting gast<=0.4.0,>=0.2.1
  Obtaining dependency information from gast 0.4.0
prefetch dist-info 6988-9134
2 requests to fetch metadata from gast-0.4.0-py3-none-any.whl
Collecting absl-py>=1.0.0
  Obtaining dependency information from absl-py 1.2.0
prefetch dist-info 114391-121307
1 requests to fetch metadata from absl_py-1.2.0-py3-none-any.whl
Collecting grpcio<2.0,>=1.24.3
  Obtaining dependency information from grpcio 1.49.1
prefetch dist-info 4524206-4538044
2 requests to fetch metadata from grpcio-1.49.1-cp39-cp39-macosx_10_10_x86_64.whl
Collecting six>=1.12.0
  Obtaining dependency information from six 1.16.0
prefetch dist-info 8485-10605
1 requests to fetch metadata from six-1.16.0-py2.py3-none-any.whl
Collecting tensorflow-io-gcs-filesystem>=0.23.1
  Obtaining dependency information from tensorflow-io-gcs-filesystem 0.27.0
prefetch dist-info 1629750-1639868
2 requests to fetch metadata from tensorflow_io_gcs_filesystem-0.27.0-cp39-cp39-macosx_10_14_x86_64.whl
Collecting typing-extensions>=3.6.6
  Obtaining dependency information from typing-extensions 4.3.0
prefetch dist-info 18021-25162
1 requests to fetch metadata from typing_extensions-4.3.0-py3-none-any.whl
Collecting termcolor>=1.1.0
  Obtaining dependency information from termcolor 2.0.1
prefetch dist-info 2196-4877
2 requests to fetch metadata from termcolor-2.0.1-py3-none-any.whl
Collecting numpy>=1.20
  Obtaining dependency information from numpy 1.23.3
prefetch dist-info 0-18055764
3 requests to fetch metadata from numpy-1.23.3-cp39-cp39-macosx_10_9_x86_64.whl
Collecting h5py>=2.9.0
  Obtaining dependency information from h5py 3.7.0
prefetch dist-info 3175427-3181569
2 requests to fetch metadata from h5py-3.7.0-cp39-cp39-macosx_10_9_x86_64.whl
Collecting tensorflow-estimator<2.11,>=2.10.0
  Obtaining dependency information from tensorflow-estimator 2.10.0
prefetch dist-info 420686-426115
3 requests to fetch metadata from tensorflow_estimator-2.10.0-py2.py3-none-any.whl
Requirement already satisfied: setuptools in /Users/dholth/opt/py3x86/lib/python3.9/site-packages (from tensorflow) (58.1.0)
Collecting protobuf<3.20,>=3.9.2
  Obtaining dependency information from protobuf 3.19.6
prefetch dist-info 973147-976250
1 requests to fetch metadata from protobuf-3.19.6-cp39-cp39-macosx_10_9_x86_64.whl
Collecting libclang>=13.0.0
  Obtaining dependency information from libclang 14.0.6
prefetch dist-info 13223909-13231552
1 requests to fetch metadata from libclang-14.0.6-py2.py3-none-macosx_10_9_x86_64.whl
Collecting opt-einsum>=2.3.2
  Obtaining dependency information from opt-einsum 3.3.0
prefetch dist-info 58104-63148
1 requests to fetch metadata from opt_einsum-3.3.0-py3-none-any.whl
Collecting flatbuffers>=2.0
  Obtaining dependency information from flatbuffers 22.9.24
prefetch dist-info 24140-25598
1 requests to fetch metadata from flatbuffers-22.9.24-py2.py3-none-any.whl
Collecting wheel<1.0,>=0.23.0
  Obtaining dependency information from wheel 0.37.1
prefetch dist-info 30516-33723
1 requests to fetch metadata from wheel-0.37.1-py2.py3-none-any.whl
Collecting markdown>=2.6.8
  Obtaining dependency information from markdown 3.4.1
prefetch dist-info 85305-90374
1 requests to fetch metadata from Markdown-3.4.1-py3-none-any.whl
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Obtaining dependency information from google-auth-oauthlib 0.4.6
prefetch dist-info 11092-17255
1 requests to fetch metadata from google_auth_oauthlib-0.4.6-py2.py3-none-any.whl
Collecting tensorboard-plugin-wit>=1.6.0
  Obtaining dependency information from tensorboard-plugin-wit 1.8.1
prefetch dist-info 773914-776703
1 requests to fetch metadata from tensorboard_plugin_wit-1.8.1-py3-none-any.whl
Collecting google-auth<3,>=1.6.3
  Obtaining dependency information from google-auth 2.12.0
prefetch dist-info 156009-164848
2 requests to fetch metadata from google_auth-2.12.0-py2.py3-none-any.whl
Collecting requests<3,>=2.21.0
  Obtaining dependency information from requests 2.28.1
prefetch dist-info 54220-61243
1 requests to fetch metadata from requests-2.28.1-py3-none-any.whl
Collecting werkzeug>=1.0.1
  Obtaining dependency information from werkzeug 2.2.2
prefetch dist-info 223166-228709
1 requests to fetch metadata from Werkzeug-2.2.2-py3-none-any.whl
Collecting tensorboard-data-server<0.7.0,>=0.6.0
  Obtaining dependency information from tensorboard-data-server 0.6.1
prefetch dist-info 3544582-3545791
1 requests to fetch metadata from tensorboard_data_server-0.6.1-py3-none-macosx_10_9_x86_64.whl
Collecting pyparsing!=3.0.5,>=2.0.2
  Obtaining dependency information from pyparsing 3.0.9
prefetch dist-info 93808-97206
1 requests to fetch metadata from pyparsing-3.0.9-py3-none-any.whl
Collecting cachetools<6.0,>=2.0.0
  Obtaining dependency information from cachetools 5.2.0
prefetch dist-info 5427-8670
2 requests to fetch metadata from cachetools-5.2.0-py3-none-any.whl
Collecting pyasn1-modules>=0.2.1
  Obtaining dependency information from pyasn1-modules 0.2.8
prefetch dist-info 140434-147136
2 requests to fetch metadata from pyasn1_modules-0.2.8-py2.py3-none-any.whl
Collecting rsa<5,>=3.1.4
  Obtaining dependency information from rsa 4.9
prefetch dist-info 29553-33054
1 requests to fetch metadata from rsa-4.9-py3-none-any.whl
Collecting requests-oauthlib>=0.7.0
  Obtaining dependency information from requests-oauthlib 1.3.1
prefetch dist-info 16437-22125
1 requests to fetch metadata from requests_oauthlib-1.3.1-py2.py3-none-any.whl
Collecting importlib-metadata>=4.4
  Obtaining dependency information from importlib-metadata 5.0.0
prefetch dist-info 13730-20583
1 requests to fetch metadata from importlib_metadata-5.0.0-py3-none-any.whl
Collecting urllib3<1.27,>=1.21.1
  Obtaining dependency information from urllib3 1.26.12
prefetch dist-info 117715-137216
2 requests to fetch metadata from urllib3-1.26.12-py2.py3-none-any.whl
Collecting charset-normalizer<3,>=2
  Obtaining dependency information from charset-normalizer 2.1.1
prefetch dist-info 31240-38208
1 requests to fetch metadata from charset_normalizer-2.1.1-py3-none-any.whl
Collecting idna<4,>=2.5
  Obtaining dependency information from idna 3.4
prefetch dist-info 55146-60675
1 requests to fetch metadata from idna-3.4-py3-none-any.whl
Collecting certifi>=2017.4.17
  Obtaining dependency information from certifi 2022.9.24
prefetch dist-info 157623-160372
1 requests to fetch metadata from certifi-2022.9.24-py3-none-any.whl
Collecting MarkupSafe>=2.1.1
  Obtaining dependency information from markupsafe 2.1.1
prefetch dist-info 9668-12762
1 requests to fetch metadata from MarkupSafe-2.1.1-cp39-cp39-macosx_10_9_x86_64.whl
Collecting zipp>=0.5
  Obtaining dependency information from zipp 3.8.1
prefetch dist-info 2551-5196
2 requests to fetch metadata from zipp-3.8.1-py3-none-any.whl
Collecting pyasn1<0.5.0,>=0.4.6
  Obtaining dependency information from pyasn1 0.4.8
prefetch dist-info 70813-74145
1 requests to fetch metadata from pyasn1-0.4.8-py2.py3-none-any.whl
Collecting oauthlib>=3.0.0
  Obtaining dependency information from oauthlib 3.2.1
prefetch dist-info 137993-145152
2 requests to fetch metadata from oauthlib-3.2.1-py3-none-any.whl

dholth (Member) commented Oct 4, 2022

I added the exit() call right before req_set = RequirementSet(check_supported_wheels=check_supported_wheels), to time dependency resolution only.

With minimal requests (prefetch): python -m pip install --no-cache-dir tensorflow --use-feature=fast-deps 3.91s user 0.34s system 50% cpu 8.480 total
With more requests (no prefetch): python -m pip install --no-cache-dir tensorflow --use-feature=fast-deps 4.61s user 0.33s system 41% cpu 12.007 total

% ping files.pythonhosted.org
PING dualstack.r.ssl.global.fastly.net (151.101.1.63): 56 data bytes
64 bytes from 151.101.1.63: icmp_seq=0 ttl=59 time=28.996 ms
64 bytes from 151.101.1.63: icmp_seq=1 ttl=59 time=21.039 ms
64 bytes from 151.101.1.63: icmp_seq=2 ttl=59 time=21.088 ms
