-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lazy-wheel causes poetry to use all the RAM #8929
Comments
Maybe there is some kind of size limit we could introduce? This is for sure a release blocker. We need to either fix this or revert that feature. |
Is it only I just noticed that Poetry has to download all wheels during locking despite of lazy-wheel because the source does not provide hashes but I cannot observe that the memory usage increases during locking so far. |
OK, now it happened. |
I think the issue is
|
We probably just have to make sure that poetry/src/poetry/repositories/http_repository.py Lines 134 to 136 in 776fcb8
if we hit this branch earlier in the same function call: poetry/src/poetry/repositories/http_repository.py Lines 124 to 128 in 776fcb8
to break the recursion. It seems like we get 304 (not modified) at some point when trying range requests in this example. I'm wondering if we can avoid this with some HTTP header? |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
ok, I think the confusing server behaviour is actually a bug in cachecontrol
so this unconditionally loads the cached response, and sees a Then we end up using the cached response anyway, which was for the whole wheel and not for the range request at all |
psf/cachecontrol#327, though I imagine if poetry cares to do anything about this then someone will need to contribute. alternatively perhaps should find a way of avoiding cachecontrol altogether in the lazywheel parts, not sure if that's more or less painful (Also I wonder whether this is what was happening all along when you saw
perhaps it was cachecontrol confusing things the whole time?) |
I wonder if we want to use range requests to fetch metadata even if we have to download the whole wheel for calculating the hash or just use the cached response. We trade a huge local copy operation (cf #7916 which was reverted) for some small network requests (at least in this example). I tend to say we can use the cached response. |
as #7916 accidentally proved, using the cached response requires either cooperation with cachecontrol - or undesirable knowledge of how cachecontrol works. #7961 (comment) by the time you have made that work you might almost have done the second ("hard") fix proposed on the cachecontrol issue: in principle at least, cachecontrol could respond to partial content requests if it already has the entire content the "easy" proposed fix at cachecontrol does look easy. I do not expect to get to it for a day or few, if you - or any other - wants to offer them an MR then please go ahead. Obvs we have no control over how quickly they would then make a release, but I would be reasonably optimistic about that. even though it turns out this particular server is not behaving with flip-flop "I support range requests, oops no I dont", breaking the recursion per #8929 (comment) is clearly desirable. |
actually I again wonder whether any server actually behaves in the complicated ways that the lazy-wheel implementation suggests (eg accepts range requests at some urls but not others), or whether this is all an illusion created by cachecontrol confusion perhaps there is simplification to be made. |
@radoering do you think this should block the release until fixed or should we revert this commit and move that to the next release, fixing the issue in the meantime? I would go with the revert personally. |
when it works, and in the right circumstances, the lazy-wheel stuff is a massive win. eg #6409 (comment) so it would be a shame if it didn't make it into a release soon: though if you're in a hurry to get something out then there can always be two releases with only a modest gap... |
I probably did not make it clear enough. I did not mean to explicity use the cached response (as #7916 tried and failed) but implicitly: If a link does not has a hash, we know that we will download / have downloaded the wheel anyway so we can just not use range requests but the download function, which will implicitly use the cached response. Of course, it will not use it directly but do a copy operation first so in theory it could still be slower than range requests - but that's exactly what we have done before lazy-wheel. I think I will do some measurements to check if there is a significant difference.
That's something I observed while doing the first steps with curl (so no cachecontrol involved) - so yes it's real. I can even give some details: It's a devpi-server with nginx (IIUC devpi-server does not support range requests on its own) and especially a PyPI mirror. The server supports range requests for wheels that have already been downloaded before but it does not support range requests for wheels that have not been requested before. @Secrus It's not like this is the only open blocker for the release. There is still some work to do after the poetry-core release. Further, IMO this feature will be one of the highlights of the next version. (I think it's currently far more relevant than PEP 658 support for real use cases.) Thus, I don't want to revert it if I can fix it within the next days. I'm quite confident to provide a fix (even without a cachecontrol release) to make it robust against this cachecontrol bug. |
For this example (wheel with size of about 100 MB), the copy operation of the cached wheel seems to be a little bit faster than range requests. Copy is probably better for smaller wheels and range requests might be better for larger wheels, but I don't think it really matters here.
Actually, we lose that information at the moment because we only pass the URL instead of the link. However, this will change with the refactoring done in #5509 anyway. |
cachecontrol 0.14 is released with the fix, so I think the only fixes that really need making before release are
|
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
if we do not disable lazy wheel then trying to
poetry install
this (with an empty cache) causes poetry to use all the RAM until the OS kills itThe text was updated successfully, but these errors were encountered: