Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial metadata gathering from legacy sources based on markers/constraints #6634

Closed
4 tasks done
Daniel451 opened this issue Sep 26, 2022 · 6 comments
Closed
4 tasks done
Labels
area/sources Releated to package sources/indexes/repositories kind/feature Feature requests/implementations

Comments

@Daniel451
Copy link

Daniel451 commented Sep 26, 2022

  • Poetry version: 1.2.0
  • Python version: 3.10
  • OS version and name: Manjaro (linux)
  • I am on the latest stable Poetry version, installed using a recommended method.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have consulted the FAQ and blog for any relevant entries or release notes.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option) and have included the output below.

Issue

My dependency configuration looks like this:

[tool.poetry.dependencies]                                     
python = "~3.10"                                               
numpy = { version = "^1.23.2", source = "pypi" }               
torch = {version = "1.12.1", source = "torch", python="~3.10", markers="python_version~='3.10' and sys_platform=='linux'"}
torchvision = { version = "*", source = "torch", python="~3.10"}                                                                                                                        
                                                               
[[tool.poetry.source]]                                         
name = "torch"                                                 
url = "https://download.pytorch.org/whl/cu116"                 
default = false                                                
secondary = true

...yet poetry lock attempts to download older versions and other platforms, too. Like:

Resolving dependencies... Downloading https://download.pytorch.org/whl/cu116/torch-1.12.1%2Bcu116-cp38-cp38-win_amd64.whl
Resolving dependencies... Downloading https://download.pytorch.org/whl/cu116/torch-1.12.1%2Bcu116-cp39-cp39-linux_x86_64.whl

I tried this with and without marker definition, with and without python= definition within the package definition for torch and torchvision but the result was always the same.

This seems to be related to #4958 for me but in this thread the PR was merged and it should work as expected, as far as I understand.

@Daniel451 Daniel451 added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Sep 26, 2022
@neersighted neersighted added kind/feature Feature requests/implementations area/sources Releated to package sources/indexes/repositories and removed kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Sep 26, 2022
@neersighted neersighted changed the title Poetry ignores Python version definition when loading PyTorch Partial metadata gathering from legacy sources based on markers/constraints Sep 26, 2022
@neersighted
Copy link
Member

neersighted commented Sep 26, 2022

Poetry will download every distfile from a legacy package source in order to gather metadata -- even if that package will not necessarily be installed due to markers. This is related to the nature of the solver and of the PEP 508 API -- essentially, Poetry will not currently exclude metadata (like hashes) based on markers as packages are gathered before markers are even considered, and releases in the poetry.lock file are meant to be complete and represent the release as stored in the index.

I'm going to ping @dimbleby and @radoering to get their opinion on whether pushing markers to the repository stage is viable -- however, I don't think it is trivial.

Looking closely at your constraints, you are trying to do what #4956 proposes to implement. If/when that is finished, it will be a better path for you in the long term. It's also worth nothing that PEP 658 and PEP 691, when implemented together by third-party indexes, will solve this by removing the need to download files to gather metadata/hashes, letting us be as performant with third-party sources as PyPI.

@Daniel451
Copy link
Author

@neersighted thank you very much for the explanation, that clears it up why it is failing. Also thanks for highlighting #4956. I've subscribed to this one and gave thumbs up :)

From PEP 658

Package management workflows made popular by recent tooling increase the need to inspect distribution metadata without intending to install the distribution, and download multiple distributions of a project to choose from based on their metadata. This means they end up discarding much downloaded data, which is inefficient and results in a bad user experience.

If I understand this correctly, the server-side would also need to adhere to this, right? So if PyTorch does not expose the metadata by adding the anchor tag according to PEP 658 Poetry (and any other tool) would still have to download the whole thing just for extracting the metadata afterwards? Sounds to me like PyTorch is the next address to create GitHub issues.

@neersighted
Copy link
Member

That is correct, though it is worth noting that Poetry has not grown PEP 658 support yet as there are no real-world implementors. PyPI gaining support would likely be the impetus for Poetry to support this as we would have a widely used and battle-tested version to validate Poetry against.

That being said, nothing stops you from creating fixtures based on the current PyPI fixtures (as a sort of speculation based on the spec as to what the 'real' implementation will look like) and implementing/testing PEP 658 in Poetry 'ahead of time' 😄

Also yes, Poetry is directly being referred to in that paragraph, though I think PDM has similar design-driven requirements for metadata.

@dimbleby
Copy link
Contributor

I'd think it's a plausible optimization opportunity to ignore distributions that can never match the project-wide constraints. (Currently the only project-wide constraint is the python version, but per that MR it's possible that there will be more in future.)

Trying to be more precise than that feels likely to be more complicated than it's worth - most repositories serve up hash information directly, so this is quite the unusual case.

The particular use case here where the user cares for only one python version and one platform is likely better served today by a direct URL package rather than by using a secondary source.

@neersighted
Copy link
Member

Yeah, when I ask for input, I'm referring to this specific case (the linked PR already proves the viability of a project-wide reduced search space). I'm going to close this for now, as you are right, a URL dependency solves this problem today acceptably.

@neersighted neersighted closed this as not planned Won't fix, can't repro, duplicate, stale Sep 27, 2022
Copy link

github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/sources Releated to package sources/indexes/repositories kind/feature Feature requests/implementations
Projects
None yet
Development

No branches or pull requests

3 participants