
speed-up by early pruning based on markers #4952

Open

maciejskorski opened this issue Dec 29, 2021 · 5 comments
Labels
kind/feature Feature requests/implementations status/triage This issue needs to be triaged

Comments

maciejskorski commented Dec 29, 2021

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Feature Request

As of now, poetry spends a lot of time on overrides that don't match the environment's Python version or platform requirements.
They are filtered only at the very end of the process, after the possible scenarios have been written to the lock file. It would be great to have early pruning of such overrides, particularly as more and more packages adopt conditional logic in their requirements.
The issue fits the broader call for better support of markers, such as #4670

Consider this example, run within a virtual environment with Python 3.8:

[tool.poetry.dependencies]
python = "~3.8"
pandas = [
    { version = "1.3.*", markers = "python_version == '3.9.*'" },
    { version = "1.2.*", markers = "python_version == '3.7.*'" },
    { version = "1.1.*", markers = "python_version == '3.10.*'" },
]
....

Poetry correctly concludes that the set of matching packages is empty; however, it splits the graph into 6 branches (due to the explicit version split, plus one more implicit split in pandas' numpy dependency), and this escalates the more markers we have. It would help to compare the markers early and ignore some of the branches.
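The claim that all three pandas branches are dead on Python 3.8 can be checked directly with PEP 508 marker evaluation. A minimal sketch using the `packaging` library (Poetry has its own marker implementation internally; the `env` dict mimics the Python 3.8 virtual environment):

```python
from packaging.markers import Marker

# mimic the Python 3.8 virtual environment from the example
env = {"python_version": "3.8"}

overrides = [
    "python_version == '3.9.*'",
    "python_version == '3.7.*'",
    "python_version == '3.10.*'",
]
for spec in overrides:
    # each marker evaluates to False, so every pandas branch is dead on 3.8
    print(spec, "->", Marker(spec).evaluate(env))  # all print False
```

Evaluating the markers once against the target environment is cheap compared to exploring six solver branches.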

NOTE: this is a broader call for being able to support restricted target environments and to narrow the lock to prescribed architectures, python versions, and so on.

@maciejskorski maciejskorski added kind/feature Feature requests/implementations status/triage This issue needs to be triaged labels Dec 29, 2021

maciejskorski commented Dec 30, 2021

Here are my further thoughts once I got familiar with the code:

Install configs written to the lock file are platform agnostic, i.e. they cover all possible scenarios. This exponentially impacts the running time in some cases (combine pandas, black, and azureml in their modern releases).
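A back-of-the-envelope count (illustrative only, not Poetry's exact branching model): if each of k dependencies contributes m mutually exclusive marker branches, a target-agnostic lock may explore up to m**k override combinations:

```python
# m marker branches per package, k such packages in the tree
m, k = 3, 5
print(m ** k)  # 243 override combinations for just five packages
```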

Adding pre-filtering to match the target markers seems easy, but it's going to change how the lock file looks, so there is much more work on the test side, as the tests check not only the picked-up instructions but also the file content (hash included).
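The pre-filtering idea can be sketched as a standalone helper. This is hypothetical: `prune_overrides` and the dict shape are illustrative, not Poetry's internal API; the marker check uses the `packaging` library:

```python
from packaging.markers import Marker

def prune_overrides(overrides, target_env):
    """Drop overrides whose marker cannot be satisfied in target_env.

    Hypothetical helper: Poetry's real solver represents overrides
    differently; this only illustrates the early-pruning step.
    """
    kept = []
    for override in overrides:
        marker = override.get("markers")
        if marker is None or Marker(marker).evaluate(target_env):
            kept.append(override)
    return kept

# the pandas overrides from the example, pruned for a Python 3.8 target
overrides = [
    {"version": "1.3.*", "markers": "python_version == '3.9.*'"},
    {"version": "1.2.*", "markers": "python_version == '3.7.*'"},
    {"version": "1.1.*", "markers": "python_version == '3.10.*'"},
]
print(prune_overrides(overrides, {"python_version": "3.8"}))  # []
```

Running this before branching would let the solver skip all six branches in the example above.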

@radoering
Member

The example is only a special case of the issue, isn't it? Actually, in the example all of the pandas constraints should be ignored because they don't match the project's python constraint. (I would almost consider it a bug that this is currently not the case.) If the python constraint were "^3.7" instead of "~3.8", it would be correct to consider all pandas overrides when locking (no matter in which environment).

I think that in the example you can even force the desired behaviour by using python instead of markers:

[tool.poetry.dependencies]
python = "~3.8"
pandas = [
    {version="1.3.*",python="3.9.*"},
    {version="1.2.*",python="3.7.*"},
    {version="1.1.*",python="3.10.*"},
]

When using this syntax, the pandas constraints are ignored...
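The point that these overrides should already be prunable from the project's own python constraint can be illustrated with `packaging` specifier sets (a sketch; Poetry's `~3.8` is translated by hand to its PEP 440 equivalent):

```python
from packaging.specifiers import SpecifierSet

project = SpecifierSet(">=3.8,<3.9")   # Poetry's python = "~3.8" in PEP 440 terms
override = SpecifierSet("==3.9.*")     # the python = "3.9.*" override
combined = project & override

# no CPython 3.x release satisfies both, so the override is dead at lock time
print(any(f"3.{minor}" in combined for minor in range(30)))  # False
```

An empty intersection means the override can never apply for this project, regardless of the environment doing the locking.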

In my opinion, creating one lock file for all platforms and python versions on an arbitrary platform with one of the supported python versions (together with multiple constraint dependencies in the pyproject.toml) is one of the most important features of poetry. Further, once the lock file has been created, installation does not require locking again.

@maciejskorski changed the title from "early prune based on markers" to "speed-up by early pruning based on markers" Dec 31, 2021

maciejskorski commented Jan 1, 2022

The example is only a special case of the issue, isn't it?

Yes, of course. We want to pre-filter the more important markers, such as OS or architecture. They often appear in the modern ML stack (recent releases of pandas, azure, black...). Sorry if that was not clear; I can bring up a more devastating example :-)

I think, in the example you can even force the desired behaviour by python instead of markers:

Not really: the behaviour of python constraints is known to have bugs when markers appear, and I really want to support markers.
Like I said, markers appear in the data-science and ML stack, and supporting them is important.

In my opinion, creating one lock file for all platforms and python versions on an arbitrary platform with one of the supported python versions (together with multiple constraint dependencies in the pyproject.toml) is one of the most important features of poetry. Further, once the lock file has been created, installation does not require locking again.

That's the case for mature packages. I want to support coders who develop internal dependencies and debug intensively within a team, switching and updating dependencies often. Then updating or recreating the lock happens often and is really time-consuming.

Yes, it is great that the lock is system agnostic :-)
However, this makes poetry slow due to the exponentially many possibilities when building bigger packages.
Use cases lie in between: say we develop for restricted platforms, restricted architectures, restricted python versions, or another subset of markers. This is a call for constraining the target environment for valid reasons; it could be done in the toml file as in PR #4956. As an option, of course. Let others enjoy poetry being environment agnostic :-)
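A hypothetical shape such a restriction could take in pyproject.toml (table and key names are purely illustrative, not the actual syntax proposed in PR #4956):

```toml
# hypothetical, illustrative syntax only
[tool.poetry.target-environments]
python = ["3.8", "3.9"]
platforms = ["linux"]
```

With a section like this present, the solver could evaluate markers against the declared targets and prune non-matching overrides before branching; without it, locking would stay fully environment agnostic as today.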

hungbie commented Jun 2, 2022

+1 for this. We have a big project in our team, and each time poetry solves the dependencies it already takes more than 10 minutes (per try). However, we know beforehand the versions we want as well as the platform, so retrying to find a solution that is valid everywhere is redundant and a big waste of resources. Currently in our project we have to retry around 20 times, making the whole process super slow; sometimes it can take up to 4 hours.

jfyu commented Aug 16, 2022

+1 for this as well. pandas and numpy are ubiquitous in data science, and this issue is slowing down our microservice project builds.
