Introduce new API in resolvelib to Optimize Pip's Dependency Resolution #12497
Two points here.
Pip is the main driver of resolvelib, and this tangentially related issue (different problem, possibly the same solution) has been waiting on feedback from Pip maintainers: sarugaku/resolvelib#134. So I don't see how this would be any different.
I would gladly take another suggestion on a name; it's simply a reference to the existing variable Happy to fill out the docstring to explain that in more detail.
I added a comment on that issue.
I have to say, I’m struggling to understand why this is so hard to get across. The whole point of having a separate library for the resolver is so that pip only needs to interact with a clear, documented API. I’m not being obtuse here; I genuinely do not know what it means for a name to be “unsatisfied”.
So, what? A provider implementation’s The most critical thing I want to understand is what the risk is, if a future pip maintainer makes a change that impacts this function (either directly or indirectly) and gets it wrong. And following on from that, what information is available to minimise the risk that they get it wrong in the first place.
Appreciate it!
I'm struggling to understand what you don't understand, sorry. This is the same term used in the new resolver since it was made the default in late 2020, and it means the same thing. I picked that name because it was already a known term in the resolver codebase. As I said, I'm more than happy if someone thinks there is a more descriptive name. I am not a fan of it, but I thought anyone who had to deal directly with resolution steps would at least understand it.
It doesn't need to be a copy; the provider can just return the list straight back and get the same behavior as today. I don't know what you mean by "How?": in any way the provider wants to filter, or not filter, that list.
That the next step of the backtracking phase will only backtrack on those packages the client returned in the list.
Then the next step of the backtracking phase will only focus on that first name. In real world performance for Pip it would affect the performance like so:
Yup, if that's how the client wants to focus the resolve that is completely valid. Honestly there may well be real world use cases for this approach, something about the client wanting to force resolution in the exact order they provided it without resolvelib checking get_preference on each name,
As long as the client returns a non-empty subset of the name list (up to and including the list itself), and as long as there are no bugs in resolvelib, and ignoring arbitrary limits like
What's new is that if they fail to return a non-empty subset of the name list, resolvelib will throw an exception. Otherwise the pitfalls are exactly the same as
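The contract described in this comment (return a non-empty subset of the names, otherwise an exception is raised) can be sketched roughly as follows. This is only an illustration, not resolvelib's code: `pick_names_to_backtrack` and `InvalidFilterResult` are hypothetical names, and the filter is modelled as a plain callable rather than a provider method.

```python
from typing import Callable, Hashable, Sequence


class InvalidFilterResult(Exception):
    """Hypothetical error raised when a filter breaks the subset contract."""


def pick_names_to_backtrack(
    unsatisfied_names: Sequence[Hashable],
    filter_unsatisfied_names: Callable[[Sequence[Hashable]], Sequence[Hashable]],
) -> list:
    """Apply the provider's filter and enforce the non-empty subset rule."""
    narrowed = list(filter_unsatisfied_names(unsatisfied_names))
    if not narrowed:
        raise InvalidFilterResult("filter returned no names")
    allowed = set(unsatisfied_names)
    unknown = [name for name in narrowed if name not in allowed]
    if unknown:
        raise InvalidFilterResult(f"filter returned unknown names: {unknown}")
    return narrowed


names = ["foo", "bar", "baz"]

# Returning the list unchanged keeps the current behaviour.
same = pick_names_to_backtrack(names, lambda ns: ns)

# Returning only the first name makes the next step focus on that name.
first_only = pick_names_to_backtrack(names, lambda ns: ns[:1])
```

Failing fast on an empty or out-of-range result is the design choice discussed above: bad data from the client surfaces as an exception instead of silently changing the resolution.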
That's partly my fault - I was focusing on the term "unsatisfied" when my confusion is actually more with the idea of "filtering". The rest of my comment was hopefully a little clearer. But having said that, my point here is that while it may be a known term in the resolver codebase (by which I presume you mean resolvelib) it's not well-known in the pip codebase (where, in fact, "satisfied" is typically used in the sense that a requirement is satisfied by an already-installed package). I'm trying to make sure that we don't end up in a situation where understanding the internal workings (and terminology) of resolvelib is a prerequisite for working on pip's resolution code. Maybe the issue is that because you are so familiar with the details of the resolution algorithm, you're not seeing the distinction I'm trying to make?
Let me be explicit then. If the list of unsatisfied names is ["foo", "bar", "baz"], how would pip know whether it's a good idea to remove "baz" from that list? Again, this is a genuine question - I really have no idea how I'd write a useful
Thank you. That information is important and should very definitely be made clear in the resolvelib documentation. It means that But if it's not allowed to return an empty subset, that should be explicitly noted in the documentation, as well. And I hope it will trigger an immediate exception in resolvelib, rather than just doing the wrong thing because the data is bad.
Almost certainly this is contributing to the less-than-ideal communication, but I think this discussion has been very fruitful. It's given me specific updates I can go back and apply to the resolvelib PR, so hopefully it will be easier for Pip to maintain its interaction with this API in the future.
I think the problem I have answering this question is that the genuine answer is: using a deep understanding of Python packaging, its ecosystem, the algorithm type resolvelib employs, and SAT algorithms. The API I propose in the PR is almost identical to So as examples:
Agreed, the name isn't intuitively helpful; any suggestions would be welcome:
Thanks. I guess my reservation is on the pip side then, as while I see that this can help a lot with resolution issues, it’s going to be critical that pip’s definition of this method is maintainable (without needing the sort of deep knowledge you possess - the pip code base is already far too inaccessible to new contributors). Let’s wait until there is a pip PR before discussing that in detail. As to the name,
The PR exists: #12499. I will update it and the resolvelib PR some time over the next week with the feedback from here. I raised this PR before the resolvelib one is merged, released, and vendored exactly to iron out any issues with Pip (as the main customer of resolvelib) before I push for the resolvelib one to be merged. That PR includes 3 commits: 1) Add the new API with no behavior change, 2) Move some existing optimizations over to the new API to get speedups, 3) Add new optimizations (as outlined by #12498) to get a massive speedup in complex backtracking cases.
Ah, OK. I'm afraid I don't really follow the logic in the new provider method in that PR. I'll try to take a proper look another day but my first reactions are:
Beyond that, I'll wait till I have some more time to do a proper review, rather than just a "quickly dash something off while I'm making food" disjointed ramble 🙂
Yeah, based on the feedback here I'm going to take quite a bit of time updating the docstrings
This API is now in resolvelib main; this issue can be closed once resolvelib is released and vendored. Once it is vendored I plan to break out the 3 different optimizations I have created in #12499 into their own PRs that I will post one at a time. In particular because the first one, moving prioritising backtrack causes from
What's the problem this feature will solve?
Pip's dependency resolution process, particularly in complex backtracking scenarios, currently faces performance issues due to repetitive calculations, leading to O(n^2) complexity. The cost arises when checking whether a "name" is part of a cause, and it also hinders the implementation of more complex checks, as they would slow down simple backtracking scenarios.
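To make the quadratic cost concrete, here is a minimal sketch. It assumes each cause can be modelled as a pair of names; that shape, and the helper names `is_cause_per_name` and `names_in_causes`, are made up for illustration and are not pip's or resolvelib's data structures. Checking every name against every cause repeats the same scan, while building the set of cause names once makes each later check a constant-time lookup.

```python
from typing import Iterable, List, Set, Tuple

# Assumption for this example: a cause is a (requirement_name, parent_name) pair.
Cause = Tuple[str, str]


def is_cause_per_name(name: str, causes: Iterable[Cause]) -> bool:
    """Scan every cause for a single name: O(len(causes)) per call."""
    return any(name in cause for cause in causes)


def names_in_causes(names: Iterable[str], causes: Iterable[Cause]) -> List[str]:
    """Build the set of cause names once, then test each name in O(1)."""
    cause_names: Set[str] = {part for cause in causes for part in cause}
    return [name for name in names if name in cause_names]


names = ["foo", "bar", "baz"]
causes = [("bar", "foo"), ("bar", "qux")]

# Per-name scanning: one pass over all causes for every name.
per_name = [n for n in names if is_cause_per_name(n, causes)]

# Batched: one pass over the causes, then one pass over the names.
batched = names_in_causes(names, causes)
```

Both approaches select the same names; only the amount of repeated work differs.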
Describe the solution you'd like
The solution proposes adding a new abstract method, `filter_unsatisfied_names`, in resolvelib, to be implemented by Pip. This method is designed to work alongside the existing `get_preference` method, refining the backtracking process during dependency resolution.
- `filter_unsatisfied_names` pre-filters unsatisfied names before they are processed by `get_preference`.
- Unlike `get_preference`, which is called once for each name, `filter_unsatisfied_names` is called only once per backtracking round.
- `filter_unsatisfied_names` receives all unsatisfied names at once, in contrast to `get_preference`, which receives one name at a time.
- `filter_unsatisfied_names` is able to short-circuit checks.
- `filter_unsatisfied_names` can determine if multiple names are important to backtrack on simultaneously.
Alternative Solutions
- `get_preference`: Improvements to this method could offer some benefits, but it would struggle to enable complex backtracks without imposing performance penalties on simple backtracks.
Additional context
I am not tied to the name `filter_unsatisfied_names` or even to this specific approach, as long as it's possible to filter unsatisfied names more than one at a time.
A PR is open on the resolvelib side, but it is unlikely to ever be merged without buy-in from Pip maintainers: sarugaku/resolvelib#145
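As a rough illustration of how the proposed method would sit alongside `get_preference`, the sketch below counts how often each method is called in one round. Only the two method names come from the proposal; `SketchProvider`, `choose_next`, the simplified signatures, and the placeholder ordering are assumptions for the example and do not reflect the actual resolvelib or pip implementation.

```python
from typing import List, Sequence, Set


class SketchProvider:
    """Hypothetical provider; signatures are simplified for illustration."""

    def __init__(self, backtrack_cause_names: Set[str]) -> None:
        self.backtrack_cause_names = backtrack_cause_names
        self.filter_calls = 0
        self.preference_calls = 0

    def filter_unsatisfied_names(self, names: Sequence[str]) -> List[str]:
        # Called once per round with every unsatisfied name.
        self.filter_calls += 1
        if not self.backtrack_cause_names:
            # Short-circuit: nothing to narrow, keep the current behaviour.
            return list(names)
        narrowed = [n for n in names if n in self.backtrack_cause_names]
        # Never return an empty list; fall back to the full list instead.
        return narrowed or list(names)

    def get_preference(self, name: str) -> int:
        # Called once per remaining name; lower values sort first.
        self.preference_calls += 1
        return len(name)  # placeholder ordering for the example


def choose_next(provider: SketchProvider, names: Sequence[str]) -> str:
    """Filter once, then rank only the names that survived the filter."""
    candidates = provider.filter_unsatisfied_names(names)
    return min(candidates, key=provider.get_preference)


provider = SketchProvider(backtrack_cause_names={"bar", "baz"})
chosen = choose_next(provider, ["foo", "bar", "baz", "qux"])
```

With four unsatisfied names and two of them involved in the conflict, the filter runs once and `get_preference` runs only for the two names that remain.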