-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First stab at reducing startup time #6346
Conversation
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
To go in this direction, I think you'd first need to get agreement on what the approach and "rules" should be around imports -- and also have a sensible way to check that the rules are being followed. Otherwise, any performance improvement could regress in a single commit without anyone knowing it. I'd also be interested in measurements of the hotspots, so efforts could be focused on the most impactful portions first. As it stands now, this PR is too large to do anything with.. |
We also need to actually care. If no one cares and any attempt to improve is shot down its guaranteed to get worse and worse :( |
Certainly people care, so I wouldn't worry about that.. |
This PR is pretty much the minimum needed to move the needle at all. As for getting some agreement, that's why I submitted the PR. But as you can see its pretty silent in here. To me it's hard to distinguish this from not caring. |
I definitely care about performance. But I don't really know what to do about it myself. I'm glad you're interested enough to work on it, and I'd love to help make something like this happen, but it'll need a bit more than simply accepting this PR. Sorry if that's not the simple resolution you were hoping for. In the first instance you'll need to get the test failures fixed. Beyond that, I'd like to see some way of ensuring that we don't lose this work in future changes as @cjerdonek mentioned. There's generally a style principle that imports get put at the top of the file, and are sorted (I don't know how much of that principle is enforced by our checkers, but it's certainly something people naturally tend to do). If we can get benefits from violating that principle, I'm fine with that, but we do need to consider how we maintain those benefits in the future - otherwise we get into an endless cycle of "pip is slow, improve pip, pip gets slower again, repeat the process" which feels like a waste of people's work (in particular, yours in this PR). The code structure after your PR feels very unusual to me (imports scattered through the code feels like a "code smell"), and without knowing why it's necessary to move them, it's hard to judge whether the choice was right. Consider for example that in at least one place, you move an import of Maybe what we need is a section in the manual under "development" explaining how to maintain performance? Maybe we need some sort of performance regression tests? I don't know - that's part of the issue here (as a pip developer, how would I know to flag a PR as needing work to ensure it doesn't harm performance?) Maybe a more complex, but ultimately more flexible and less maintenance-hostile solution like a lazy import system would be a better approach? I honestly have no idea, and finding out would take some effort (but would be very valuable, and appreciated, work, even if it ultimately determined that lazy imports weren't a win).
Where did you get those figures from? I'm not disputing them, but benchmarks are notoriously hard to reproduce, and that's a lot of code change for numbers that we can't reliably check (to ensure that we don't lose the gains). For contrast, on my (Windows) PC, an initial run of Running On my other PC (newer, Windows 10), All of which doesn't say much, apart from (1) I'm not in a position to check any code changes I make to ensure they don't introduce performance regressions in future, (2) in real world usage, other factors may swamp these improvements for at least some of our users, and (3) benchmarking is hard :-) |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
@boxed Are you okay with closing this? I think these gains (and more) were achieved a different way in other PR's by reorganizing / removing module imports that aren't needed: e.g. PR #6545 ("Remove from |
Sure. Great to see these changes landing! |
This is maybe more of a discussion/proof of concept for now. But here goes:
pip --version
andpip --help
takes ~465ms on my work machine. Just moving some imports into functions like this PR does in the files takes that down to ~380ms. If I uncomment the linefrom pip._internal.vcs import git, mercurial, subversion, bazaar # noqa
inpip/_internal/__init__.py
I get down to just over 300ms.Considering how often pip is run in the entire world, I think it is very worth while to optimize.
(This doesn't affect the time it takes to
pip install <already installed package>
which is what I was really interested in tackling when I started this. But I hope this can be a start!)