Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First stab at reducing startup time #6346

Closed
wants to merge 2 commits into from
Closed

First stab at reducing startup time #6346

wants to merge 2 commits into from

Conversation

boxed
Copy link

@boxed boxed commented Mar 18, 2019

This is maybe more of a discussion/proof of concept for now. But here goes:

pip --version and pip --help takes ~465ms on my work machine. Just moving some imports into functions like this PR does in the files takes that down to ~380ms. If I uncomment the line from pip._internal.vcs import git, mercurial, subversion, bazaar # noqa in pip/_internal/__init__.py I get down to just over 300ms.

Considering how often pip is run in the entire world, I think it is very worth while to optimize.

(This doesn't affect the time it takes to pip install <already installed package> which is what I was really interested in tackling when I started this. But I hope this can be a start!)

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label May 20, 2019
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label May 20, 2019
@boxed boxed mentioned this pull request May 20, 2019
@cjerdonek
Copy link
Member

To go in this direction, I think you'd first need to get agreement on what the approach and "rules" should be around imports -- and also have a sensible way to check that the rules are being followed. Otherwise, any performance improvement could regress in a single commit without anyone knowing it. I'd also be interested in measurements of the hotspots, so efforts could be focused on the most impactful portions first. As it stands now, this PR is too large to do anything with..

@boxed
Copy link
Author

boxed commented May 21, 2019

We also need to actually care. If no one cares and any attempt to improve is shot down its guaranteed to get worse and worse :(

@cjerdonek
Copy link
Member

Certainly people care, so I wouldn't worry about that..

@boxed
Copy link
Author

boxed commented May 21, 2019

This PR is pretty much the minimum needed to move the needle at all.

As for getting some agreement, that's why I submitted the PR. But as you can see its pretty silent in here. To me it's hard to distinguish this from not caring.

@pfmoore
Copy link
Member

pfmoore commented May 21, 2019

I definitely care about performance. But I don't really know what to do about it myself. I'm glad you're interested enough to work on it, and I'd love to help make something like this happen, but it'll need a bit more than simply accepting this PR. Sorry if that's not the simple resolution you were hoping for.

In the first instance you'll need to get the test failures fixed.

Beyond that, I'd like to see some way of ensuring that we don't lose this work in future changes as @cjerdonek mentioned. There's generally a style principle that imports get put at the top of the file, and are sorted (I don't know how much of that principle is enforced by our checkers, but it's certainly something people naturally tend to do). If we can get benefits from violating that principle, I'm fine with that, but we do need to consider how we maintain those benefits in the future - otherwise we get into an endless cycle of "pip is slow, improve pip, pip gets slower again, repeat the process" which feels like a waste of people's work (in particular, yours in this PR).

The code structure after your PR feels very unusual to me (imports scattered through the code feels like a "code smell"), and without knowing why it's necessary to move them, it's hard to judge whether the choice was right. Consider for example that in at least one place, you move an import of os. If I want to use a function from os in a different method, what's my best approach? Add a second import of os? Move the os import back to the top? How would I know? Particularly if I have neither the time or the inclination to do extensive performance tests (which is probably the case for most of the people who submit PRs to us).

Maybe what we need is a section in the manual under "development" explaining how to maintain performance? Maybe we need some sort of performance regression tests? I don't know - that's part of the issue here (as a pip developer, how would I know to flag a PR as needing work to ensure it doesn't harm performance?)

Maybe a more complex, but ultimately more flexible and less maintenance-hostile solution like a lazy import system would be a better approach? I honestly have no idea, and finding out would take some effort (but would be very valuable, and appreciated, work, even if it ultimately determined that lazy imports weren't a win).

pip --version and pip --help takes ~465ms on my work machine. Just moving some imports into functions like this PR does in the files takes that down to ~380ms.

Where did you get those figures from? I'm not disputing them, but benchmarks are notoriously hard to reproduce, and that's a lot of code change for numbers that we can't reliably check (to ensure that we don't lose the gains).

For contrast, on my (Windows) PC, an initial run of py -m pip --version took 8.095 seconds, but immediately re-running it took 2.302 seconds, then 1.637 seconds at which point it had stabilised at around 1.6 sec. After that test, py -m pip --help took 1.6 seconds, suggesting that the bulk of the timing issues were non-Python disk caching issues and similar.

Running path/to/actual/python.exe -m pip --version took around 1.3 seconds, suggesting that the overhead from the launcher is abour 0.3 seconds. And path/to/pip.exe --version took initially 4 seconds, dropping to around 1.8 seconds, suggesting that the wrappers cost us 0.5 seconds and introduces an extra thing that needs to be cached.

On my other PC (newer, Windows 10), py -m pip --version, python -m pip --version and pip.exe --version took about 0.5 sec, 0.44 sec and 0.46 sec respectively. Which are a lot closer to your figures.

All of which doesn't say much, apart from (1) I'm not in a position to check any code changes I make to ensure they don't introduce performance regressions in future, (2) in real world usage, other factors may swamp these improvements for at least some of our users, and (3) benchmarking is hard :-)

@pypa-bot
Copy link

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will eligible for code review and hopefully merging!

@pypa-bot pypa-bot added the needs rebase or merge PR has conflicts with current master label May 22, 2019
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label May 22, 2019
@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label May 26, 2019
@cjerdonek
Copy link
Member

cjerdonek commented Aug 9, 2019

@boxed Are you okay with closing this? I think these gains (and more) were achieved a different way in other PR's by reorganizing / removing module imports that aren't needed: e.g. PR #6545 ("Remove from pip/_internal/__init__.py the vcs module imports"), PR #6694 ("Only import a Command class when needed"), and also the follow-on PR's of #6835 ("Move RequirementCommand to a different module") and #6843 ("Eliminate base_command.py's dependence on PipSession").

@boxed
Copy link
Author

boxed commented Aug 9, 2019

Sure. Great to see these changes landing!

@boxed boxed closed this Aug 9, 2019
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Sep 8, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Sep 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation needs rebase or merge PR has conflicts with current master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants