Speedup startup time #4768

Open
pradyunsg opened this issue Oct 5, 2017 · 44 comments
Labels
type: enhancement (Improvements to functionality)
type: performance (Commands take too long to run)

Comments

@pradyunsg
Member

There's a lot to gain from speeding up pip's startup time.

For one, pip takes around 600ms just to print the completion text, which feels laggy (as mentioned in #4755). Further, a faster startup time might help with the test-suite situation too.

@pradyunsg pradyunsg added the type: maintenance Related to Development and Maintenance Processes label Oct 5, 2017
@floam

floam commented Oct 5, 2017

I did notice one can save about 80ms by using the option to disable the version check - it'd probably make sense not to do that at all for completions by default. Still, a lot remains to be improved.

@pradyunsg pradyunsg mentioned this issue Oct 5, 2017
12 tasks
@dstufft
Member

dstufft commented Oct 5, 2017

I wonder how much of this time is taken up by importing stuff.

@pradyunsg
Member Author

$ time .tox/py36/bin/python -c "from pip._internal import main"

A rudimentary test of running the above 5 times gives me an average of 0.377s.
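
A minimal sketch of how the averaging above could be scripted, assuming each run should start a fresh interpreter so that nothing is served from an already-populated module cache. The helper name `time_import` is hypothetical, and `import json` is only a stand-in for the pip import. Like `time python -c ...`, the figure includes interpreter startup.

```python
import statistics
import subprocess
import sys
import time


def time_import(statement, runs=5, python=sys.executable):
    """Return the mean wall-clock seconds to run `statement` in a fresh interpreter."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # One new process per run, so every import is a cold import.
        subprocess.run([python, "-c", statement], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


if __name__ == "__main__":
    # Stand-in statement; substitute the import being investigated.
    print(f"{time_import('import json'):.3f}s")
```

Passing a different `python` path (for example a tox environment's interpreter) lets the same helper compare environments.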

@benoit-pierre
Member

Importing pip._vendor.pkg_resources is what takes most of the time on my machine. There are a few changes in setuptools 36.4 and above that will help a little: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#v3640

@pradyunsg
Member Author

master is vendoring setuptools 36.4.0 currently...

setuptools==36.4.0

@benoit-pierre
Member

Yes, and it's faster than pip 9.0, but there are some further changes in setuptools 36.5 that might help: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#v3650.

@benoit-pierre
Member

  • pip 9.0: python -c 'import pip._vendor.pkg_resources' 0.35s user 0.03s system 100% cpu 0.385 total
  • master: python -c "from pip._vendor import pkg_resources" 0.21s user 0.02s system 99% cpu 0.233 total
  • master+updated pkg_resources: python -c "from pip._vendor import pkg_resources" 0.19s user 0.00s system 99% cpu 0.192 total

@pradyunsg
Member Author

pradyunsg commented Oct 5, 2017

Oh, nice. That means the next round of vendor updates would bring some speedup. :)

I, personally, am waiting on a new distlib release before giving the vendored libraries another round of updates.

@pradyunsg
Member Author

pradyunsg commented Oct 5, 2017

I fired up the profiler and ran pip completion --fish. Here's what I got (all percentages in terms of total time):

  • initial import of pip._internal.__init__: 85%
    • pip._internal.cmdoptions.__init__: 79%
      • pip._internal.index: 78% (basically all of the above time is spent in this import)
        • This is where it gets interesting
        • pip._vendor.html5lib: 15.8%
        • pip._vendor.requests: 9.0%
        • pip._vendor.distlib: 5.2%
        • pip._vendor.packaging: 7.1%
        • pip._internal.download: 35.4%
          • pip._internal.utils.logging: 30.4%
            • pip._internal.utils.misc: 29.6%
              • pip._vendor.pkg_resources: 28.8%
  • pip._internal.__init__.main(): 15%
    • parseopts(): 0.19%
    • command.main(): 14.7%
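
One way a breakdown like the one above could be gathered is with the standard-library profiler. This is a sketch only: the thread does not say which profiler was used, and `profile_import` is a hypothetical helper. The numbers are only meaningful when the module has not been imported yet in the current process.

```python
import cProfile
import importlib
import io
import pstats


def profile_import(module_name, top=10):
    """Profile an import of `module_name` and return a cumulative-time report."""
    profiler = cProfile.Profile()
    profiler.enable()
    importlib.import_module(module_name)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sorting by cumulative time surfaces the imports that drag in the most work.
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    # Stand-in module; substitute the package under investigation.
    print(profile_import("colorsys"))
```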

@pradyunsg
Member Author

PS: Need better profiling tools.

@dstufft
Member

dstufft commented Oct 5, 2017

Lazy importing will probably solve some of those.
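
For illustration, the standard library already offers one form of lazy importing through `importlib.util.LazyLoader`. The sketch below follows the recipe in the importlib documentation; the helper name `lazy_import` and the early return for already-loaded modules are assumptions of this example, not something pip is known to do.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code only executes on first attribute access."""
    if name in sys.modules:
        # Already imported; nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arms the module; the real execution is deferred.
    loader.exec_module(module)
    return module


if __name__ == "__main__":
    colorsys = lazy_import("colorsys")          # cheap: module body not run yet
    print(colorsys.rgb_to_hls(1.0, 0.0, 0.0))   # first attribute access loads it
```

The trade-off is that import errors surface at first use rather than at startup, which may or may not be acceptable for a given module.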

@pradyunsg pradyunsg added type: enhancement Improvements to functionality and removed type: maintenance Related to Development and Maintenance Processes labels Oct 5, 2017
@brettcannon
Member

There are plans to (hopefully) make lazy importing easy to switch on for CLI apps like pip in Python 3.7. CPython 3.7 also adds a -X importtime argument, as well as dtrace/systemtap support, to help track where import time is going when profiling this sort of thing.

@boxed

boxed commented Nov 25, 2017

@benoit-pierre

master+updated pkg_resources: python -c "from pip._vendor import pkg_resources" 0.19s user 0.00s system 99% cpu 0.192 total

What does "updated pkg_resources" mean? On my machine the initial import is still quite slow even though I have setuptools 38.1.0, so your comment seems very interesting to me! :P

@benoit-pierre
Member

@boxed: it means with pip's vendored version of setuptools updated.

@boxed

boxed commented Nov 26, 2017

Aha. I tried copying over pkg_resources from my main install over the one inside pip/_vendor, but I didn't see any difference in speed :/

@pradyunsg
Member Author

Using CPython 3.7.0's -X importtime.

import time: self [us] | cumulative | imported package
[snip]
import time:       654 |     506881 | pip._internal
[snip]
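
A small sketch for collecting this output programmatically, assuming CPython 3.7+ and the three-column format shown above (self, cumulative, module name, written to stderr). The helper name `import_times` is hypothetical.

```python
import subprocess
import sys

PREFIX = "import time:"


def import_times(statement, python=sys.executable):
    """Run `statement` under -X importtime; return {module: cumulative microseconds}."""
    proc = subprocess.run(
        [python, "-X", "importtime", "-c", statement],
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    times = {}
    for line in proc.stderr.splitlines():
        if not line.startswith(PREFIX):
            continue
        fields = line[len(PREFIX):].split("|")
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue  # skips the header row
        # Nested imports are indented in the last column; strip() flattens that.
        times[fields[2].strip()] = int(fields[1])
    return times


if __name__ == "__main__":
    # Stand-in statement; substitute the import being investigated.
    slowest = sorted(import_times("import json").items(), key=lambda kv: -kv[1])
    for name, microseconds in slowest[:10]:
        print(f"{microseconds:>10} us  {name}")
```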

@lorencarvalho
Contributor

Just in case y'all are unaware, the pkg_resources slowness is being tracked in pypa/setuptools#510. I didn't see it linked in this issue yet.

@CSDUMMI

CSDUMMI commented May 19, 2019

Could you not move the imports from the top of the file into the functions that need them?

@boxed

boxed commented May 19, 2019

@CSDUMMI I have a PR that does this. It helps somewhat.
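
As a generic illustration of the idea (not code from the PR), a function-local import defers the cost of a module until the code path that needs it actually runs. The function name `dump_report` is hypothetical and `json` is a stand-in for a heavier dependency.

```python
def dump_report(data):
    """Serialise `data`; the json import is paid on the first call, not at startup."""
    import json  # deferred: loaded the first time this function runs

    return json.dumps(data, sort_keys=True)


def show_version():
    """A fast path that never triggers the deferred import."""
    return "1.0"


if __name__ == "__main__":
    print(show_version())
    print(dump_report({"b": 1, "a": 2}))
```

Repeated calls stay cheap because later `import` statements find the module already in `sys.modules`.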

@CSDUMMI

CSDUMMI commented May 20, 2019

Could I have a link?

@boxed

boxed commented May 20, 2019

#6346

@cjerdonek
Member

Here is a PR that improves the import situation for the vcs imports: #6545. It removes the pip._internal.vcs imports from pip/_internal/__init__.py. This will make it easy to remove vcs imports from the common case, if desired, which can be done in a subsequent commit.

@boxed

boxed commented Jun 5, 2019

Let's move the discussion to boxed/p#4

@cjerdonek
Member

FYI, PR #6694 ("Only import a Command class when needed") was recently merged, which helps with this.
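
The general pattern behind "only import a command when needed" can be sketched as a registry that stores where each command lives and imports it on demand. This is an illustration under assumed names (`CommandInfo`, `COMMANDS`, `load_command`), with standard-library callables standing in for command classes; it is not taken from the PR.

```python
import importlib
from collections import namedtuple

# Hypothetical registry entry: where a command's implementation lives.
CommandInfo = namedtuple("CommandInfo", "module_path attr_name summary")

# Stand-ins: stdlib callables play the role of command classes.
COMMANDS = {
    "dumps": CommandInfo("json", "dumps", "Serialise an object to JSON."),
    "quote": CommandInfo("shlex", "quote", "Shell-escape a string."),
}


def load_command(name):
    """Import only the module backing `name` and return the target attribute."""
    info = COMMANDS[name]
    module = importlib.import_module(info.module_path)
    return getattr(module, info.attr_name)


if __name__ == "__main__":
    # Listing summaries needs no imports at all.
    for name, info in COMMANDS.items():
        print(f"{name}: {info.summary}")
    print(load_command("quote")("a b"))
```

Keeping the summary text in the registry means help output can be produced without importing any command module.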

@cjerdonek
Member

I posted PR #6835 to help with this.

@cjerdonek
Member

I just posted PR #6843 to continue the work in PR #6835. The PR trims unneeded imports by making it so that commands not requiring downloading / PackageFinder will no longer import that machinery.

@asottile
Contributor

asottile commented Feb 5, 2022

I noticed a pretty significant slowdown in the latest released version -- I've tracked it down to here -- might be worth bumping pyparsing once that gets resolved: pyparsing/pyparsing#362

@bluetech
Contributor

I also looked into pip startup time a bit. First thing I noticed is tenacity's import of asyncio but seems like @ichard26 already took care of it (thanks!).

Another one I noticed is chardet import. requests supports either chardet or charset_normalizer. From a quick experiment I did replacing the chardet vendor import with non-vendored charset_normalizer import, I get ~27ms for chardet vs. ~7ms for charset_normalizer, using -X importtime on my admittedly ~10 years old laptop.

If there is interest, I can try to prepare a PR to replace the vendored chardet with a vendored charset_normalizer.

@pfmoore
Member

pfmoore commented Apr 17, 2024

Note that we can only vendor pure Python libraries. charset_normalizer would at the very least be tricky to vendor because we'd need to find a way to make our vendoring tools ignore the platform-specific wheels. Also, are your benchmarks using the pure Python version? If not, they would need to be re-done to be meaningful.

I don't have any strong opinions on whether we should switch, I'm just noting these points as things to consider if we do.

@bluetech
Contributor

Hmm the charset_normalizer github repo tagline says "in pure python" but I guess it's not :)
I just tried again with charset_normalizer-3.3.2-py3-none-any.whl and still get ~7ms so the pure python version looks good as well.

@notatallshaw
Member

notatallshaw commented Apr 17, 2024

charset_normalizer was added as an optional dependency to requests when the apache-airflow team found the license for chardet wasn't suitable for them.

The developer did a lot to make it more acceptable to the requests maintainers, such as significantly reducing the number of dependencies. My understanding is that charset_normalizer only has binaries based on compiling pure Python code with mypyc, and that these aren't shipped by default.

If the developer is still as accommodating, I'd imagine pip would benefit in performance and ease of maintenance with charset_normalizer, but the first thing I would do is check with the developer that they are happy being vendored by pip.

@bluetech
Contributor

OK, submitted PR #12638.

BTW, another big import-time hit is packaging -> pyparsing package. I see that packaging has already replaced pyparsing with a hand-rolled parser, and there is a PR #12300 to update pip to use it. I checked the import time with #12300 and it's indeed a nice improvement.

With pyparsing, asyncio and chardet gone it will be a decent improvement to startup time. The remaining big ones are rich and requests/urllib3 but I don't think there is much to do about these for pip install.

bluetech added a commit to bluetech/pip that referenced this issue Apr 18, 2024
@ichard26 ichard26 added the type: performance Commands take too long to run label Apr 19, 2024
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 26, 2024
24.1 (2024-06-20)
=================

Vendored Libraries
------------------

- Upgrade truststore to 0.9.1.


24.1b2 (2024-06-12)
===================

Features
--------

- Report informative messages about invalid requirements. (`#12713 <https://github.com/pypa/pip/issues/12713>`_)

Bug Fixes
---------

- Eagerly import the self version check logic to avoid crashes while upgrading or downgrading pip at the same time. (`#12675 <https://github.com/pypa/pip/issues/12675>`_)
- Accommodate for mismatches between different sources of truth for extra names, for packages generated by ``setuptools``. (`#12688 <https://github.com/pypa/pip/issues/12688>`_)
- Accommodate for development versions of CPython ending in ``+`` in the version string. (`#12691 <https://github.com/pypa/pip/issues/12691>`_)

Vendored Libraries
------------------

- Upgrade packaging to 24.1
- Upgrade requests to 2.32.0
- Remove vendored colorama
- Remove vendored six
- Remove vendored webencodings
- Remove vendored charset_normalizer

  ``requests`` provides optional character detection support on some APIs when processing ambiguous bytes. This isn't relevant for pip to function and we're able to remove it due to recent upstream changes.

24.1b1 (2024-05-06)
===================

Deprecations and Removals
-------------------------

- Drop support for EOL Python 3.7. (`#11934 <https://github.com/pypa/pip/issues/11934>`_)
- Remove support for legacy versions and dependency specifiers.

  Packages with non standard-compliant versions or dependency specifiers are now ignored by the resolver.
  Already installed packages with non standard-compliant versions or dependency specifiers
  must be uninstalled before upgrading them. (`#12063 <https://github.com/pypa/pip/issues/12063>`_)

Features
--------

- Improve performance of resolution of large dependency trees, with more caching. (`#12453 <https://github.com/pypa/pip/issues/12453>`_)
- Further improve resolution performance of large dependency trees, by caching hash calculations. (`#12657 <https://github.com/pypa/pip/issues/12657>`_)
- Reduce startup time of commands (e.g. show, freeze) that do not access the network by 15-30%. (`#4768 <https://github.com/pypa/pip/issues/4768>`_)
- Reword and improve presentation of uninstallation errors. (`#10421 <https://github.com/pypa/pip/issues/10421>`_)
- Add a 'raw' progress_bar type for simple and parsable download progress reports (`#11508 <https://github.com/pypa/pip/issues/11508>`_)
- ``pip list`` no longer performs the pip version check unless ``--outdated`` or ``--uptodate`` is given. (`#11677 <https://github.com/pypa/pip/issues/11677>`_)
- Use the ``data_filter`` when extracting tarballs, if it's available. (`#12111 <https://github.com/pypa/pip/issues/12111>`_)
- Display the Project-URL value under key "Home-page" in ``pip show`` when the Home-Page metadata field is not set.

  The Project-URL key detection is case-insensitive, and ignores any dashes and underscores. (`#11221 <https://github.com/pypa/pip/issues/11221>`_)

Bug Fixes
---------

- Ensure ``-vv`` gets passed to any ``pip install`` build environment subprocesses. (`#12577 <https://github.com/pypa/pip/issues/12577>`_)
- Deduplicate entries in the ``Requires`` field of ``pip show``. (`#12165 <https://github.com/pypa/pip/issues/12165>`_)
- Fix error on checkout for subversion and bazaar with verbose mode on. (`#11050 <https://github.com/pypa/pip/issues/11050>`_)
- Fix exception with completions when COMP_CWORD is not set (`#12401 <https://github.com/pypa/pip/issues/12401>`_)
- Fix intermittent "cannot locate t64.exe" errors when upgrading pip. (`#12666 <https://github.com/pypa/pip/issues/12666>`_)
- Remove duplication in invalid wheel error message (`#12579 <https://github.com/pypa/pip/issues/12579>`_)
- Remove the incorrect pip3.x console entrypoint from the pip wheel. This console
  script continues to be generated by pip when it installs itself. (`#12536 <https://github.com/pypa/pip/issues/12536>`_)
- Gracefully skip VCS detection in pip freeze when PATH points to a non-directory path. (`#12567 <https://github.com/pypa/pip/issues/12567>`_)
- Make the ``--proxy`` parameter take precedence over environment variables. (`#10685 <https://github.com/pypa/pip/issues/10685>`_)

Vendored Libraries
------------------

- Add charset-normalizer 3.3.2
- Remove chardet
- Remove pyparsing
- Upgrade CacheControl to 0.14.0
- Upgrade certifi to 2024.2.2
- Upgrade distro to 1.9.0
- Upgrade idna to 3.7
- Upgrade msgpack to 1.0.8
- Upgrade packaging to 24.0
- Upgrade platformdirs to 4.2.1
- Upgrade pygments to 2.17.2
- Upgrade rich to 13.7.1
- Upgrade setuptools to 69.5.1
- Upgrade tenacity to 8.2.3
- Upgrade typing_extensions to 4.11.0
- Upgrade urllib3 to 1.26.18

Improved Documentation
----------------------

- Document UX research done on pip. (`#10745 <https://github.com/pypa/pip/issues/10745>`_)
- Fix the direct usage of zipapp showing up as ``python -m pip.pyz`` rather than ``./pip.pyz`` / ``.\pip.pyz`` (`#12043 <https://github.com/pypa/pip/issues/12043>`_)
- Add a warning explaining that the snippet in "Fallback behavior" is not a valid
  ``pyproject.toml`` snippet for projects, and link to setuptools documentation
  instead. (`#12122 <https://github.com/pypa/pip/issues/12122>`_)
- The Python Support Policy has been updated. (`#12529 <https://github.com/pypa/pip/issues/12529>`_)
- Document the environment variables that correspond with CLI options. (`#12576 <https://github.com/pypa/pip/issues/12576>`_)
- Update architecture documentation for command line interface. (`#6831 <https://github.com/pypa/pip/issues/6831>`_)

Process
-------

- Remove ``setup.py`` since all the pip project metadata is now declared in
  ``pyproject.toml``.
- Move remaining pip development tools configurations to ``pyproject.toml``.