[perf] differences in pip vs uv when installing package from src #3802

Open
ryxli opened this issue May 23, 2024 · 11 comments
Labels
performance Potential performance improvement

Comments

@ryxli

ryxli commented May 23, 2024

I would like some help or pointers in identifying the reason for the performance difference between pip install and uv pip install for a package (https://github.com/NVIDIA/TransformerEngine).

Reproduce steps:

>uv --version
uv 0.1.44

>git clone https://github.com/NVIDIA/TransformerEngine.git
>cd TransformerEngine

With regular pip:

> time NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi   pip install --no-build-isolation --no-deps -e .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///workspace/TransformerEngine
  Preparing metadata (setup.py) ... done
Installing collected packages: transformer-engine
  Attempting uninstall: transformer-engine
    Found existing installation: transformer-engine 1.8.0.dev0+d705f7f
    Uninstalling transformer-engine-1.8.0.dev0+d705f7f:
      Successfully uninstalled transformer-engine-1.8.0.dev0+d705f7f
  Running setup.py develop for transformer-engine
Successfully installed transformer-engine-1.8.0.dev0+d705f7f
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

real    0m12.524s
user    0m10.516s
sys     0m12.773s

With uv pip:

> time NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi uv pip install --system --no-build-isolation --no-deps -e .
   Built file:///workspace/TransformerEngine
Built 1 editable in 1m 09s
Resolved 1 package in 10ms
Installed 1 package in 1ms
 - transformer-engine==1.8.0.dev0+d705f7f
 + transformer-engine==1.8.0.dev0+d705f7f (from file:///workspace/TransformerEngine)

real    1m9.980s
user    16m6.687s
sys     2m12.868s
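
One way to read the two `time` reports above is to compare CPU time with wall-clock time. The sketch below parses the figures quoted in this comment and computes (user + sys) / real for each run. The helper names are hypothetical, and the interpretation is only a reading of the numbers: a ratio well above 1 is consistent with work running on several cores during the uv invocation (for example a parallel compile), but it does not prove what that work is.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a bash `time` value such as '1m9.980s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def cpu_to_wall_ratio(real: str, user: str, sys_time: str) -> float:
    """Return (user + sys) / real; values well above 1 suggest multi-core work."""
    cpu_seconds = parse_duration(user) + parse_duration(sys_time)
    return cpu_seconds / parse_duration(real)


# Figures copied from the pip and uv runs reported above.
pip_ratio = cpu_to_wall_ratio("0m12.524s", "0m10.516s", "0m12.773s")
uv_ratio = cpu_to_wall_ratio("1m9.980s", "16m6.687s", "2m12.868s")
print(f"pip: {pip_ratio:.1f}x, uv: {uv_ratio:.1f}x")
```

With these inputs the ratio is a little under 2 for pip and a little under 16 for uv.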

With --verbose enabled, the command seems to get stuck at Calling setuptools.build_meta:__legacy__.build_editable for a long while:

DEBUG Starting interpreter discovery for default Python
DEBUG Cached interpreter info for Python 3.10.12, skipping probing: /usr/bin/python3
DEBUG Using Python 3.10.12 environment at /usr/bin/python3
DEBUG Trying to lock if free: /tmp/uv-08d95a7330542a29.lock
DEBUG At least one requirement is not satisfied: file:///workspace/TransformerEngine
DEBUG Using registry request timeout of 30s
DEBUG Building (editable) file:///workspace/TransformerEngine
DEBUG Calling `setuptools.build_meta:__legacy__.build_editable("/root/.cache/uv/.tmpKghDdN/.tmphfHmSb", {}, None)`
.....
@ibraheemdev ibraheemdev added the performance Potential performance improvement label May 23, 2024
@charliermarsh
Member

Hard for me to test this because it requires CUDA it seems?

@charliermarsh
Member

But setuptools.build_meta:__legacy__.build_editable is just the build hook to build the editable -- it's not uv code, but Python code following the standards.
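
For readers unfamiliar with build hooks, here is a minimal sketch of how a frontend can turn a backend string such as `setuptools.build_meta:__legacy__` into a callable `build_editable`. PEP 517 defines the `module.path:object.path` format, and PEP 660 defines `build_editable(wheel_directory, config_settings=None, metadata_directory=None)`, which matches the arguments in the debug line above. The function name `resolve_hook` is hypothetical, and this is not uv's actual implementation (real frontends typically invoke the hook in a separate Python process).

```python
import importlib


def resolve_hook(backend_spec: str, hook_name: str):
    """Resolve a hook from a 'module.path:object.path' backend spec."""
    module_path, _, object_path = backend_spec.partition(":")
    obj = importlib.import_module(module_path)
    # The part after ':' is optional; when present, walk its dotted attributes.
    if object_path:
        for attr in object_path.split("."):
            obj = getattr(obj, attr)
    return getattr(obj, hook_name)


# Illustrative usage (not executed here; it needs setuptools and would start a
# real build when run from the project root):
#   hook = resolve_hook("setuptools.build_meta:__legacy__", "build_editable")
#   wheel_name = hook("/tmp/editable-wheels", {}, None)
```

The point of the sketch is that everything after the call is backend code: the frontend only imports the object and calls the hook.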

@zanieb
Member

zanieb commented May 24, 2024

It seems like pip isn't performing a build? Do their verbose logs have more information?

@charliermarsh
Member

I think pip actually doesn't use PEP 517 when doing editables, or something like that.
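
The pip log at the top of the thread supports this reading: it contains "Running setup.py develop". A small sketch for classifying a pip log is shown below. The function name is hypothetical, the first marker string is taken from the log in this issue, and the "Building editable for" marker is an assumption about what pip prints when it goes through the `build_editable` hook instead.

```python
def editable_install_path(pip_output: str) -> str:
    """Guess which editable-install path pip took from its console output."""
    if "Running setup.py develop" in pip_output:
        # Marker seen in the pip log quoted in this issue.
        return "legacy setup.py develop"
    if "Building editable for" in pip_output:
        # Assumed marker for the PEP 660 build_editable path.
        return "PEP 660 build_editable"
    return "unknown"


# Lines copied from the pip run reported in this issue.
sample_log = """\
Obtaining file:///workspace/TransformerEngine
  Preparing metadata (setup.py) ... done
  Running setup.py develop for transformer-engine
"""
print(editable_install_path(sample_log))
```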

@ryxli
Author

ryxli commented May 24, 2024

> Hard for me to test this because it requires CUDA it seems?

Unfortunately, this package requires CUDA, although from what I can tell the install-time issue is unrelated to it.

> It seems like pip isn't performing a build? Do their verbose logs have more information?

I did not include the full logs because, on a first install, most of the time is spent building the C++ extensions via CMake. The CMake build is incremental, so it does not affect the timings above, which is where I noticed the significant difference between uv pip install and pip install.

The uv pip install just hangs on this line for a while, after which the install seems just as fast as regular pip.

DEBUG Calling `setuptools.build_meta:__legacy__.build_editable("/root/.cache/uv/.tmpKghDdN/.tmphfHmSb", {}, None)`

Something else may be going on under the hood, since the CMake logs are not exposed by uv pip install (#1567). With regular pip I can see the logs and the install is relatively fast, so I am unsure what the issue is (10-15 seconds with pip vs 1-2 minutes with uv).
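
To check whether the gap persists across repeated runs, a small timing harness can help. The sketch below uses only the standard library; the function and variable names are hypothetical, and the commands and environment variables are copied from the reproduction steps in the issue. Nothing is executed until `compare()` is called from the TransformerEngine checkout.

```python
import os
import subprocess
import sys
import time


def time_command(argv, extra_env=None, cwd=None):
    """Run argv once and return (exit code, wall-clock seconds)."""
    env = {**os.environ, **(extra_env or {})}
    start = time.perf_counter()
    completed = subprocess.run(argv, env=env, cwd=cwd, check=False)
    return completed.returncode, time.perf_counter() - start


# Environment and commands taken from the reproduction steps in this issue.
BUILD_ENV = {
    "NVTE_FRAMEWORK": "pytorch",
    "NVTE_WITH_USERBUFFERS": "1",
    "MPI_HOME": "/usr/local/mpi",
}
COMMANDS = {
    "pip": ["pip", "install", "--no-build-isolation", "--no-deps", "-e", "."],
    "uv": ["uv", "pip", "install", "--system", "--no-build-isolation",
           "--no-deps", "-e", "."],
}


def compare(repeats=3, cwd="."):
    """Time each installer several times and report every run on stderr."""
    for name, argv in COMMANDS.items():
        for run in range(1, repeats + 1):
            code, seconds = time_command(argv, BUILD_ENV, cwd)
            print(f"{name} run {run}: exit={code} {seconds:.1f}s", file=sys.stderr)
```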

@samypr100
Collaborator

Is there a particular setup you have? Can you try on a fully clean environment, e.g. docker image?

@ryxli
Author

ryxli commented May 26, 2024

This happens while building a Docker image.

@samypr100
Collaborator

Which Docker base image were you using?

@ryxli
Author

ryxli commented May 26, 2024

Nvidia pytorch image

https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch

Using 24.04-py3

@samypr100
Collaborator

Thanks. I tried it earlier in an nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 image and got roughly the same build times with both. I will try nvcr.io/nvidia/pytorch:24.04-py3, but at a quick glance it seems transformer-engine is already pre-built in that image, which could explain some of the speed difference.
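
A quick way to confirm whether a distribution is already present in an image before timing anything is to query its installed metadata. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `transformer-engine` is the one shown in the pip log earlier in the thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("transformer-engine"))
```

Running this inside the container before and after the uninstall step shows whether the starting state matches what the timings assume.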

@ryxli
Author

ryxli commented May 27, 2024

This is after uninstalling transformer_engine in the base image and then installing it from source.


5 participants