
Incompatible CUDA versions installed from Pip #19465

Closed
rosario-purple opened this issue Jan 22, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@rosario-purple

Description

When using the default install command:

pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

it tries to install CUDA 12.3 libraries. However, the most recent stable release of PyTorch (2.1.2) is pinned to CUDA 12.1, which is what I have on my machine. (After installing the main package dependencies, I'm compiling flash-attn, TransformerEngine, and MS-AMP on this machine, and the compilation appears to be sensitive to the CUDA version.)
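Since the sensitivity described above comes down to comparing CUDA version strings, a minimal sketch of the check (the helper name `same_cuda_minor` is illustrative, not part of jax or torch):

```python
# Hypothetical helper (not part of jax or torch): compare CUDA version
# strings down to the minor version, since source builds such as
# flash-attn can be sensitive to the minor CUDA release.

def same_cuda_minor(a: str, b: str) -> bool:
    """True if both version strings agree on major.minor."""
    return a.split(".")[:2] == b.split(".")[:2]

# The mismatch described in this issue: jax pulls 12.3, torch pins 12.1.
print(same_cuda_minor("12.3", "12.1"))    # False
print(same_cuda_minor("12.1", "12.1.105"))  # True
```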

What jax/jaxlib version are you using?

0.4.23

Which accelerator(s) are you using?

GPU, Nvidia A100

Additional system info?

numpy:  1.24.4
python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
uname:  uname_result(system='Linux', node='7e72bd4e-01', release='5.15.0-91-generic', version='#101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023', machine='x86_64')

NVIDIA GPU info

Mon Jan 22 11:32:43 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:11:00.0 Off | 0 |
| N/A 29C P0 62W / 400W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:12:00.0 Off | 0 |
| N/A 27C P0 60W / 400W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:13:00.0 Off | 0 |
| N/A 26C P0 61W / 400W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:14:00.0 Off | 0 |
| N/A 28C P0 62W / 400W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-80GB On | 00000000:21:00.0 Off | 0 |
| N/A 29C P0 61W / 400W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-80GB On | 00000000:22:00.0 Off | 0 |
| N/A 27C P0 61W / 400W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-80GB On | 00000000:23:00.0 Off | 0 |
| N/A 26C P0 62W / 400W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-80GB On | 00000000:24:00.0 Off | 0 |
| N/A 29C P0 62W / 400W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+

@rosario-purple rosario-purple added the bug Something isn't working label Jan 22, 2024
@jakevdp (Collaborator) commented Jan 22, 2024

Hi - thanks for the question. You'll find some relevant info here: #18032 (comment)

Our general approach is to provide two JAX CUDA builds: one against the most recent CUDA release (currently 12.3) and one older build that aims to remain compatible with PyTorch's requirements. Currently that is our cuda11 build.
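For reference, the older build would be selected by swapping the extra in the install command quoted above; the `cuda11_pip` extra name here is assumed from that pattern, so verify it against the JAX install docs for your jaxlib version:

```shell
# Assumed variant of the install command from this issue: the cuda11_pip
# extra selects the older CUDA 11 build, which tracks PyTorch's CUDA pins
# more closely than the cuda12 build does.
pip install -U "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```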

@jakevdp (Collaborator) commented Jan 22, 2024

I'm going to close this, since it's essentially a duplicate of #18032. Thanks!

@jakevdp jakevdp closed this as completed Jan 22, 2024
@jakevdp jakevdp self-assigned this Jan 22, 2024
@hawkinsp (Collaborator)

I'll note that the safest thing to do in general is use two separate venvs. Different frameworks have different version requirements. JAX tracks CUDA versions faster than PyTorch does.
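The two-venv setup suggested above can be sketched as follows (paths and the install lines are illustrative, and the installs are left commented out since they pull large CUDA wheels):

```shell
# Sketch: one virtualenv per framework, so each can pin its own CUDA
# wheel versions without conflicting with the other.
python3 -m venv jax-env
python3 -m venv torch-env
# Then install each framework only into its own environment, e.g.:
# jax-env/bin/pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# torch-env/bin/pip install torch
```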
