
Adding optional CUDA DLLs when installing onnxruntime_gpu #22506

Closed · wants to merge 50 commits

Conversation

jchen351 (Contributor)

Description

This code change enables users to install NVIDIA CUDA DLLs when installing onnxruntime_gpu, via `pip install onnxruntime_gpu[cuda_dlls]`.

It also enables onnxruntime_gpu to use dynamic libraries under site-packages/nvidia (.dll files on Windows, .so files on Linux) by temporarily updating the environment variables within an ORT inference session.

Motivation and Context

Request by

@snnn snnn requested a review from jywu-msft October 22, 2024 01:16
@snnn (Member) commented Oct 23, 2024:

There are some test failures. Please fix them. We will remove the "orttraining-linux-gpu-ci-pipeline"; the others still need to be taken care of.

@github-actions bot (Contributor) left a comment:

You can commit the suggested changes from lintrunner.

@jchen351 jchen351 requested a review from tianleiwu November 7, 2024 22:31
@jchen351 jchen351 closed this Nov 11, 2024
@jchen351 jchen351 requested a review from tianleiwu December 17, 2024 02:05
@jchen351 jchen351 requested a review from tianleiwu December 19, 2024 03:53
setup.py (outdated):
```
if cuda_version:
    f.write(f"cuda_version = '{cuda_version}'\n")
# cudart_versions are integers
cudart_versions = find_cudart_versions(build_env=True)
```
@tianleiwu (Contributor) commented Dec 20, 2024:

find_cudart_versions only works on Linux. I think we can add a check for Linux before calling find_cudart_versions, to avoid a warning message on Windows.
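The guard the reviewer suggests could look roughly like this (a sketch; `find_cudart_versions` is the setup.py helper and is stubbed here only so the example runs):

```python
import platform


def find_cudart_versions(build_env=False):
    # Stub standing in for the real setup.py helper, which relies on
    # Linux-only tooling; assume it returns a list of integer versions.
    return [12]


# Only call the Linux-only helper on Linux, so that Windows builds
# never hit the warning path inside it.
cudart_versions = (
    find_cudart_versions(build_env=True) if platform.system() == "Linux" else []
)
```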



```
# Load nvidia libraries from site-packages/nvidia if the package is onnxruntime-gpu
if cuda_version is not None and cuda_version != "":
```
@tianleiwu (Contributor) commented Dec 20, 2024:

In my test, cuda_version is still an empty string. It is imported from onnxruntime.capi.onnxruntime_validation at line 73. That module only outputs cuda_version for training, as below:

```
cuda_version = ""
if has_ortmodule:
```

We can remove the `if has_ortmodule` check there.

A maintainer (Member) replied:

Does it mean the following code usually won't get executed?

@jchen351 jchen351 requested a review from tianleiwu December 22, 2024 02:33

```
try:  # noqa: SIM105
    from .build_and_package_info import cuda_version
except Exception:
    pass
```

Check notice — Code scanning / CodeQL: Empty except (Note)

'except' clause does nothing but pass and there is no explanatory comment.
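One way to address the CodeQL note (a sketch, not the PR's actual fix) is to make the intent explicit with `contextlib.suppress`, or to keep the bare `pass` but add the explanatory comment CodeQL asks for:

```python
import contextlib

cuda_version = None  # default for CPU-only builds
with contextlib.suppress(ImportError):
    # build_and_package_info is generated at wheel-build time and may be
    # absent; silently fall back to the default in that case.
    # (Top-level module name used here so the sketch is importable on its own.)
    from build_and_package_info import cuda_version
```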
tianleiwu
tianleiwu previously approved these changes Dec 22, 2024
@snnn snnn requested a review from Copilot December 30, 2024 23:05


Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

@snnn (Member) commented Dec 30, 2024:

@gedoensmax , any comment?

@snnn (Member) commented Dec 31, 2024:

@jchen351, I tried to run nightly pipelines with your changes, but there were some failures. Could you please update your branch with main so that I can re-run the pipelines and check whether the problem still exists? Before merging this PR, we should generate some test packages and manually test them locally.

@snnn (Member) commented Jan 9, 2025:

@jchen351 , I tried the new package, but it didn't work.

```
# pip install onnxruntime-gpu[cuda_dlls] --pre --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
Looking in indexes: https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
Collecting onnxruntime-gpu[cuda_dlls]
  Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/7982ae20-ed19-4a35-a362-a96ac99897b7/pypi/download/onnxruntime-gpu/1.21.dev20250108002/onnxruntime_gpu-1.21.0.dev20250108002-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (291.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 291.9/291.9 MB 11.5 MB/s eta 0:00:00
Requirement already satisfied: flatbuffers in /usr/local/lib/python3.10/dist-packages (from onnxruntime-gpu[cuda_dlls]) (24.12.23)
Requirement already satisfied: protobuf in /usr/local/lib/python3.10/dist-packages (from onnxruntime-gpu[cuda_dlls]) (5.29.3)
Requirement already satisfied: numpy>=1.21.6 in /usr/local/lib/python3.10/dist-packages (from onnxruntime-gpu[cuda_dlls]) (2.2.1)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from onnxruntime-gpu[cuda_dlls]) (1.13.3)
Requirement already satisfied: coloredlogs in /usr/local/lib/python3.10/dist-packages (from onnxruntime-gpu[cuda_dlls]) (15.0.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from onnxruntime-gpu[cuda_dlls]) (24.2)
Requirement already satisfied: humanfriendly>=9.1 in /usr/local/lib/python3.10/dist-packages (from coloredlogs->onnxruntime-gpu[cuda_dlls]) (10.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->onnxruntime-gpu[cuda_dlls]) (1.3.0)
Installing collected packages: onnxruntime-gpu
Successfully installed onnxruntime-gpu-1.21.0.dev20250108002
```

Could you please verify?

```
)
else:
    logging.info(f"Unsupported platform: {platform.system()}")
check_and_load_cuda_libs(nvidia_path, cuda_libs)
```
A Contributor left a comment:
Please move this code into a function like preload_cuda_libs() and let the user call it explicitly (by default, it is not called).

Example usage:

```
import onnxruntime
onnxruntime.preload_cuda_libs()
```
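An opt-in loader along the lines the reviewer describes might be sketched as follows (the name `preload_cuda_libs` comes from the suggestion; the implementation details are assumptions, not the PR's actual code):

```python
import ctypes
import glob
import os
import platform
import site


def preload_cuda_libs(nvidia_path=None):
    """Explicitly load CUDA libraries from site-packages/nvidia.

    Nothing happens at import time; the user opts in by calling this.
    Returns the list of library paths that were successfully loaded.
    """
    if nvidia_path is None:
        nvidia_path = os.path.join(site.getsitepackages()[-1], "nvidia")
    pattern = "*.dll" if platform.system() == "Windows" else "*.so*"
    loaded = []
    for lib in glob.glob(os.path.join(nvidia_path, "**", pattern), recursive=True):
        try:
            ctypes.CDLL(lib)
            loaded.append(lib)
        except OSError:
            pass  # optional library missing; errors surface at session creation
    return loaded
```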

@tianleiwu tianleiwu dismissed their stale review February 11, 2025 22:57

see new comment

tianleiwu added a commit that referenced this pull request Feb 14, 2025
…age (#23659)

### Description
Add extra requires for cuda/cudnn DLLs to onnxruntime-gpu python
package.

When building the wheel, make sure to add the CUDA version parameter to the build command line, e.g. `--cuda_version 12.8`.

Note that we only add extra requires for CUDA 12 for now. If a package is built with CUDA 11, no extra requires are added.

Examples to install extra DLLs from wheel:
```
pip install onnxruntime_gpu-1.21.0-cp310-cp310-linux_x86_64.whl[cuda,cudnn]
```

To install the cuDNN DLLs but not the CUDA DLLs:
```
pip install onnxruntime_gpu-1.21.0-cp310-cp310-linux_x86_64.whl[cudnn]
```

Example section in METADATA file of dist-info:
```
Provides-Extra: cuda
Requires-Dist: nvidia-cuda-nvrtc-cu12~=12.0; extra == "cuda"
Requires-Dist: nvidia-cuda-runtime-cu12~=12.0; extra == "cuda"
Requires-Dist: nvidia-cufft-cu12~=11.0; extra == "cuda"
Requires-Dist: nvidia-curand-cu12~=10.0; extra == "cuda"
Provides-Extra: cudnn
Requires-Dist: nvidia-cudnn-cu12~=9.0; extra == "cudnn"
...
```
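The extras shown in the METADATA example could be produced by a setup.py fragment roughly like this (a sketch with a hypothetical helper name; the package list is copied from the METADATA sample above, and the CUDA-12-only behavior matches the note):

```python
def build_extras(cuda_version):
    """Compute the extras_require mapping passed to setuptools.

    Only CUDA 12 builds get extras; a CUDA 11 (or unknown) build
    yields an empty mapping, so no extra requires are added.
    """
    extras = {}
    if cuda_version and cuda_version.startswith("12"):
        extras["cuda"] = [
            "nvidia-cuda-nvrtc-cu12~=12.0",
            "nvidia-cuda-runtime-cu12~=12.0",
            "nvidia-cufft-cu12~=11.0",
            "nvidia-curand-cu12~=10.0",
        ]
        extras["cudnn"] = ["nvidia-cudnn-cu12~=9.0"]
    return extras
```

The returned mapping would then be passed as `extras_require=` to `setuptools.setup()`.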

### Motivation and Context

Jian had a PR: #22506. This adds only part of that change. The extra changes include updating the Windows GPU Python packaging pipeline to pass the CUDA version to the build command line.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
tianleiwu added a commit that referenced this pull request Feb 15, 2025
### Description

Changes:
(1) Pass --cuda_version in the packaging pipeline to the build wheel command line so that cuda_version can be saved. Note that cuda_version is also required for generating extra_require in #23659.
(2) Update setup.py and onnxruntime_validation.py to save the CUDA version to capi/build_and_package_info.py.
(3) Add a helper function to preload dependent DLLs (MSVC, CUDA, cuDNN) in `__init__.py`. It first tries to load DLLs from the nvidia site packages, then loads the remaining DLLs with default path settings.

```
import onnxruntime
onnxruntime.preload_dlls()
```

To show the loaded DLLs, set `verbose=True`. It is also possible to disable loading certain types of DLLs:
```
onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
```

#### PyTorch and onnxruntime on Windows

When working with PyTorch, onnxruntime will reuse the CUDA and cuDNN DLLs already loaded by PyTorch, as long as the CUDA and cuDNN major versions are compatible. Preloading DLLs might actually cause issues on Windows (see examples 2 and 3 below).

Example 1: onnxruntime and torch can work together easily. 
```
>>> import torch
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\curand64_10.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_heuristic64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_precompiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_ops64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_adv64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc64_120_0.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc-builtins64_124.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_runtime_compiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_cnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufftw64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\caffe2_nvrtc.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
>>> session.get_providers()
['CUDAExecutionProvider', 'CPUExecutionProvider']
```

Example 2: Calling preload_dlls after `import torch` is unnecessary. Unfortunately, multiple DLLs with the same filename end up loaded. They can be used in parallel, but this is not ideal since more memory is used.
```
>>> import torch
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cufft\bin\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\curand64_10.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_heuristic64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_precompiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_ops64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_adv64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc64_120_0.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc-builtins64_124.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_runtime_compiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_cnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cuda_runtime\bin\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufftw64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\caffe2_nvrtc.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
```

Example 3: Calling preload_dlls before `import torch` might cause a torch import error on Windows. Later we may provide an option to load DLLs from the torch directory to avoid this issue.
```
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cufft\bin\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cuda_runtime\bin\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\__init__.py", line 137, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "D:\anaconda3\envs\py310\lib\site-packages\torch\lib\cudnn_adv64_9.dll" or one of its dependencies.
```

#### PyTorch and onnxruntime on Linux

On Linux, PyTorch uses the nvidia site packages for its CUDA and cuDNN libraries. Preloading consistently loads the same set of libraries, which helps with maintenance.

```
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_graph.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cufft/lib/libcufft.so.11
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/curand/lib/libcurand.so.10
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_nvrtc/lib/libnvrtc.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublasLt.so.12
>>> import torch
>>> torch.rand(3, 3).cuda()
tensor([[0.4619, 0.0279, 0.2092],
        [0.0416, 0.6782, 0.5889],
        [0.9988, 0.9092, 0.7982]], device='cuda:0')
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> session.get_providers()
['CUDAExecutionProvider', 'CPUExecutionProvider']
```

```
>>> import torch
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublasLt.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/curand/lib/libcurand.so.10
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cufft/lib/libcufft.so.11
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
```

Without preloading, onnxruntime will load the CUDA and cuDNN libraries based on `LD_LIBRARY_PATH`, and torch will reuse the same libraries loaded by onnxruntime:
```
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
>>> import torch
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
>>> torch.rand(3, 3).cuda()
tensor([[0.2233, 0.9194, 0.8078],
        [0.0906, 0.2884, 0.3655],
        [0.6249, 0.2904, 0.4568]], device='cuda:0')
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
```

### Motivation and Context
In many reported issues of `import onnxruntime` failures, the root cause is that dependent DLLs are missing or not on the path. This change makes it easier to resolve those issues.

This is based on Jian's PR #22506, with an extra change to load the MSVC DLLs.

#23659 can be used to install the CUDA/cuDNN DLLs into site packages. Example command line after the next official release (1.21):
```
pip install onnxruntime-gpu[cuda,cudnn]
```

If the user installed PyTorch on Linux, those libraries are usually installed together with torch.

```
cuda_version_ = tuple(map(int, cuda_version.split(".")))
# Get the site-packages path where nvidia packages are installed
site_packages_path = site.getsitepackages()[-1]
```

A reviewer commented:

Shouldn't we check all site-packages directories? We might also check whether things like nvidia.cudnn and nvidia.cudnn.__path__ are importable (via `import nvidia.cudnn` or importlib).
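The reviewer's suggestion could be sketched with `importlib.util.find_spec`, which resolves each nvidia.* package wherever it is installed, instead of assuming `site.getsitepackages()[-1]` (a sketch; the function name and package list are illustrative):

```python
import importlib.util


def find_nvidia_package_dirs(names=("nvidia.cudnn", "nvidia.cublas")):
    """Locate nvidia.* packages across all site-packages directories.

    Uses importlib to resolve each package, which also serves as the
    importability check the reviewer mentions. Returns a mapping from
    package name to its directory list; missing packages are skipped.
    """
    found = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            continue  # parent package (e.g. nvidia) not installed
        if spec is not None and spec.submodule_search_locations:
            found[name] = list(spec.submodule_search_locations)
    return found
```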

@tianleiwu tianleiwu closed this Feb 17, 2025