Adding optional CUDA DLLs when installing onnxruntime_gpu #22506
Conversation
There are some test failures; please fix them. We will remove the "orttraining-linux-gpu-ci-pipeline". The others still need to be taken care of.
You can commit the suggested changes from lintrunner.
setup.py
Outdated
```
if cuda_version:
    f.write(f"cuda_version = '{cuda_version}'\n")
    # cudart_versions are integers
    cudart_versions = find_cudart_versions(build_env=True)
```
find_cudart_versions only works on Linux. I think we can add a check for Linux before calling find_cudart_versions, to avoid a warning message on Windows.
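The suggested guard could look like the sketch below. `get_cudart_versions` is a hypothetical wrapper (not in the PR), and the existing setup.py helper is passed in as a parameter so the example stays self-contained:

```python
import platform


def get_cudart_versions(find_cudart_versions, system=None):
    """Call `find_cudart_versions` only on Linux.

    On other platforms, return None instead of triggering the
    helper's "not supported" warning. `system` defaults to the
    current OS and is overridable for testing.
    """
    system = system or platform.system()
    if system == "Linux":
        return find_cudart_versions(build_env=True)
    return None
```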
```
# Load nvidia libraries from site-packages/nvidia if the package is onnxruntime-gpu
if cuda_version is not None and cuda_version != "":
```
In my test, cuda_version is still an empty string. It is imported from onnxruntime.capi.onnxruntime_validation in line 73. That module only outputs cuda_version for training, as below:

onnxruntime/onnxruntime/python/onnxruntime_validation.py, lines 100 to 102 in 29bccad:
```
cuda_version = ""
if has_ortmodule:
```
We can remove the `if has_ortmodule:` line there.
Does this mean the following code usually won't be executed?
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
@gedoensmax, any comment?
@jchen351, I tried to run the nightly pipelines with your changes, but there were some failures. Could you please update your branch with main, so that I can re-run the pipelines to check whether the problem still exists? Before merging this PR, we should generate some test packages and manually test them locally.
@jchen351, I tried the new package, but it didn't work. Could you please verify?
```
    )
else:
    logging.info(f"Unsupported platform: {platform.system()}")
check_and_load_cuda_libs(nvidia_path, cuda_libs)
```
Please move this code to a function like `preload_cuda_libs()` and let users call it explicitly (by default, it is not called). Example usage:
```
import onnxruntime
onnxruntime.preload_cuda_libs()
```
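An opt-in function along the lines suggested above could be sketched as follows. `nvidia_path` and `cuda_libs` mirror the variables in the reviewed snippet; the injectable `loader` parameter is an illustrative addition (not in the PR) that makes the sketch testable:

```python
import ctypes
import os


def preload_cuda_libs(nvidia_path, cuda_libs, loader=ctypes.CDLL):
    """Explicitly load CUDA libraries found under `nvidia_path`.

    Nothing is loaded unless the user calls this function. Returns
    the list of library paths that were actually loaded.
    """
    loaded = []
    for lib in cuda_libs:
        candidate = os.path.join(nvidia_path, lib)
        if os.path.exists(candidate):
            loader(candidate)  # CDLL keeps the handle alive in the process
            loaded.append(candidate)
    return loaded
```

Missing files are skipped silently here; the real implementation would likely log them instead.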
…age (#23659)

### Description
Add extra requires for cuda/cudnn DLLs to the onnxruntime-gpu python package. When building the wheel, make sure to add the cuda version parameter to the build command line, e.g. `--cuda_version 12.8`.

Note that we only add extra requires for cuda 12 for now. If a package is built with cuda 11, no extra requires will be added.

Example of installing the extra DLLs from a wheel:
```
pip install onnxruntime_gpu-1.21.0-cp310-cp310-linux_x86_64.whl[cuda,cudnn]
```
To install the cudnn DLLs but not the cuda DLLs:
```
pip install onnxruntime_gpu-1.21.0-cp310-cp310-linux_x86_64.whl[cudnn]
```
Example section in the METADATA file of dist-info:
```
Provides-Extra: cuda
Requires-Dist: nvidia-cuda-nvrtc-cu12~=12.0; extra == "cuda"
Requires-Dist: nvidia-cuda-runtime-cu12~=12.0; extra == "cuda"
Requires-Dist: nvidia-cufft-cu12~=11.0; extra == "cuda"
Requires-Dist: nvidia-curand-cu12~=10.0; extra == "cuda"
Provides-Extra: cudnn
Requires-Dist: nvidia-cudnn-cu12~=9.0; extra == "cudnn"
...
```

### Motivation and Context
Jian had a PR: #22506. This adds only part of that change. Extra changes include updating the Windows GPU python packaging pipeline to pass the cuda version to the build command line.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Changes:
1. Pass `--cuda_version` in the packaging pipeline to the build wheel command line so that cuda_version can be saved. Note that cuda_version is also required for generating extra_require for #23659.
2. Update setup.py and onnxruntime_validation.py to save the cuda version to capi/build_and_package_info.py.
3. Add a helper function to preload dependent DLLs (MSVC, CUDA, cuDNN) in `__init__.py`. First we try to load DLLs from the nvidia site packages, then try to load the remaining DLLs with default path settings.
```
import onnxruntime
onnxruntime.preload_dlls()
```
To show loaded DLLs, set `verbose=True`. It is also possible to disable loading some types of DLLs:
```
onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
```

#### PyTorch and onnxruntime on Windows
When working with pytorch, onnxruntime will reuse the CUDA and cuDNN DLLs loaded by pytorch, as long as the CUDA and cuDNN major versions are compatible. Preloading DLLs might actually cause issues on Windows (see examples 2 and 3 below).

Example 1: onnxruntime and torch can work together easily.
```
>>> import torch
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\curand64_10.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_heuristic64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_precompiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_ops64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_adv64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc64_120_0.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc-builtins64_124.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_runtime_compiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_cnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufftw64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\caffe2_nvrtc.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
>>> session.get_providers()
['CUDAExecutionProvider', 'CPUExecutionProvider']
```

Example 2: Using preload_dlls after `import torch` is not necessary. Unfortunately, it seems that multiple DLLs of the same filename are loaded. They can be used in parallel, but this is not ideal since more memory is used.
```
>>> import torch
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cufft\bin\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\curand64_10.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_heuristic64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_precompiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_ops64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_adv64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc64_120_0.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc-builtins64_124.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_runtime_compiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_cnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cuda_runtime\bin\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufftw64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\caffe2_nvrtc.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
```

Example 3: Using preload_dlls before `import torch` might cause a torch import error on Windows. Later we may provide an option to load DLLs from the torch directory to avoid this issue.
```
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cufft\bin\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cuda_runtime\bin\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\__init__.py", line 137, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "D:\anaconda3\envs\py310\lib\site-packages\torch\lib\cudnn_adv64_9.dll" or one of its dependencies.
```

#### PyTorch and onnxruntime on Linux
On Linux, pytorch uses the nvidia site packages for CUDA and cuDNN libraries. Preloading DLLs consistently loads the same set of libraries, which could help maintenance.
```
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_graph.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cufft/lib/libcufft.so.11
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/curand/lib/libcurand.so.10
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_nvrtc/lib/libnvrtc.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublasLt.so.12
>>> import torch
>>> torch.rand(3, 3).cuda()
tensor([[0.4619, 0.0279, 0.2092],
        [0.0416, 0.6782, 0.5889],
        [0.9988, 0.9092, 0.7982]], device='cuda:0')
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> session.get_providers()
['CUDAExecutionProvider', 'CPUExecutionProvider']
```
```
>>> import torch
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublasLt.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/curand/lib/libcurand.so.10
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cufft/lib/libcufft.so.11
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
```
Without preloading DLLs, onnxruntime will load the CUDA and cuDNN libraries based on `LD_LIBRARY_PATH`. Torch will reuse the same libraries loaded by onnxruntime:
```
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
>>> import torch
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
>>> torch.rand(3, 3).cuda()
tensor([[0.2233, 0.9194, 0.8078],
        [0.0906, 0.2884, 0.3655],
        [0.6249, 0.2904, 0.4568]], device='cuda:0')
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
```

### Motivation and Context
In many reported issues of `import onnxruntime` failure, the root cause is that dependent DLLs are missing or not on the path. This change will make it easier to resolve those issues. This is based on Jian's PR #22506, with an extra change to load msvc dlls. #23659 can be used to install CUDA/cuDNN dlls to site packages. Example command line after the next official release 1.21:
```
pip install onnxruntime-gpu[cuda,cudnn]
```
If the user installed pytorch on Linux, those DLLs are usually installed together with torch.
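The "load from nvidia site packages first" step described above boils down to collecting candidate library directories under each site-packages tree. A minimal sketch, where `nvidia_lib_dirs` is a hypothetical helper and the subpackage names are illustrative (the shipped list may differ):

```python
import os
import site


def nvidia_lib_dirs(subdirs=("cuda_runtime", "cudnn", "cublas", "cufft", "curand")):
    """Collect existing library directories under site-packages/nvidia.

    Probes every site-packages entry rather than just one, and checks
    both `bin` (Windows DLL layout) and `lib` (Linux .so layout).
    """
    dirs = []
    for sp in site.getsitepackages():
        for sub in subdirs:
            for leaf in ("bin", "lib"):
                candidate = os.path.join(sp, "nvidia", sub, leaf)
                if os.path.isdir(candidate):
                    dirs.append(candidate)
    return dirs
```

Any directory found this way could then be fed to `os.add_dll_directory` on Windows or used to load the libraries by absolute path on Linux.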
```
cuda_version_ = tuple(map(int, cuda_version.split(".")))
# Get the site-packages path where nvidia packages are installed
site_packages_path = site.getsitepackages()[-1]
```
Shouldn't we check all site-packages directories? We might also check whether things like `nvidia.cudnn` are importable and whether `nvidia.cudnn.__path__` is available (via `import nvidia.cudnn` or importlib).
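The importlib-based check suggested above could look like this sketch. `nvidia_package_paths` is a hypothetical helper and the default package names are illustrative:

```python
import importlib.util


def nvidia_package_paths(names=("nvidia.cudnn", "nvidia.cuda_runtime")):
    """Return the package directories of importable nvidia subpackages.

    Uses find_spec instead of trusting a single site-packages entry;
    packages whose parent isn't installed are simply skipped.
    """
    paths = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Parent package (e.g. `nvidia`) is not installed at all.
            continue
        if spec is not None and spec.submodule_search_locations:
            paths[name] = list(spec.submodule_search_locations)
    return paths
```

Unlike indexing `site.getsitepackages()[-1]`, this finds the package wherever the import system would resolve it.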
Description
This change enables users to install Nvidia CUDA DLLs when installing onnxruntime_gpu, with
```
pip install onnxruntime_gpu[cuda_dlls]
```
It also enables onnxruntime_gpu to use the dynamic libraries under site-packages/nvidia (.dll files on Windows, .so files on Linux) by temporarily updating the environment variables within an ORT inference session.
Motivation and Context
Request by