
Add extra requires for cuda/cudnn DLLs to onnxruntime-gpu python package #23659

Merged
merged 9 commits into from
Feb 14, 2025

Conversation

tianleiwu
Contributor

@tianleiwu tianleiwu commented Feb 12, 2025

Description

Add extra requires for cuda/cudnn DLLs to onnxruntime-gpu python package.

When building the wheel, make sure to add the CUDA version parameter to the build command line, e.g. `--cuda_version 12.8`.

Note that we only add extra requires for CUDA 12 for now. If a package is built with CUDA 11, no extra requires will be added.

Example of installing the extra DLLs from a wheel:

pip install onnxruntime_gpu-1.21.0-cp310-cp310-linux_x86_64.whl[cuda,cudnn]

To install the cuDNN DLLs but not the CUDA DLLs:

pip install onnxruntime_gpu-1.21.0-cp310-cp310-linux_x86_64.whl[cudnn]

Example section in the METADATA file of dist-info:

Provides-Extra: cuda
Requires-Dist: nvidia-cuda-nvrtc-cu12~=12.0; extra == "cuda"
Requires-Dist: nvidia-cuda-runtime-cu12~=12.0; extra == "cuda"
Requires-Dist: nvidia-cufft-cu12~=11.0; extra == "cuda"
Requires-Dist: nvidia-curand-cu12~=10.0; extra == "cuda"
Provides-Extra: cudnn
Requires-Dist: nvidia-cudnn-cu12~=9.0; extra == "cudnn"
...
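For context, extras like these are declared through setuptools' `extras_require`; below is a minimal sketch of a mapping that would produce the metadata above (illustrative only, not the actual onnxruntime setup.py):

```python
# Illustrative only: an extras_require mapping that, when passed to
# setuptools.setup(extras_require=...), yields the METADATA shown above.
extras_require = {
    "cuda": [
        "nvidia-cuda-nvrtc-cu12~=12.0",
        "nvidia-cuda-runtime-cu12~=12.0",
        "nvidia-cufft-cu12~=11.0",
        "nvidia-curand-cu12~=10.0",
    ],
    "cudnn": ["nvidia-cudnn-cu12~=9.0"],
}

# Each key becomes a Provides-Extra entry; each requirement becomes a
# 'Requires-Dist: <spec>; extra == "<key>"' line in the wheel METADATA.
for extra, deps in extras_require.items():
    print(f"Provides-Extra: {extra}")
    for dep in deps:
        print(f'Requires-Dist: {dep}; extra == "{extra}"')
```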

Motivation and Context

Jian had a PR: #22506. This adds only part of that change. Extra changes include updating the Windows GPU Python packaging pipeline to pass the CUDA version to the build command line.

@tianleiwu tianleiwu marked this pull request as draft February 12, 2025 01:59
@tianleiwu tianleiwu marked this pull request as ready for review February 12, 2025 06:58
@tianleiwu tianleiwu requested review from snnn and jchen351 February 12, 2025 07:00
@tianleiwu
Contributor Author

/azp run ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline

@tianleiwu
Contributor Author

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline


Azure Pipelines successfully started running 6 pipeline(s).


Azure Pipelines successfully started running 8 pipeline(s).

Contributor

@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

setup.py (suggestion outdated, resolved)
@snnn
Copy link
Member

snnn commented Feb 14, 2025

In

args = [sys.executable, os.path.join(source_dir, "setup.py"), "bdist_wheel"]

before calling setup.py, you may add a piece of code that checks whether is_windows() and --cuda_home is set; if so, read a JSON file from that location to extract the CUDA version. The file's name is "version.json".
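The suggested check could look roughly like the sketch below; `get_cuda_version` is a hypothetical helper name, and the JSON layout (a top-level "cuda" object with a "version" field, as shipped by recent CUDA toolkits) is an assumption:

```python
import json
import os


def get_cuda_version(cuda_home: str):
    """Read the CUDA version from <cuda_home>/version.json.

    Hypothetical helper following the suggestion above; the JSON layout
    ("cuda" -> "version") matches recent CUDA toolkits but is an assumption.
    Returns None if the file is missing or the key is absent.
    """
    version_path = os.path.join(cuda_home, "version.json")
    if not os.path.isfile(version_path):
        return None
    with open(version_path) as f:
        data = json.load(f)
    return data.get("cuda", {}).get("version")
```

The result (e.g. "12.8.0") could then be appended to the build args as `--cuda_version` before invoking setup.py.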

Contributor

@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

tools/ci_build/build.py (suggestion outdated, resolved)
tools/ci_build/build.py (suggestion outdated, resolved)
tools/ci_build/build.py (resolved)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@tianleiwu tianleiwu merged commit 444606d into main Feb 14, 2025
110 of 112 checks passed
@tianleiwu tianleiwu deleted the tlwu/extra_require branch February 14, 2025 17:35
tianleiwu added a commit that referenced this pull request Feb 15, 2025
### Description

Changes:
(1) Pass --cuda_version in the packaging pipeline to the build wheel command
line so that cuda_version can be saved. Note that cuda_version is also
required for generating extra_require for
#23659.
(2) Update setup.py and onnxruntime_validation.py to save the CUDA version
to capi/build_and_package_info.py.
(3) Add a helper function to preload dependent DLLs (MSVC, CUDA, cuDNN)
in `__init__.py`. First we try to load DLLs from the nvidia site
packages, then try to load the remaining DLLs with default path settings.

```
import onnxruntime
onnxruntime.preload_dlls()
```

To show the loaded DLLs, set `verbose=True`. It is also possible to disable
loading certain types of DLLs:
```
onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
```
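The fallback behavior described above (nvidia site packages first, then the default search path) can be sketched with ctypes; `preload_sketch` and the listed DLL names are assumptions for a CUDA 12 / cuDNN 9 install, not the actual implementation:

```python
import ctypes
import os
import sysconfig


def preload_sketch(verbose: bool = False) -> None:
    """Simplified illustration of DLL preloading (not the real implementation)."""
    site_packages = sysconfig.get_paths()["purelib"]
    # Example DLL locations for the CUDA 12 / cuDNN 9 pip packages; the file
    # names are assumptions and vary with the installed versions.
    candidates = [
        os.path.join(site_packages, "nvidia", "cuda_runtime", "bin", "cudart64_12.dll"),
        os.path.join(site_packages, "nvidia", "cudnn", "bin", "cudnn64_9.dll"),
    ]
    for path in candidates:
        try:
            # Prefer the pip package copy; if absent, pass the bare file name
            # so the OS loader resolves it via the default search path.
            ctypes.CDLL(path if os.path.isfile(path) else os.path.basename(path))
            if verbose:
                print("loaded", path)
        except OSError:
            # DLL not found anywhere; the real helper would leave loading
            # to onnxruntime itself at session creation time.
            if verbose:
                print("skipped", path)
```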

#### PyTorch and onnxruntime on Windows

When working with PyTorch, onnxruntime will reuse the CUDA and cuDNN
DLLs already loaded by PyTorch as long as the CUDA and cuDNN major versions
are compatible. Preloading DLLs might actually cause issues on Windows
(see examples 2 and 3 below).

Example 1: onnxruntime and torch can work together easily. 
```
>>> import torch
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\curand64_10.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_heuristic64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_precompiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_ops64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_adv64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc64_120_0.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc-builtins64_124.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_runtime_compiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_cnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufftw64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\caffe2_nvrtc.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
>>> session.get_providers()
['CUDAExecutionProvider', 'CPUExecutionProvider']
```

Example 2: Calling preload_dlls after `import torch` is not necessary.
Unfortunately, it seems that multiple DLLs with the same filename get loaded.
They can be used in parallel, but this is not ideal since more memory is used.
```
>>> import torch
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cufft\bin\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\curand64_10.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_heuristic64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_precompiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_ops64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_adv64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc64_120_0.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\nvrtc-builtins64_124.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_engines_runtime_compiled64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_cnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cuda_runtime\bin\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\cufftw64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\torch\lib\caffe2_nvrtc.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
```

Example 3: Calling preload_dlls before `import torch` might cause a torch
import error on Windows. Later we may provide an option to load DLLs
from the torch directory to avoid this issue.
```
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cufft\bin\cufft64_11.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublas64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cublas\bin\cublasLt64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn_graph64_9.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cuda_runtime\bin\cudart64_12.dll
D:\anaconda3\envs\py310\Lib\site-packages\numpy.libs\msvcp140-d64049c6e3865410a7dda6a7e9f0c575.dll
D:\anaconda3\envs\py310\Lib\site-packages\nvidia\cudnn\bin\cudnn64_9.dll
D:\anaconda3\envs\py310\msvcp140.dll
D:\anaconda3\envs\py310\vcruntime140_1.dll
D:\anaconda3\envs\py310\msvcp140_1.dll
D:\anaconda3\envs\py310\vcruntime140.dll
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\__init__.py", line 137, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "D:\anaconda3\envs\py310\lib\site-packages\torch\lib\cudnn_adv64_9.dll" or one of its dependencies.
```

#### PyTorch and onnxruntime on Linux

On Linux, PyTorch uses the nvidia site packages for the CUDA and cuDNN
libraries. Preloading DLLs therefore consistently loads the same set of
libraries, which helps maintainability.
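On Linux, a verbose listing like the ones shown here could be produced by scanning `/proc/self/maps` for mapped CUDA/cuDNN shared objects; a sketch (the helper name is made up):

```python
def list_loaded_cuda_libs():
    """List CUDA/cuDNN shared objects mapped into the current process.

    Sketch of how a verbose listing could be produced; /proc/self/maps is
    Linux-specific, so this returns an empty list elsewhere.
    """
    libs = set()
    try:
        with open("/proc/self/maps") as f:
            for line in f:
                parts = line.split()
                # A pathname is only present as a sixth field.
                path = parts[-1] if len(parts) >= 6 else ""
                name = path.rsplit("/", 1)[-1]
                # libcu* covers libcudart/libcublas/libcudnn/libcufft/libcurand;
                # libnv* covers libnvrtc.
                if path.startswith("/") and (name.startswith("libcu") or name.startswith("libnv")):
                    libs.add(path)
    except FileNotFoundError:  # not Linux
        pass
    return sorted(libs)
```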

```
>>> import onnxruntime
>>> onnxruntime.preload_dlls(verbose=True)
----List of loaded DLLs----
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_graph.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cufft/lib/libcufft.so.11
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/curand/lib/libcurand.so.10
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_nvrtc/lib/libnvrtc.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublasLt.so.12
>>> import torch
>>> torch.rand(3, 3).cuda()
tensor([[0.4619, 0.0279, 0.2092],
        [0.0416, 0.6782, 0.5889],
        [0.9988, 0.9092, 0.7982]], device='cuda:0')
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> session.get_providers()
['CUDAExecutionProvider', 'CPUExecutionProvider']
```

```
>>> import torch
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublasLt.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.12
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/curand/lib/libcurand.so.10
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cufft/lib/libcufft.so.11
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9
/anaconda3/envs/py310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
```

Without preloading DLLs, onnxruntime will load the CUDA and cuDNN libraries
based on `LD_LIBRARY_PATH`. Torch will then reuse the same libraries already
loaded by onnxruntime:
```
>>> import onnxruntime
>>> session = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
>>> import torch
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
>>> torch.rand(3, 3).cuda()
tensor([[0.2233, 0.9194, 0.8078],
        [0.0906, 0.2884, 0.3655],
        [0.6249, 0.2904, 0.4568]], device='cuda:0')
>>> onnxruntime.preload_dlls(cuda=False, cudnn=False, msvc=False, verbose=True)
----List of loaded DLLs----
/cuda12.8/targets/x86_64-linux/lib/libnvrtc.so.12.8.61
/cuda12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41
/cuda12.8/targets/x86_64-linux/lib/libcurand.so.10.3.9.55
/cuda12.8/targets/x86_64-linux/lib/libcublas.so.12.8.3.14
/cuda12.8/targets/x86_64-linux/lib/libcublasLt.so.12.8.3.14
/cudnn9.7/lib/libcudnn_graph.so.9.7.0
/cudnn9.7/lib/libcudnn.so.9.7.0
/cuda12.8/targets/x86_64-linux/lib/libcudart.so.12.8.57
```

### Motivation and Context
In many reported issues of onnxruntime import failure, the root cause is
that dependent DLLs are missing or not in the path. This change will make
it easier to resolve those issues.

This is based on Jian's PR
#22506 with an extra change to
load the MSVC DLLs.

#23659 can be used to
install the CUDA/cuDNN DLLs to site packages. Example command line after
the next official release (1.21):
```
pip install onnxruntime-gpu[cuda,cudnn]
```

If the user installed PyTorch on Linux, those DLLs are usually installed
together with torch.