
Update cudnn from v8 to v9 across CUDA versions and x86/arm #1847

Merged
merged 9 commits into pytorch:main on Jun 4, 2024

Conversation

nWEIdia (Collaborator) commented May 30, 2024

Re-land #1822

Supporting pytorch/pytorch#123475

Reference PR: #1271

cc @eqy @tinglvv @ptrblck @atalman @malfet

tinglvv (Collaborator) commented May 31, 2024

Thanks for preparing this! Let me test locally to see whether the upgrade breaks anything on ARM.

tinglvv (Collaborator) commented May 31, 2024

Suggested one change.
I built the wheel with cuDNN v9 and ran into this error when running the tests:

Unable to load any of {libcudnn_engines_precompiled.so.9.1.0, libcudnn_engines_precompiled.so.9.1, libcudnn_engines_precompiled.so.9, libcudnn_engines_precompiled.so}
 File "/test-arm/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDNN_BACKEND_TENSOR_DESCRIPTOR cudnnFinalize failed cudnn_status: CUDNN_STATUS_NOT_INITIALIZED

@nWEIdia We will need to resolve this before merging the change.
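
(For context: cuDNN v9 splits the old monolithic libcudnn into a small dispatcher, libcudnn.so.9/libcudnn_graph.so.9, which dlopens engine sublibraries such as libcudnn_engines_precompiled.so.9 at runtime, so this error means the dynamic loader cannot locate those sublibraries. Below is a minimal sketch to check whether they resolve from Python; treat the exact sublibrary list as an assumption based on the error message and the cuDNN 9 documentation.)

import ctypes

# Engine sublibraries that cuDNN v9 loads lazily at runtime (assumed list);
# if the loader cannot find them via RPATH/RUNPATH or LD_LIBRARY_PATH,
# cudnnFinalize fails with CUDNN_STATUS_NOT_INITIALIZED as seen above.
for name in [
    "libcudnn_engines_precompiled.so.9",
    "libcudnn_engines_runtime_compiled.so.9",
    "libcudnn_heuristic.so.9",
]:
    try:
        ctypes.CDLL(name)
        print(f"OK   {name}")
    except OSError as err:
        print(f"FAIL {name}: {err}")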

nWEIdia (Collaborator, Author) commented Jun 1, 2024

> Built the wheel with cuDNN v9 and ran into this error when running the tests: "Unable to load any of {libcudnn_engines_precompiled.so.9.1.0, …}". We will need to resolve this before merging the change.

Thanks @tinglvv! Could you please briefly describe the reproducer steps? As far as I know, we have not yet succeeded in building a v9-based wheel; did you manually build a v9-based ARM CUDA wheel?

@@ -103,7 +103,7 @@ def update_wheel(wheel_path) -> None:
     os.system(f"unzip {wheel_path} -d {folder}/tmp")
     libs_to_copy = [
         "/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
-        "/usr/local/cuda/lib64/libcudnn.so.8",
+        "/usr/local/cuda/lib64/libcudnn.so.9",
Contributor commented on this diff:

Out of curiosity, do we plan to carry both the legacy API and the new API here so that we can migrate PyTorch over to the new graph API? The size impact would roughly double, and as things currently stand the overall final artifact size is already quite large.

ref https://docs.nvidia.com/deeplearning/cudnn/latest/api/overview.html

nWEIdia (Collaborator, Author) replied:

Good point, and I think this question applies to x86 as well.
This PR was created without considering the size impact; it ships both the legacy and the new API .so files, for both x86 and ARM.

cc @ptrblck @eqy @malfet @atalman @tinglvv for additional inputs
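
(To put a rough number on the size concern, here is a quick sketch that sums the on-disk size of the bundled libraries; paths are taken from the diff above, and the list should be extended to whatever update_wheel actually copies in your builder image.)

import os

# Mirror the libs_to_copy list from update_wheel(); run inside the
# manylinux builder image to estimate how much the bundled libraries add.
libs_to_copy = [
    "/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
    "/usr/local/cuda/lib64/libcudnn.so.9",
]
total = sum(os.path.getsize(p) for p in libs_to_copy if os.path.exists(p))
print(f"bundled size: {total / 2**20:.1f} MiB")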

tinglvv (Collaborator) commented Jun 3, 2024

> Could you please briefly describe the reproducer steps? … did you manually build a v9-based ARM CUDA wheel?

Yes, the wheel builds OK; I think we just need to edit the CMake rules that @eqy mentioned in Slack so that the build recognizes cuDNN v9.

The required CMake changes seem to come from pytorch/pytorch#123475, which is failing its cuda-aarch64 job (because this change, #1847, is missing). Given this inter-dependency, it might be okay to ignore the cuda-aarch64 failures and merge the pytorch/pytorch change first, then merge this change to fix the cuda-aarch64 failure.

Reproducer steps:

1. Build the image: GPU_ARCH_TYPE=cuda-aarch64 GPU_ARCH_VERSION=12.4 manywheel/build_docker.sh
2. Run the image: docker run --gpus all -it pytorch/manylinuxaarch64-builder:<tag generated in step 1>
3. Clone the pytorch repo inside the container: cd / && git clone https://github.com/pytorch/pytorch.git
4. Build the wheel: cd /builder/aarch64_linux && DESIRED_PYTHON=3.10 DESIRED_CUDA=12.4 ./aarch64_ci_build.sh
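
(With the wheel from step 4 installed, a minimal script along these lines reproduces the failure, assuming a CUDA GPU is visible in the container; any op that initializes cuDNN should do.)

import torch
import torch.nn.functional as F

# A single conv2d is enough to force cuDNN initialization; with the engine
# sublibraries unresolvable, this is where CUDNN_STATUS_NOT_INITIALIZED
# surfaced in the traceback above.
x = torch.randn(1, 3, 32, 32, device="cuda")
w = torch.randn(8, 3, 3, 3, device="cuda")
print(F.conv2d(x, w).shape)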

@atalman atalman merged commit 5783bcc into pytorch:main Jun 4, 2024
26 checks passed
atalman added a commit that referenced this pull request Jun 5, 2024
atalman added a commit that referenced this pull request Jun 17, 2024
* Remove triton constraint for py312 (#1846)

* Cache OpenBLAS to docker image for SBSA builds (#1842)

* apply openblas cache for cpu-aarch64

* reapply for cuda-aarch64

* [MacOS] Don't build wheel while building libtorch

Not sure why this was ever done twice

* Allow validate docker images to be called from different workflow (#1850)

* Allow validate docker images to be called from different workflow

* Revert "[MacOS] Don't build wheel while building libtorch"

This reverts commit d88495a.

* [MacOS] Don't build libtorch twice (take 2)

By not invoking `tools/build_libtorch.py`, as it's not done on Linux

* [MacOs][LibTorch] Copy libomp.dylib into libtorch package

* Update cudnn from v8 to v9 across CUDA versions and x86/arm (#1847)

* Update cudnn to v9.1.0.70 for cuda11.8, cuda12.1, and cuda12.4

* Add CUDNN_VERSION variable

* Remove 2 spaces for install_cu124

* trivial fix

* Fix DEPS_LIST and DEPS_SONAME for x86
Update cudnn to v9 for arm cuda binary as well

* libcudnn_adv_infer/libcudnn_adv_train becomes libcudnn_adv

* Change DEPS due to cudnn v9 libraries name changes (and additions)

* Fix lint

* Add missing changes to cu121/cu124

* Change OpenSSL URL (#1854)

* Change OpenSSL URL

* Change to use openssl URL (but no longer ftp!)

* Update build-manywheel-images.yml - Add a note about manylinux_2_28 state

* Revert "Update cudnn from v8 to v9 across CUDA versions and x86/arm" (#1855)

This reverts commit 5783bcc.

* Don't run torch.compile on runtime images in docker validations (#1858)

* Don't run torch.compile on runtime images

* test

* Don't run torch.compile on runtime images in docker validations

* Update cudnn from v8 to v9 across CUDA versions and x86/arm (#1857)

* Update cudnn to v9.1.0.70 for cuda11.8, cuda12.1, and cuda12.4

* Add CUDNN_VERSION variable

* Remove 2 spaces for install_cu124

* trivial fix

* Fix DEPS_LIST and DEPS_SONAME for x86
Update cudnn to v9 for arm cuda binary as well

* libcudnn_adv_infer/libcudnn_adv_train becomes libcudnn_adv

* Change DEPS due to cudnn v9 libraries name changes (and additions)

* Fix lint

* Add missing changes to cu121/cu124

* Fix aarch64 cuda typos

* Update validate-docker-images.yml - disable runtime error check for now

* Update validate-docker-images.yml - use validation_runner rather than a hardcoded one

* Update validate-docker-images.yml - fix MATRIX_GPU_ARCH_TYPE setting for cpu only workflows

* [aarch64 cuda cudnn] Add RUNPATH to libcudnn_graph.so.9 (#1859)

* Add executorch to pypi prep, promotion and validation scripts (#1860)

* Add AOTriton install step for ROCm manylinux images (#1862)

* Add AOTriton install step for ROCm

* No common_utils.sh needed

* temporary disable runtime error check

* Add python 3.13 builder (#1845)

---------

Co-authored-by: Ting Lu <92425201+tinglvv@users.noreply.github.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Wei Wang <143543872+nWEIdia@users.noreply.github.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
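
(The "Add RUNPATH to libcudnn_graph.so.9" commit above, #1859, is what resolves the engine-sublibrary load failure on aarch64: once libcudnn_graph.so.9 carries a RUNPATH of $ORIGIN, the engine libraries it dlopens are found next to it inside the wheel. Below is a hypothetical sketch of that post-processing step in the style of update_wheel(); it assumes patchelf is available in the builder image.)

import os

# Hypothetical path inside the unpacked wheel; give the cuDNN dispatcher a
# RUNPATH of $ORIGIN so its dlopened engine sublibraries resolve in place.
lib = "tmp/torch/lib/libcudnn_graph.so.9"
os.system(f"patchelf --set-rpath '$ORIGIN' {lib}")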
PaliC pushed a commit that referenced this pull request Jun 18, 2024