From 1a6e265eb6c44309f118f22bba69b38d92b37352 Mon Sep 17 00:00:00 2001 From: iefgnoix Date: Mon, 20 Nov 2023 16:49:31 -0800 Subject: [PATCH] Use cuda 12.1 and build pytorch with cuda enabled. (#5831) * Use cuda 12.1 * update link --- README.md | 11 ++++++----- docs/gpu.md | 26 +++++++++++++++++++------- 2 files changed, 25 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 68a67e96c82..bcc989727b6 100644 --- a/README.md +++ b/README.md @@ -147,10 +147,11 @@ bucket. | Version | Cloud TPU VMs Wheel | | --- | ----------- | -| 2.1 (CUDA 12.0 + Python 3.8) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.0/torch_xla-2.1.0-cp38-cp38-manylinux_2_28_x86_64.whl` | +| 2.1 (CUDA 12.1 + Python 3.8) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.1.0-cp38-cp38-manylinux_2_28_x86_64.whl` | | 2.1 (XRT + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/xrt/tpuvm/torch_xla-2.1.0%2Bxrt-cp310-cp310-manylinux_2_28_x86_64.whl` | | nightly (Python 3.8) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-nightly-cp38-cp38-linux_x86_64.whl` | | nightly (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-nightly-cp310-cp310-linux_x86_64.whl` | +| nightly (CUDA 12.1 + Python 3.8) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-nightly-cp38-cp38-linux_x86_64.whl` |
@@ -230,11 +231,11 @@ This is only required on Cloud TPU VMs.
-| Version | GPU CUDA 12.0 Docker | +| Version | GPU CUDA 12.1 Docker | | --- | ----------- | -| 2.1 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_12.0` | -| nightly | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.0` | -| nightly at date | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.0_YYYYMMDD` | +| 2.1 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_12.1` | +| nightly | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1` | +| nightly at date | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1_YYYYMMDD` |
diff --git a/docs/gpu.md b/docs/gpu.md index 02785ce7470..011bff8f806 100644 --- a/docs/gpu.md +++ b/docs/gpu.md @@ -11,14 +11,14 @@ You can either use a local machine with GPU attached or a GPU VM on the cloud. F ### Docker Pytorch/XLA currently publish prebuilt docker images and wheels with cuda11.7/8 and python 3.8. We recommend users to create a docker container with corresponding config. For a full list of docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla#available-docker-images-and-wheels). ``` -sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.8 +sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1 sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common distribution=$(. /etc/os-release;echo $ID$VERSION_ID) curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit sudo systemctl restart docker -sudo docker run --shm-size=16g --net=host --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.8 bin/bash +sudo docker run --shm-size=16g --net=host --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1 bin/bash sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash ``` @@ -49,10 +49,20 @@ Thu Dec 8 06:24:29 2022 ``` +### Check environment variable + +Make sure `PATH` and `LD_LIBRARY_PATH` environment variables account for cuda. Please do a `echo $PATH` and `echo $LD_LIBRARY_PATH` to verify. If not, please follow [link](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#mandatory-actions) to do so. Example: + +``` +echo "export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}" >> ~/.bashrc +echo "export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" >> ~/.bashrc +source ~/.bashrc +``` + ### Wheel ``` -pip3 install torch==2.0 -pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/cuda/117/torch_xla-2.0-cp38-cp38-linux_x86_64.whl +pip3 install torch==2.1 +pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.1.0-cp38-cp38-manylinux_2_28_x86_64.whl ``` ## Run a simple model @@ -80,23 +90,25 @@ AMP is very useful on GPU training and PyTorch/XLA reuse Cuda's AMP rule. You ca 1. Inside a GPU VM, create a docker container from a development docker image. For example: ``` -sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8 +sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_12.1 sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common distribution=$(. /etc/os-release;echo $ID$VERSION_ID) curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit sudo systemctl restart docker -sudo docker run --shm-size=16g --net=host --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8 +sudo docker run --shm-size=16g --net=host --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_12.1 sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash ``` 2. Build PyTorch and PyTorch/XLA from source. +Make sure `PATH` and `LD_LIBRARY_PATH` environment variables account for cuda. See the [above](https://github.com/pytorch/xla/blob/master/docs/gpu.md#check-environment-variable) for more info. + ``` git clone https://github.com/pytorch/pytorch.git cd pytorch -USE_CUDA=0 python setup.py install +USE_CUDA=1 python setup.py install git clone https://github.com/pytorch/xla.git cd xla