Run NVIDIA GPU tests on A10 per PR (#1544)

Summary:

## Problem

The OSS side does not run any NVIDIA GPU unit tests per PR, so OSS-specific issues cannot be caught before a change is merged; these tests run only when a nightly build is pushed to PyPI. As a result, a merged change sometimes breaks the nightly build, which makes our customers unhappy.

## Solution

This PR adds a new GitHub Actions job that runs unit tests on an AWS NVIDIA A10 machine "per PR". The job builds a wheel file and tests it the same way the nightly job does, so it should also detect some wheel-related issues (e.g., a build procedure was updated but the nightly script was not), though its coverage is not comprehensive (see below).

Note:
- This job shouldn't lengthen the CI time. It takes 1-2 hours, which is shorter than the other CUDA-build jobs that use GHA-native runners.
- This job covers only Python 3.10 + CUDA 11.7 + A10 + the OS that our Docker script uses. We might want other combinations (e.g., different Python/CUDA/GCC versions and/or a Volta GPU) in the future, but this addition should still be better than nothing.
- If you need a thorough check of the nightly/release scripts, please add a label (e.g., `test_wheel_nightly`) to your PR, which runs the real wheel-creation scripts with `upload_pypi` disabled.

Pull Request resolved: #1544

Reviewed By: jianyuh

Differential Revision: D42490935

Pulled By: shintaro-iwasaki

fbshipit-source-id: 974050736a9381d2329117fbf2855b03e438345a
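The job's test step picks up the wheel with `ls fbgemm_gpu/dist/*.whl`, which returns several paths if more than one wheel is present and, under `set -x` alone, does not stop the script if none is found. The following is only a sketch of a stricter lookup, assuming the `dist` directory is expected to contain exactly one wheel; `find_single_wheel` is a hypothetical helper and is not part of this PR or of the FBGEMM scripts.

```shell
# Hypothetical helper (not part of this PR): print the path of the single
# *.whl file in a directory, or fail if there are zero or several.
# Written in POSIX sh; variables are global, so call it via $(...) to
# keep them out of the caller's scope.
find_single_wheel() {
  dist_dir="$1"
  count=0
  wheel=""
  for f in "$dist_dir"/*.whl; do
    # An unmatched glob is left as a literal string, so count only
    # paths that actually exist.
    if [ -e "$f" ]; then
      count=$((count + 1))
      wheel="$f"
    fi
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one wheel in $dist_dir, found $count" >&2
    return 1
  fi
  printf '%s\n' "$wheel"
}
```

Under that assumption, the lookup in the job script could be written as `WHEEL_PATH="$(find_single_wheel fbgemm_gpu/dist)" || exit 1`, so a missing or ambiguous wheel fails the job before the test script starts.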
pytorch · Jan 14, 2023 · 3848313
1 parent dc328f2
Showing 1 changed file with 23 additions and 0 deletions.
diff --git a/.github/workflows/fbgemmci.yml b/.github/workflows/fbgemmci.yml
@@ -382,6 +382,29 @@ jobs:
         "
         docker run $DOCKER_OPTIONS $DOCKER_IMAGE $JENKINS_REPO_DIR_DOCKER/.jenkins/rocm/build_and_test.sh $JENKINS_REPO_DIR_DOCKER
 
+  test_nvidia_gpu:
+    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    with:
+      job-name: cuda 11.7, A10
+      runner: linux.g5.4xlarge.nvidia.gpu # A10
+      repository: pytorch/fbgemm
+      gpu-arch-type: cuda
+      gpu-arch-version: 11.7
+      timeout: 150
+      script: |
+        set -x
+        # Checkout FBGEMM_GPU
+        git submodule update --init
+
+        # Build FBGEMM_GPU with pytorch-nightly
+        PYTORCH_CUDA_VERSION="11.7"
+        PYTHON_VERSION="3.10"
+        bash .github/scripts/build_wheel.bash -v -p "$PYTHON_VERSION" -o fbgemm_gpu_test -P pytorch-nightly -c "$PYTORCH_CUDA_VERSION" -m /opt/conda
+
+        # Test FBGEMM_GPU using a generated wheel file
+        WHEEL_PATH="$(ls fbgemm_gpu/dist/*.whl)"
+        bash .github/scripts/test_wheel.bash -v -p "$PYTHON_VERSION" -P pytorch-nightly -c "$PYTORCH_CUDA_VERSION" -w "$WHEEL_PATH" -m /opt/conda
+
   build_cpu_only:
     runs-on: ${{ matrix.os }}
     strategy: