Run NVIDIA GPU tests on A10 per PR (#1544)
Summary:
## Problem

The OSS side does not run any NVIDIA GPU unit tests, so we cannot catch OSS-specific issues before a change is merged; these tests run only when a nightly build is pushed to PyPI. As a result, the nightly build sometimes breaks, which frustrates our customers.

## Solution

This PR adds a new GitHub Actions job that runs unit tests on an AWS NVIDIA A10 machine for every PR. The job builds a wheel file and tests it the same way the nightly job does, so it should also detect some wheel-related issues (e.g., a build procedure was updated but the nightly script was not), though its coverage is not comprehensive (see the notes below).

Note:
- This job shouldn't lengthen the overall CI time: it takes 1-2 hours, which is shorter than the other CUDA build jobs that use GHA-native runners.
- This job covers only Python 3.10 + CUDA 11.7 + A10 + the OS that our Docker script uses. We might want other combinations (e.g., different Python/CUDA/GCC versions and/or a Volta GPU) in the future, but this addition is still better than nothing.
- If you need a thorough nightly/release script check, please add a label (e.g., `test_wheel_nightly`) to your PR, which runs the real wheel-creation scripts with `upload_pypi` disabled.
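
As an illustration of the label-based check mentioned in the last note, a label-gated job could look roughly like the sketch below. This is not part of this PR: the job name, the runner, and the placeholder step are assumptions made for illustration only. The `contains(...)` expression over the PR's labels and the `labeled` event type are standard GitHub Actions features.

```yaml
# Hypothetical sketch, not taken from this PR: a job that runs only when the
# pull request carries the `test_wheel_nightly` label.
on:
  pull_request:
    types: [opened, synchronize, labeled]

jobs:
  test_wheel_nightly:  # assumed job name
    if: contains(github.event.pull_request.labels.*.name, 'test_wheel_nightly')
    runs-on: ubuntu-latest  # assumed runner
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true
      - name: Run the wheel-creation script without uploading
        # Placeholder: invoke the real nightly script here with upload_pypi disabled.
        run: echo "run the nightly wheel script with upload_pypi disabled"
```

Including `labeled` in the trigger types lets the job start as soon as the label is added, without waiting for a new push.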

Pull Request resolved: #1544

Reviewed By: jianyuh

Differential Revision: D42490935

Pulled By: shintaro-iwasaki

fbshipit-source-id: 974050736a9381d2329117fbf2855b03e438345a
shintaro-iwasaki authored and facebook-github-bot committed Jan 14, 2023
1 parent dc328f2 commit 3848313
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions .github/workflows/fbgemmci.yml
@@ -382,6 +382,29 @@ jobs:
"
docker run $DOCKER_OPTIONS $DOCKER_IMAGE $JENKINS_REPO_DIR_DOCKER/.jenkins/rocm/build_and_test.sh $JENKINS_REPO_DIR_DOCKER
  test_nvidia_gpu:
    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
    with:
      job-name: cuda 11.7, A10
      runner: linux.g5.4xlarge.nvidia.gpu # A10
      repository: pytorch/fbgemm
      gpu-arch-type: cuda
      gpu-arch-version: 11.7
      timeout: 150
      script: |
        set -x
        # Checkout FBGEMM_GPU
        git submodule update --init
        # Build FBGEMM_GPU with pytorch-nightly
        PYTORCH_CUDA_VERSION="11.7"
        PYTHON_VERSION="3.10"
        bash .github/scripts/build_wheel.bash -v -p "$PYTHON_VERSION" -o fbgemm_gpu_test -P pytorch-nightly -c "$PYTORCH_CUDA_VERSION" -m /opt/conda
        # Test FBGEMM_GPU using a generated wheel file
        WHEEL_PATH="$(ls fbgemm_gpu/dist/*.whl)"
        bash .github/scripts/test_wheel.bash -v -p "$PYTHON_VERSION" -P pytorch-nightly -c "$PYTORCH_CUDA_VERSION" -w "$WHEEL_PATH" -m /opt/conda
  build_cpu_only:
    runs-on: ${{ matrix.os }}
    strategy: