Fix DeformConvTester::test_backward_cuda
Summary:
`test_backward_cuda_contiguous` and `test_backward_cuda_non_contiguous` have been failing on fbcode for a while with the error `too many resources requested for launch`, which suggests that too many threads per block are requested.

This issue was already causing problems in the original PR pytorch#2791 (comment), where the author decided that CC >= 6 was a good threshold because with CC >= 6 GPUs have more registers. (CC = Compute Capability)

However, I'm not certain that this is actually true: if we look at https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications, it's clear that 6.2 has fewer registers per thread block than 6.0. So I'm not sure this threshold completely makes sense.

Moreover, let's note that the current tests (as of `master`):

- **pass** on OSS linux CI which relied on a P4 GPU (up to last week), i.e. **CC = 6.1**
- **pass** on OSS windows CI which relies on a T4 GPU, i.e. **CC = 7.5**
- **fail** on the AWS cluster which relies on a V100 GPU, i.e. **CC = 7.0**

It is quite unclear to me what kind of resource is "enough" for the tests to pass on both 6.1 and 7.5 but not on 7.0. As a result, I think it's safer to just reduce the number of threads per block, irrespective of the CC.
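For context, here is a minimal sketch of the arithmetic that usually sits behind `too many resources requested for launch`, assuming the limiting resource is registers. None of the names below come from torchvision: `registers_needed` and `launch_fits` are hypothetical helpers, the 64K registers-per-block limit is the figure the linked table lists for CC 6.0, 6.1, 7.0 and 7.5, and the 80 registers per thread is an illustrative number only. The real count for the deform_conv2d kernels is chosen by the compiler per target architecture and is not stated in this commit. Register allocation granularity is ignored.

```cpp
#include <cstdint>

// Hypothetical helper (not torchvision code): registers a block would need
// if every thread in it uses `regs_per_thread` registers.
constexpr std::uint32_t registers_needed(std::uint32_t regs_per_thread,
                                         std::uint32_t threads_per_block) {
  return regs_per_thread * threads_per_block;
}

// Hypothetical helper: true if the block fits in the per-block register budget.
constexpr bool launch_fits(std::uint32_t regs_per_thread,
                           std::uint32_t threads_per_block,
                           std::uint32_t regs_per_block_limit) {
  return registers_needed(regs_per_thread, threads_per_block) <=
         regs_per_block_limit;
}

// Assumed limit: 64K 32-bit registers per thread block.
constexpr std::uint32_t kRegsPerBlock = 64 * 1024;

// Illustrative value only; the actual register count is not given in the commit.
constexpr std::uint32_t kAssumedRegsPerThread = 80;

// With 1024 threads the block would need 81920 registers: over budget.
static_assert(!launch_fits(kAssumedRegsPerThread, 1024, kRegsPerBlock),
              "80 * 1024 = 81920 exceeds 65536");

// With 512 threads the block needs 40960 registers: within budget.
static_assert(launch_fits(kAssumedRegsPerThread, 512, kRegsPerBlock),
              "80 * 512 = 40960 fits in 65536");
```

Under these assumptions, any kernel that needs more than 64 registers per thread cannot be launched with 1024 threads per block, while 512 threads leaves room for up to 128 registers per thread. Since the per-thread register count can differ between target architectures, a fixed CC threshold might not predict which launches fit, which is in line with always returning 512.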

ngimel, fmassa suggested that I tag you here since you could have some valuable insight for us. Thanks!

Reviewed By: fmassa

Differential Revision: D28641626

fbshipit-source-id: 2618c366c5d18bbb7ebafc33032e7ac6c0404d0b
NicolasHug committed Jun 1, 2021
1 parent 2cc8359 commit 6003832
Showing 1 changed file with 1 addition and 4 deletions.
5 changes: 1 addition & 4 deletions torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu
@@ -85,10 +85,7 @@ inline unsigned int GET_THREADS() {
 #ifdef __HIP_PLATFORM_HCC__
   return 256;
 #endif
-  if (at::cuda::getCurrentDeviceProperties()->major >= 6) {
-    return 1024;
-  }
-  return 512;
+  return 512;
 }

 inline unsigned int GET_BLOCKS(