Commit
Fix DeformConvTester::test_backward_cuda
Summary: `test_backward_cuda_contiguous` and `test_backward_cuda_non_contiguous` have been failing on fbcode for a while with the error `too many resources requested for launch`, which suggests that too many threads per block are requested.

This issue was already causing problems in the original PR pytorch#2791 (comment), where the author decided that CC >= 6 was a good threshold because GPUs with CC >= 6 have more registers (CC = Compute Capability). However, I'm not certain that this is actually true: according to https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications, 6.2 has fewer registers per thread block than 6.0. So I'm not sure this threshold completely makes sense.

Moreover, note that the current tests (as on `master`):
- **pass** on OSS linux CI, which relies on a P4 GPU (up to last week), i.e. **CC = 6.1**
- **pass** on OSS windows CI, which relies on a T4 GPU, i.e. **CC = 7.5**
- **fail** on the AWS cluster, which relies on a V100 GPU, i.e. **CC = 7.0**

It is quite unclear to me what kind of resource is "enough" for the tests to pass on both 6.1 and 7.5 but not on 7.0. As a result, I think it's safer to just reduce the number of threads per block, irrespective of the CC.

ngimel, fmassa suggested that I tag you here since you could have some valuable insight for us. Thanks!

Reviewed By: fmassa
Differential Revision: D28641626
fbshipit-source-id: 2618c366c5d18bbb7ebafc33032e7ac6c0404d0b
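
For readers unfamiliar with the `too many resources requested for launch` error, here is a rough sketch of why lowering the threads per block can help. It is not the code changed by this commit. It assumes a simplified model in which a launch fits only if `threads_per_block * regs_per_thread` stays within the per-block register limit, and it ignores register allocation granularity, shared memory, and every other limit. The function names, the default limit of 65536 registers per block, and the figure of 72 registers per thread are illustrative assumptions, not values measured on the deformable convolution kernels.

```python
# Simplified model (assumption): a launch fits if the registers needed by one
# block do not exceed the per-block register limit. Real hardware also rounds
# register allocation up and enforces other limits, which this ignores.

def fits_register_budget(threads_per_block, regs_per_thread, regs_per_block=65536):
    """Return True if one block's register demand fits the assumed limit."""
    return threads_per_block * regs_per_thread <= regs_per_block


def largest_safe_block(regs_per_thread,
                       candidates=(1024, 512, 256, 128),
                       regs_per_block=65536):
    """Return the largest candidate block size that fits the assumed limit."""
    for threads in candidates:  # candidates are listed from largest to smallest
        if fits_register_budget(threads, regs_per_thread, regs_per_block):
            return threads
    raise ValueError("no candidate block size fits the register budget")


if __name__ == "__main__":
    # Hypothetical kernel compiled to 72 registers per thread.
    print(fits_register_budget(1024, 72))  # 1024 * 72 = 73728 > 65536
    print(fits_register_budget(512, 72))   # 512 * 72 = 36864 <= 65536
    print(largest_safe_block(72))
```

Under this model, the register count per thread depends on how the kernel was compiled for each architecture, so the same block size can fit on one GPU and fail on another. A fixed, smaller block size avoids depending on that.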