
Fix DeformConv2d's autocast test flakiness #3060

Closed
datumbox opened this issue Nov 30, 2020 · 2 comments · Fixed by #3320

@datumbox (Contributor) commented Nov 30, 2020

A series of recent, unrelated PRs exposed the fact that DeformConv2d's autocast unit test is flaky: it can fail if we hit a bad seed. As a temporary workaround, PR #3032 fixed the random seed:

vision/test/test_ops.py

Lines 657 to 659 in 06ebee1

@unittest.skipIf(not torch.cuda.is_available(), "CUDA unavailable")
def test_autocast(self):
set_rng_seed(0)

The above is not a great solution. We should investigate and fix the problem properly by adjusting the unit test.

To reproduce the issue, remove the set_rng_seed call from the test and rerun it.
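The flakiness is consistent with half precision's limited resolution. As a rough illustration (a minimal numpy sketch standing in for the CUDA half-precision path, which isn't exercised here; the sizes and seed range are arbitrary), merely computing a dot product in float16 can drift from the float32 reference by more than the test's 1e-3 absolute tolerance for many seeds:

```python
import numpy as np

# float16 spacing at 1.0 is ~9.77e-4, already close to the 1e-3 tolerance;
# at larger magnitudes the spacing (and hence the rounding error) grows.
eps16 = np.finfo(np.float16).eps

worst = 0.0
for seed in range(100):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(64).astype(np.float32)
    w = rng.standard_normal(64).astype(np.float32)
    ref = float(np.dot(x, w))  # float32 reference result
    # cast inputs to float16 and accumulate in float16, mimicking an fp16 path
    half = float(np.dot(x.astype(np.float16), w.astype(np.float16)))
    worst = max(worst, abs(half - ref))

print(eps16, worst)  # for many seeds the error exceeds 1e-3
```

Whether a given seed crosses the 1e-3 line depends on the magnitude of the accumulated values, which is exactly the seed-dependence observed in the test.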

@datumbox datumbox changed the title DeformConv2d's autocast test is flaky Fix DeformConv2d's autocast test flakiness Dec 1, 2020
@NicolasHug (Member)

I can't really tell why the test fails for some seeds, but I ran some experiments:

  • It only fails for half precision; the float32 sections pass.
  • Increasing the tolerance from 1e-3 to 1.5e-3 makes all seeds in [0, 79] pass (seed 80 still fails).
  • Increasing it further to 2e-3 makes all seeds in [0, 100] pass.

Given that all float32 checks pass without any change, perhaps the simplest fix is to raise the tolerance to 2e-3 for half precision?
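A minimal sketch of that idea (a hypothetical helper, not the actual test_ops.py code; numpy dtypes stand in for the torch ones, and 1e-3 is assumed as the float32 tolerance):

```python
import numpy as np

def autocast_tolerance(dtype) -> float:
    # Hypothetical helper: half precision accumulates larger rounding error
    # in deform_conv2d, so relax the absolute tolerance to 2e-3 there, per
    # the experiments above; keep the stricter default for float32.
    return 2e-3 if np.dtype(dtype) == np.float16 else 1e-3

print(autocast_tolerance(np.float16))  # 0.002
print(autocast_tolerance(np.float32))  # 0.001
```

The dtype-dependent tolerance keeps the float32 assertions as strict as before while only loosening the half-precision path that actually drifts.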

@datumbox (Contributor, Author)

Thanks for the analysis @NicolasHug. Perhaps it's just an issue of using too small a tolerance. I think 2e-3 is reasonable if it resolves the issue. I'm happy to review it if you want to send a PR.
