Replace 11.1 with 11.2 on CI for Windows #51598
Conversation
💊 CI failures summary and remediations

As of commit a62140a (more details on the Dr. CI page):

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

binary_windows_wheel_3_9_cu102_nightly_build (1/3), step "Build"

| Job | Step | Action |
| --- | --- | --- |
| binary_windows_wheel_3_9_cpu_nightly_build | Build | 🔁 rerun |
| binary_windows_wheel_3_9_cu112_nightly_build | Build | 🔁 rerun |
| binary_windows_wheel_3_9_cu101_nightly_build | Build | 🔁 rerun |

1 job timed out:

- pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test
Force-pushed from 8d4498c to 74a76ee.
The test failure looks legit and doesn't seem to be Windows-only.
The Windows test errors don't appear in commits on master, though.
Force-pushed from 74a76ee to db380bc.
@janeyx99 The magma package for CUDA 11.2 is smaller in size. Is that right?
Yes, I believe the CUDA 11.2 magma package is smaller than the CUDA 11.1 one.
Force-pushed from db380bc to 357d627.
After rebasing and rerunning, I don't think the …
I reproduced the failure of …
I opened an issue at #51980 for the failing test to track it separately.
@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from 357d627 to 6adc4e0.
Force-pushed from becef8d to 8f88054.
@@ -12685,6 +12685,7 @@ def test_per_sample_weights(mode, trainable_scale):
         for mode, trainable in itertools.product(modes, trainable_scale):
             test_per_sample_weights(mode, trainable)

+    @unittest.skipIf(IS_WINDOWS, "FIXME: CUDA error: too many resources requested on 11.2")
Could we also have an issue for this one, and reference the issue number here?
Yes, I made one here: #52002 (linked in the description).
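For reference, a minimal sketch of what the decorator could look like once the skip reason points at the tracking issue. This is illustrative only, not the exact code in test_nn.py; the test class and body below are placeholders.

```python
import sys
import unittest

# In test_nn.py, IS_WINDOWS comes from torch.testing._internal.common_utils;
# it is defined inline here so the sketch is self-contained.
IS_WINDOWS = sys.platform == "win32"


class EmbeddingBagPerSampleWeightsTest(unittest.TestCase):  # hypothetical stand-in class
    @unittest.skipIf(
        IS_WINDOWS,
        "FIXME: CUDA error: too many resources requested on 11.2 "
        "(tracked in https://github.com/pytorch/pytorch/issues/52002)",
    )
    def test_per_sample_weights(self):
        pass  # real test body omitted


if __name__ == "__main__":
    unittest.main()
```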
@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: The previous code allowed these tests to run every four hours on certain ci-all branches...which is really bad and resource intensive. This code removes that, but then disallows the 11.2 and 9.2 tests from being run on ci-all branches. To debug CUDA 11.2 or 9.2 tests, one must now manually change the config to allow for them. (Look at #51888 and #51598 for examples of how to do that.)

Pull Request resolved: #53069
Reviewed By: H-Huang
Differential Revision: D26739738
Pulled By: janeyx99
fbshipit-source-id: 7577b9b2e876bac0e4e868ce2a1f3ffdb6aca597
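A rough sketch of the opt-in gating that summary describes. The names and structure below are invented for illustration and do not match the actual pytorch CI config generator.

```python
# Hypothetical illustration: CUDA 11.2 and 9.2 jobs are no longer generated
# for every ci-all branch; a branch must be explicitly allowed (by editing
# the config) before those jobs run.
OPT_IN_CUDA_VERSIONS = {"9.2", "11.2"}


def should_emit_cuda_job(cuda_version: str, branch: str, opt_in_branches: set) -> bool:
    """Decide whether a CUDA test job is generated for this branch."""
    if cuda_version in OPT_IN_CUDA_VERSIONS:
        return branch in opt_in_branches
    return True


# CUDA 11.2 jobs run only on a branch that was manually opted in.
assert not should_emit_cuda_job("11.2", "ci-all/some-branch", set())
assert should_emit_cuda_job("11.2", "ci-all/debug-cuda112", {"ci-all/debug-cuda112"})
assert should_emit_cuda_job("11.1", "master", set())
```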
@zasdfgbnm Did you come across them? Is it specific to CUDA 11.2?
@mszhanyi No, it's caused by the VS upgrade, namely this commit in STL: microsoft/STL#1336. They forgot to make the fallback to the C_STD function for functions like …
Summary: Adding CUDA 11.2 to Windows CI.

Disabled tests:

The following ran into `CUDA error: misaligned address` for CUDA 11.2 (issue linked below):
- `test_where_scalar_valid_combination_cuda_complex128` in test_torch.py
- `test_sgn_complex_cuda` in test_autograd.py

The following ran into `CUDA error: too many resources requested for launch` for CUDA 11.2 (pytorch#52002):
- test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_float64
- test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_float64

Pull Request resolved: pytorch#51598
Reviewed By: mrshenli
Differential Revision: D26344965
Pulled By: janeyx99
fbshipit-source-id: 3c9a4ed16d748969e96593220ec0a9f33e1ffcef
Adding CUDA 11.2 to Windows CI.

Disabled tests:

The following ran into `CUDA error: misaligned address` for CUDA 11.2 (issue linked below):
- `test_where_scalar_valid_combination_cuda_complex128` in test_torch.py
- `test_sgn_complex_cuda` in test_autograd.py

The following ran into `CUDA error: too many resources requested for launch` for CUDA 11.2 (#52002):
- test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_float64
- test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_float64

The following ran into test assertion failures in test_optim.py (#51992):
- test_adadelta
- test_adam
- test_adamw
- test_multi_tensor_optimizers
- test_rmsprop
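For anyone trying to reproduce the disabled EmbeddingBag failures locally, a hedged sketch follows. It assumes a PyTorch source checkout on a Windows machine with CUDA 11.2, and that test/test_nn.py accepts unittest's `-k` name filter; none of this is part of the PR itself.

```python
# Hypothetical reproduction helper: from a PyTorch source checkout, run only
# the EmbeddingBag per-sample-weights tests by name filter.
import subprocess
import sys

subprocess.run(
    [
        sys.executable,
        "test/test_nn.py",
        "-k",
        "EmbeddingBag_per_sample_weights",
    ],
    check=True,
)
```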