
Replace 11.1 with 11.2 on CI for Windows #51598

Closed
wants to merge 3 commits

Conversation

janeyx99
Contributor

@janeyx99 janeyx99 commented Feb 2, 2021

Adding CUDA 11.2 to Windows CI.

Disabled tests:

The following ran into `CUDA error: misaligned address` for CUDA 11.2 (issue linked below):
`test_where_scalar_valid_combination_cuda_complex128` in test_torch.py
`test_sgn_complex_cuda` in test_autograd.py

The following ran into `CUDA error: too many resources requested for launch` for CUDA 11.2 (#52002):
`test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_float64`
`test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_float64`

The following ran into test assertion failures in test_optim.py (#51992):
`test_adadelta`
`test_adam`
`test_adamw`
`test_multi_tensor_optimizers`
`test_rmsprop`
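For illustration, skips like these are applied in the PyTorch test suite with a platform-guarded unittest decorator, as in the diff further down in this conversation. A minimal, self-contained sketch (hypothetical test class and test name, not the exact diff from this PR):

```python
import unittest
from torch.testing._internal.common_utils import IS_WINDOWS, TestCase, run_tests

class TestExampleCUDA(TestCase):
    # Hypothetical placement: in the PR the decorator sits on the affected tests
    # in test_torch.py, test_autograd.py, test_nn.py, and test_optim.py.
    @unittest.skipIf(IS_WINDOWS, "FIXME: CUDA error: misaligned address on CUDA 11.2")
    def test_example_disabled_on_windows(self):
        self.assertTrue(True)

if __name__ == "__main__":
    run_tests()
```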

@janeyx99 janeyx99 requested review from peterjc123 and mszhanyi and removed request for peterjc123 February 2, 2021 23:53
@janeyx99 janeyx99 added ci/all and removed cla signed labels Feb 2, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented Feb 2, 2021

💊 CI failures summary and remediations

As of commit a62140a (more details on the Dr. CI page):


  • 6/6 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build binary_windows_wheel_3_9_cu102_nightly_build (1/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

echo CUDA 10.2 installed failed.

C:\w\b>set "PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;C:\Program Files (x86)\Windows Application Driver;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Current\Bin;C:\Program Files (x86)\Microsoft Visual Studio\Installer\;C:\tools\ruby26;C:\tools\ruby26\bin;C:\ProgramData\nvm;C:\tools\miniconda3;C:\tools\miniconda3\Library\mingw-w64\bin;C:\tools\miniconda3\Library\usr\bin;C:\tools\miniconda3\Library\bin;C:\tools\miniconda3\Scripts;C:\miniconda3\miniconda3\condabin;C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\ProgramData\GooGet;C:\Program Files\Google\Compute Engine\metadata_scripts;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program Files\PowerShell\7\;C:\Program Files\Google\Compute Engine\sysprep;C:\Program Files\Docker;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Program Files\Git LFS;C:\Program Files\Amazon\AWSCLI\bin\;C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code;C:\Program Files\Microsoft SDKs\Service Fabric\Tools\ServiceFabricLocalClusterManager;C:\Program Files (x86)\vim\vim80;C:\Go\bin;C:\Program Files\OpenJDK\jdk-12.0.2\bin;C:\ProgramData\nvm;C:\Program Files\nodejs;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\Program Files (x86)\IncrediBuild;C:\Users\circleci\AppData\Local\Microsoft\WindowsApps" 

C:\w\b>set "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2" 

C:\w\b>set "CUDA_PATH_V10_2=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2" 

C:\w\b>set "NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt" 

C:\w\b>if not exist "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc.exe" (
echo CUDA 10.2 installed failed.  
 exit /b 1 
) 

C:\w\b>echo Installing cuDNN... 
Installing cuDNN...

C:\w\b>7z x C:\w\b\windows\internal\\..\temp_build\cudnn-10.2-windows10-x64-v7.6.5.32.zip -o"C:\w\b\windows\internal\\..\temp_build\cudnn" 

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test (2/3)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Feb 10 20:49:35 [E request_callback_no_python.cpp:653] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Feb 10 20:49:35 At:
Feb 10 20:49:35   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize
Feb 10 20:49:35   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize
Feb 10 20:49:35 
Feb 10 20:49:35 [E request_callback_no_python.cpp:653] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Feb 10 20:49:35 
Feb 10 20:49:35 At:
Feb 10 20:49:35   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize
Feb 10 20:49:35   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize
Feb 10 20:49:35 
Feb 10 20:49:35 [E request_callback_no_python.cpp:653] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Feb 10 20:49:35 
Feb 10 20:49:35 At:
Feb 10 20:49:35   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize
Feb 10 20:49:35   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize
Feb 10 20:49:35 
Feb 10 20:49:35 ok (1.532s)
Feb 10 20:49:37   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... ok (1.632s)
Feb 10 20:49:38   test_return_local_rrefs (__main__.TensorPipeRpcTestWithSpawn) ... ok (1.633s)
Feb 10 20:49:45   test_rpc_profiling_async_function (__main__.TensorPipeRpcTestWithSpawn) ... ok (6.140s)
Feb 10 20:49:50   test_rpc_profiling_async_function_single_threaded (__main__.TensorPipeRpcTestWithSpawn) ... ok (5.740s)

See CircleCI build binary_windows_libtorch_3_7_cu102_release_nightly_build (3/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

FAILED: third_party/ideep/mkl-dnn/src/cpu/x64/CMakeFiles/dnnl_cpu_x64.dir/jit_uni_x8s8s32x_conv_kernel.cpp.obj
caused by: Failed to read response header
caused by: failed to fill whole buffer
[2388/4735] Building CXX object third_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\jit_utils\jit_utils.cpp.obj
FAILED: third_party/ideep/mkl-dnn/src/cpu/x64/CMakeFiles/dnnl_cpu_x64.dir/jit_utils/jit_utils.cpp.obj 
C:\w\b\windows\tmp_bin\sccache-cl.exe  /nologo /TP -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DMAGMA_V2 -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -D_WIN -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I..\..\cmake\..\third_party\benchmark\include -Icaffe2\contrib\aten -I..\..\third_party\onnx -Ithird_party\onnx -I..\..\third_party\foxi -Ithird_party\foxi -I..\..\third_party\ideep\mkl-dnn\include -Ithird_party\ideep\mkl-dnn\include -I..\..\third_party\ideep\mkl-dnn\src -Ithird_party\gloo -I..\..\cmake\..\third_party\gloo -I..\..\cmake\..\third_party\googletest\googlemock\include -I..\..\cmake\..\third_party\googletest\googletest\include -I..\..\third_party\protobuf\src -IC:\w\b\windows\mkl\include -I..\..\third_party\XNNPACK\include -I..\..\third_party -I..\..\cmake\..\third_party\eigen -I..\..\cmake\..\third_party\pybind11\include -I..\..\cmake\..\third_party\cub -IC:\w\b\windows\magma_cuda102_release\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\jit_utils\jit_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\ /FS -c ..\..\third_party\ideep\mkl-dnn\src\cpu\x64\jit_utils\jit_utils.cpp
error: failed to execute compile
caused by: error reading compile response from server
caused by: Failed to read response header
caused by: failed to fill whole buffer
[2389/4735] Building CXX object third_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\jit_uni_x8s8s32x_conv_kernel.cpp.obj
FAILED: third_party/ideep/mkl-dnn/src/cpu/x64/CMakeFiles/dnnl_cpu_x64.dir/jit_uni_x8s8s32x_conv_kernel.cpp.obj 
C:\w\b\windows\tmp_bin\sccache-cl.exe  /nologo /TP -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DMAGMA_V2 -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -D_WIN -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I..\..\cmake\..\third_party\benchmark\include -Icaffe2\contrib\aten -I..\..\third_party\onnx -Ithird_party\onnx -I..\..\third_party\foxi -Ithird_party\foxi -I..\..\third_party\ideep\mkl-dnn\include -Ithird_party\ideep\mkl-dnn\include -I..\..\third_party\ideep\mkl-dnn\src -Ithird_party\gloo -I..\..\cmake\..\third_party\gloo -I..\..\cmake\..\third_party\googletest\googlemock\include -I..\..\cmake\..\third_party\googletest\googletest\include -I..\..\third_party\protobuf\src -IC:\w\b\windows\mkl\include -I..\..\third_party\XNNPACK\include -I..\..\third_party -I..\..\cmake\..\third_party\eigen -I..\..\cmake\..\third_party\pybind11\include -I..\..\cmake\..\third_party\cub -IC:\w\b\windows\magma_cuda102_release\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\jit_uni_x8s8s32x_conv_kernel.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\ /FS -c ..\..\third_party\ideep\mkl-dnn\src\cpu\x64\jit_uni_x8s8s32x_conv_kernel.cpp
error: failed to execute compile
caused by: error reading compile response from server
caused by: Failed to read response header
caused by: failed to fill whole buffer
[2390/4735] Building CXX object third_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\jit_utils\linux_perf\linux_perf.cpp.obj
ninja: build stopped: subcommand failed.
cmake -GNinja -DBUILD_ENVIRONMENT=libtorch 3.7 cu102 release -DBUILD_FOR_SYSTEM=windows -DBUILD_JNI=ON -DBUILD_PYTHON=False -DBUILD_PYTHONLESS=1 -DBUILD_TEST=True -DBUILD_TYPE=release -DCMAKE_BUILD_TYPE=Release -DCMAKE_GENERATOR=Ninja -DCMAKE_INCLUDE_PATH=C:\w\b\windows\mkl\include -DCMAKE_INSTALL_PREFIX=C:\w\b\windows\pytorch\torch -DCMAKE_PREFIX_PATH=C:\w\b\windows\conda\envs\py37\Lib\site-packages -DCUDA_NVCC_EXECUTABLE=C:\w\b\windows\tmp_bin\randomtemp.exe -DINSTALL_TEST=0 -DJAVA_HOME=C:/Users/circleci/project/.circleci/windows-jni/ -DNUMPY_INCLUDE_DIR=C:\w\b\windows\conda\envs\py37\lib\site-packages\numpy\core\include -DPYTHON_EXECUTABLE=C:\w\b\windows\conda\envs\py37\python.exe -DPYTHON_INCLUDE_DIR=C:\w\b\windows\conda\envs\py37\include -DUSE_FBGEMM=1 -DUSE_NUMPY=True -DUSE_SCCACHE=1 C:\w\b\windows\pytorch
cmake --build . --target install --config Release -- -j 16
Traceback (most recent call last):
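The `Failed to read response header` / `failed to fill whole buffer` lines above come from the sccache compiler wrapper losing contact with its local server, which is an infrastructure flake rather than a code issue. A minimal sketch of a pre-build sanity check (assuming sccache is on PATH; this is not part of the CI scripts) could look like:

```python
import subprocess

def sccache_health_check() -> None:
    """Best-effort check that the local sccache server is responsive; restart it if not."""
    try:
        # --show-stats queries the running sccache server and prints cache statistics.
        subprocess.run(["sccache", "--show-stats"], check=True)
    except subprocess.CalledProcessError:
        # Stop a possibly wedged server (ignore failures if none is running) and start a fresh one.
        subprocess.run(["sccache", "--stop-server"], check=False)
        subprocess.run(["sccache", "--start-server"], check=True)

if __name__ == "__main__":
    sccache_health_check()
```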

3 failures not recognized by patterns:

Job (Step):
CircleCI binary_windows_wheel_3_9_cpu_nightly_build (Build)
CircleCI binary_windows_wheel_3_9_cu112_nightly_build (Build)
CircleCI binary_windows_wheel_3_9_cu101_nightly_build (Build)

1 job timed out:

  • pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@janeyx99 janeyx99 requested review from peterjc123 and a team February 2, 2021 23:53
@janeyx99 janeyx99 mentioned this pull request Feb 2, 2021
17 tasks
@janeyx99 janeyx99 force-pushed the ci-all/replace-windows-11.1-ci-with-11.2 branch from 8d4498c to 74a76ee Compare February 3, 2021 05:33
Collaborator

@peterjc123 peterjc123 left a comment


The test failure looks legit and doesn't seem to be Windows-only.

@janeyx99
Contributor Author

janeyx99 commented Feb 3, 2021

The test failure looks legit and doesn't seem to be Windows-only.

The Windows test errors don't appear in commits on master, though.

@janeyx99 janeyx99 force-pushed the ci-all/replace-windows-11.1-ci-with-11.2 branch from 74a76ee to db380bc Compare February 4, 2021 16:12
@janeyx99 janeyx99 added ci/all and removed ci/all labels Feb 4, 2021
@mszhanyi
Collaborator

mszhanyi commented Feb 8, 2021

@janeyx99 The magma package for CUDA 11.2 is smaller in size. Is that right?
[screenshot]

@peterjc123
Collaborator

@mszhanyi The magma binaries are built in this PR: #42408. Looking into the build log, I didn't find anything wrong there.

@janeyx99
Contributor Author

janeyx99 commented Feb 8, 2021

@janeyx99 The magma package for CUDA 11.2 is smaller in size. Is that right?
[screenshot]

Yes, I believe the magma build for CUDA 11.2 is smaller than the one for CUDA 11.1.

@janeyx99 janeyx99 force-pushed the ci-all/replace-windows-11.1-ci-with-11.2 branch from db380bc to 357d627 Compare February 8, 2021 17:59
@janeyx99
Contributor Author

janeyx99 commented Feb 8, 2021

After rebasing and rerunning, I don't think the `CUDA error: misaligned address` failures are separate from CUDA 11.2. @zasdfgbnm @peterjc123 any thoughts?

@zasdfgbnm
Collaborator

zasdfgbnm commented Feb 8, 2021

@janeyx99 I have never seen this failure. Let me try to reproduce locally. cc: @ptrblck

@zasdfgbnm
Collaborator

I reproduced the failure of test_where_scalar_valid_combination_cuda_complex128 on Windows, but it is not failing on my Linux system.
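For context, a hypothetical minimal repro of what that test exercises (torch.where on a complex128 CUDA tensor, including the scalar `other` operand the test name refers to) might look like the sketch below; it assumes a CUDA-capable machine and is not the actual test body:

```python
import torch

cond = torch.rand(100, device="cuda") > 0.5
x = torch.randn(100, dtype=torch.complex128, device="cuda")
y = torch.full_like(x, 1 + 1j)

out_tensor = torch.where(cond, x, y)        # tensor "other" operand
out_scalar = torch.where(cond, x, 1 + 1j)   # scalar "other" operand
torch.cuda.synchronize()                    # CUDA errors are asynchronous; they surface here
print(out_tensor[:3], out_scalar[:3])
```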

@zasdfgbnm
Collaborator

I opened an issue at #51980 for the failing test to track it separately.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@janeyx99 janeyx99 force-pushed the ci-all/replace-windows-11.1-ci-with-11.2 branch from 357d627 to 6adc4e0 Compare February 9, 2021 20:00
@janeyx99 janeyx99 force-pushed the ci-all/replace-windows-11.1-ci-with-11.2 branch from becef8d to 8f88054 Compare February 9, 2021 22:56
@@ -12685,6 +12685,7 @@ def test_per_sample_weights(mode, trainable_scale):
for mode, trainable in itertools.product(modes, trainable_scale):
test_per_sample_weights(mode, trainable)

@unittest.skipIf(IS_WINDOWS, "FIXME: CUDA error: too many resources requested on 11.2")
Collaborator


Could we also have an issue for this one, and reference the issue number here?

Contributor Author


#52002, yes, I made one here (linked in the description).
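For reference, a hypothetical minimal sketch of the code path those EmbeddingBag tests exercise (int64 indices and offsets, float64 weights, per_sample_weights) follows; it assumes a CUDA device and is not the actual test body from test_nn.py:

```python
import torch
import torch.nn as nn

# per_sample_weights is only supported with mode="sum".
bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode="sum").double().cuda()

indices = torch.randint(0, 10, (8,), dtype=torch.int64, device="cuda")
offsets = torch.tensor([0, 4], dtype=torch.int64, device="cuda")  # two bags of four indices each
per_sample_weights = torch.rand(8, dtype=torch.float64, device="cuda")

out = bag(indices, offsets, per_sample_weights=per_sample_weights)
torch.cuda.synchronize()  # a failing launch ("too many resources requested") would surface here
print(out)
```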

Contributor

@facebook-github-bot facebook-github-bot left a comment


@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Mar 2, 2021
Summary:
The previous code allowed these tests to run every four hours on certain ci-all branches...which is really bad and resource intensive. This code removes that, but then disallows the 11.2 and 9.2 tests to be run on ci-all branches.

To debug CUDA 11.2 or 9.2 tests, one must now manually change the config to allow for them. (Look at #51888 and #51598 for examples of how to do that.)

Pull Request resolved: #53069

Reviewed By: H-Huang

Differential Revision: D26739738

Pulled By: janeyx99

fbshipit-source-id: 7577b9b2e876bac0e4e868ce2a1f3ffdb6aca597
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
…3069)

Summary:
The previous code allowed these tests to run every four hours on certain ci-all branches...which is really bad and resource intensive. This code removes that, but then disallows the 11.2 and 9.2 tests to be run on ci-all branches.

To debug CUDA 11.2 or 9.2 tests, one must now manually change the config to allow for them. (Look at pytorch#51888 and pytorch#51598 for examples of how to do that.)

Pull Request resolved: pytorch#53069

Reviewed By: H-Huang

Differential Revision: D26739738

Pulled By: janeyx99

fbshipit-source-id: 7577b9b2e876bac0e4e868ce2a1f3ffdb6aca597
@mszhanyi
Collaborator

@zasdfgbnm
I'm trying to build PyTorch with CUDA 11.2 on Windows.
I came across some errors like:

identifier "__floorf" is undefined in device code, calling a __host__ function("__floorf") from a __device__ function xxx is not allowed.
identifier "__ceilf" is undefined in device code. ....

Did you come across them? Are they specific to CUDA 11.2?

@peterjc123
Collaborator

peterjc123 commented Mar 15, 2021

@mszhanyi No, it's caused by the VS upgrade, namely this commit microsoft/STL#1336 in the STL. They forgot to make the fallback to the C standard functions for functions like floorf and ceilf in CUDA device code.

xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
Adding CUDA 11.2 to Windows CI.

Disabled tests:

The following ran into `CUDA error: misaligned address` for CUDA 11.2: (issue linked below)
`test_where_scalar_valid_combination_cuda_complex128` in test_torch.py
`test_sgn_complex_cuda` in test_autograd.py

The following ran into `CUDA error: too many resources requested for launch` for CUDA 11.2: (pytorch#52002)
test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_float64
test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_float64

Pull Request resolved: pytorch#51598

Reviewed By: mrshenli

Differential Revision: D26344965

Pulled By: janeyx99

fbshipit-source-id: 3c9a4ed16d748969e96593220ec0a9f33e1ffcef
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
…3069)

Summary:
The previous code allowed these tests to run every four hours on certain ci-all branches...which is really bad and resource intensive. This code removes that, but then disallows the 11.2 and 9.2 tests to be run on ci-all branches.

To debug CUDA 11.2 or 9.2 tests, one must now manually change the config to allow for them. (Look at pytorch#51888 and pytorch#51598 for examples of how to do that.)

Pull Request resolved: pytorch#53069

Reviewed By: H-Huang

Differential Revision: D26739738

Pulled By: janeyx99

fbshipit-source-id: 7577b9b2e876bac0e4e868ce2a1f3ffdb6aca597
@github-actions github-actions bot deleted the ci-all/replace-windows-11.1-ci-with-11.2 branch February 10, 2024 01:56