Windows CI CUDA Intermittent error C2993 #17935

ChaiBapchya · 2020-03-30T02:26:31Z

Description

Intermittent failure seen in the compilation phase of the windows-gpu CI job (WIN_GPU / WIN_GPU_MKLDNN).

Discovered in PR #17808.

Error Message

It intermittently gives the error :

C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2993: 'T': illegal type for non-type template parameter '__formal'

Errors:

[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2993: 'T': illegal type for non-type template parameter '__formal'
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): note: see reference to class template instantiation 'thrust::detail::allocator_traits_detail::has_value_type<T>' being compiled
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2065: 'U1': undeclared identifier
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2923: 'std::_Select<__formal>::_Apply': 'U1' is not a valid template type argument for parameter '<unnamed-symbol>'
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2144: syntax error: 'unknown-type' should be preceded by ')'
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2144: syntax error: 'unknown-type' should be preceded by ';'
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2238: unexpected token(s) preceding ';'
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2059: syntax error: ')'
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2988: unrecognizable template declaration/definition
[2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2059: syntax error: '<end Parse>'

Entire stack trace:
http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/windows-gpu/branches/PR-17808/runs/15/nodes/39/log/?start=0

To Reproduce

On a Windows AMI, clone the repo and run:
py -3 ci/build_windows.py -f WIN_GPU

What have you tried to solve it?

Used CUDA 10.2 instead of 9.2
Updated VS2019
Added the flag /Zc:__cplusplus to the CMake configuration

None of the above removed the failure. What currently works:
Retrying the build, with max retries = 5
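
For illustration, the retry workaround can be sketched as a small Python wrapper around the build command. This is a minimal sketch and not the change actually made in the CI scripts: the helper name `run_with_retries` and its parameters `delay_s` and `runner` are hypothetical, and only the limit of 5 attempts comes from this issue.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_retries=5, delay_s=0.0, runner=subprocess.run):
    """Run cmd until it exits with code 0 or max_retries attempts are used.

    Returns the number of attempts that were needed.
    Raises RuntimeError if every attempt fails.
    `runner` is a hypothetical hook so the helper can be exercised
    without invoking a real build.
    """
    for attempt in range(1, max_retries + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        print(
            f"attempt {attempt}/{max_retries} failed "
            f"with exit code {result.returncode}",
            file=sys.stderr,
        )
        if attempt < max_retries:
            time.sleep(delay_s)  # optional pause before the next attempt
    raise RuntimeError(f"command failed after {max_retries} attempts: {cmd}")
```

With the command from "To Reproduce", a call would look like `run_with_retries(["py", "-3", "ci/build_windows.py", "-f", "WIN_GPU"])`. Retrying only hides the intermittent failure and lengthens the build when it does occur; it does not address the underlying compiler error.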


ChaiBapchya · 2020-03-30T02:32:42Z

@mxnet-label-bot add [ci, windows]

leezu · 2020-04-04T00:11:50Z

Created an upstream issue: NVIDIA/thrust#1090

leezu · 2020-05-01T04:37:01Z

@vexilligera did you test whether the error also occurs on more recent versions of thrust? I suggest we try installing thrust 1.9.8 on Windows CI, which is the version that will ship with CUDA 11.

We already do that on Ubuntu CI:

https://github.com/apache/incubator-mxnet/blob/76fa58373636c57fee1e4e6cd7960723b39f455f/ci/docker/Dockerfile.build.ubuntu#L144-L150

leezu · 2020-05-01T17:51:14Z

There is another suggested fix at pytorch/pytorch#25393 (comment)

cc @vexilligera

leezu · 2020-05-09T03:53:03Z

Seems to be an nvcc bug: NVIDIA/thrust#1090 (comment)

alliepiper · 2020-05-11T17:48:58Z

This is indeed an nvcc bug. There is no known workaround at the moment, but the next release of the CUDA toolkit will contain a fix.

Ref NVIDIA/thrust#1090.

ChaiBapchya added the Bug label Mar 30, 2020

ChaiBapchya mentioned this issue Mar 30, 2020

[WIP] Windows dev environment configuration, update install instructions from source in the docs #17808

Closed

7 tasks

lanking520 added CI Windows labels Mar 30, 2020

ChaiBapchya changed the title ~~Windows CI CUDA Intermitted error C2993~~ Windows CI CUDA Intermittent error C2993 Apr 2, 2020

leezu mentioned this issue Apr 4, 2020

Intermittent compilation failures with thrust, cuda 10.2 and MSVC 2019 NVIDIA/thrust#1090

Closed

This was referenced May 1, 2020

Update to thrust 1.9.8 on Windows #18218

Merged

Re-enable build retries on MSVC #18230

Merged
