Clear CUDA global error state after CUDA API calls #1020

brycelelbach · 2019-10-10T21:13:09Z

After making a CUDA API call, always clear the global CUDA error state by calling cudaGetLastError. Otherwise, if the CUDA API call is followed directly by a kernel launch, checking the launch for a synchronous error by calling cudaGetLastError may return the error code from the earlier CUDA API call instead. This kind of error leakage is subtle and difficult to trace.

This is a fix for Bug 2720132.
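A minimal sketch of the leak described above, not the code changed by this PR. The names `noop_kernel`, `failing_api_call`, `launch_without_clearing`, and `launch_after_clearing` are hypothetical and exist only for illustration. It assumes a working CUDA device and that an oversized `cudaMalloc` fails with a non-sticky error.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical no-op kernel; it only gives us something to launch.
__global__ void noop_kernel() {}

// Hypothetical helper: provoke a non-sticky failure from a CUDA API call.
// Requesting an absurdly large allocation is expected to fail
// (typically with cudaErrorMemoryAllocation).
inline cudaError_t failing_api_call()
{
  void* ptr = nullptr;
  return cudaMalloc(&ptr, static_cast<std::size_t>(-1));
}

// The API call's error is left in the global error state, so the
// post-launch check can report it even if the launch itself was fine.
inline cudaError_t launch_without_clearing()
{
  failing_api_call();         // error code returned here, but also recorded globally
  noop_kernel<<<1, 1>>>();
  return cudaGetLastError();  // may return the stale cudaMalloc error
}

// Same sequence, but the global error state is reset before the launch.
inline cudaError_t launch_after_clearing()
{
  cudaError_t status = failing_api_call();
  (void)status;               // real code would handle or report this value
  cudaGetLastError();         // reads the recorded error and resets it to cudaSuccess
  noop_kernel<<<1, 1>>>();
  return cudaGetLastError();  // now reflects only the launch
}
```

Under those assumptions, `launch_without_clearing()` is expected to report a failure that belongs to the allocation rather than the launch, while `launch_after_clearing()` reports only the launch status.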


griwes · 2019-10-13T00:52:42Z

So... isn't there a danger of silently ignoring an asynchronous error if we do it this way? Shouldn't we actually be checking the return value of the cudaGetLastError call and reporting unexpected errors in one way or another?

brycelelbach · 2019-10-13T16:26:09Z

No. Asynchronous errors are sticky; they cannot be cleared.
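The sticky behavior can be sketched as follows. This is an illustration under assumptions, not code from the PR: `faulting_kernel` and `sticky_error_survives_clear` are hypothetical names, and the sketch assumes that a device-side write through a null pointer produces an asynchronous error that corrupts the CUDA context. Run it only in a throwaway process, since the context is not expected to be usable afterwards.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that writes through whatever pointer it is given.
__global__ void faulting_kernel(int* p) { *p = 42; }

// Returns true if the error is still reported after an attempt to clear it.
inline bool sticky_error_survives_clear()
{
  // Launch with a null pointer to provoke an illegal memory access on the device.
  faulting_kernel<<<1, 1>>>(static_cast<int*>(nullptr));

  // The asynchronous error surfaces at the next synchronization point.
  cudaError_t first = cudaDeviceSynchronize();

  // Attempt to clear the global error state.
  cudaGetLastError();

  // A sticky error is expected to be reported again by later API calls.
  cudaError_t second = cudaDeviceSynchronize();

  return first != cudaSuccess && second != cudaSuccess;
}
```

If the assumption holds, the function returns `true`: calling cudaGetLastError after a CUDA API call does not hide an asynchronous failure, because subsequent calls keep returning it.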

brycelelbach · 2019-10-15T03:59:07Z

This passed internal CI. Accepting.

griwes and others added 2 commits September 12, 2019 22:00

Add mentions of 1.9.6 to the documentation.

621df21

brycelelbach requested a review from griwes October 10, 2019 21:13

brycelelbach merged commit a424837 into master Oct 15, 2019

brycelelbach deleted the bug/nvbug-2720132-clear-global-cuda-error-state branch October 15, 2019 03:59

brycelelbach restored the bug/nvbug-2720132-clear-global-cuda-error-state branch May 16, 2020 06:56

brycelelbach deleted the bug/nvbug-2720132-clear-global-cuda-error-state branch May 16, 2020 07:07
