-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: generate error message and tests for CUDA and CPU kernels #2989
fix: generate error message and tests for CUDA and CPU kernels #2989
Conversation
Codecov ReportAttention:
Additional details and impacted files
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I checked out this branch and saw that the generated tests now have checks like
try:
ak_cu.synchronize_cuda()
except ValueError as e:
assert str(e) == "tags[i] < 0 in compiled CUDA code (awkward_UnionArray_validity)"
That does the right thing, although you may want to use pytest.raises, like this:
with pytest.raises(ValueError, match=r"tags[i] < 0 in compiled CUDA code (awkward_UnionArray_validity)"):
ak_cu.synchronize_cuda()
The difference is that the first code block would pass if synchronize does not raise an error or raises an error with the correct string and the second code block would pass only if synchronize raises an error with the correct string.
Some kernels have more than one possible error, and the error message that gets displayed could depend on a race condition between different threads if one thread would hit one error and a different thread would hit a different errors. You might want to consider relaxing the test constraint to only matching the part about which kernel raised the error.
When I ran this on my GPU, all of the unit tests passed, but the integration tests
pytest tests-cuda
failed with
================================================= test session starts =================================================
platform linux -- Python 3.10.13, pytest-7.4.4, pluggy-1.4.0
Matplotlib: 3.8.2
Freetype: 2.12.1
rootdir: /home/jpivarski/irishep/awkward
configfile: pyproject.toml
plugins: forked-1.6.0, xdist-3.5.0, cov-4.1.0, typeguard-4.1.5, mpl-0.16.1, reverse-1.7.0, mock-3.12.0, asyncio-0.23.3, timeout-2.2.0, anyio-4.2.0
asyncio: mode=strict
collected 69 items
tests-cuda/test_1276_cuda_num.py xx....... [ 13%]
tests-cuda/test_1276_cuda_transfers.py ......x.x..x.xxx [ 36%]
tests-cuda/test_1276_cupy_interop.py . [ 37%]
tests-cuda/test_1276_from_cupy.py ..... [ 44%]
tests-cuda/test_1300_same_for_numba_cuda.py ....................... [ 78%]
tests-cuda/test_1381_check_errors.py F [ 79%]
tests-cuda/test_1809_array_cuda_jit.py .............. [100%]
====================================================== FAILURES =======================================================
________________________________________________________ test _________________________________________________________
def test():
v2_array = ak.Array([1, 2, 3, 4, 5]).layout
stream = cp.cuda.Stream(non_blocking=True)
v2_array_cuda = ak.to_backend(v2_array, "cuda")
with cp.cuda.Stream() as stream:
v2_array_cuda[10,]
v2_array_cuda[11,]
v2_array_cuda[12,]
with pytest.raises(ValueError) as err:
awkward._connect.cuda.synchronize_cuda(stream)
assert isinstance(err.value, ValueError)
message = "".join(traceback.format_exception(err.type, err.value, err.tb))
> assert (
"ValueError: index out of range in compiled CUDA code "
"(awkward_RegularArray_getitem_next_at)\n"
"\n"
"This error occurred while attempting to slice\n"
"\n"
" <Array [1, 2, 3, 4, 5] type='5 * int64'>\n"
"\n"
"with\n"
"\n"
" (10)\n"
) in message
E assert "ValueError: index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)\n\nThis error occurred while attempting to slice\n\n <Array [1, 2, 3, 4, 5] type='5 * int64'>\n\nwith\n\n (10)\n" in 'Traceback (most recent call last):\n File "/home/jpivarski/irishep/awkward/tests-cuda/test_1381_check_errors.py", li...\n raise ValueError(\nValueError: index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)\n'
err = <ExceptionInfo ValueError('index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)') tblen=2>
message = 'Traceback (most recent call last):\n File "/home/jpivarski/irishep/awkward/tests-cuda/test_1381_check_errors.py", li...\n raise ValueError(\nValueError: index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)\n'
stream = <Stream 101166609746576 (device 0)>
v2_array = <NumpyArray dtype='int64' len='5'>[1 2 3 4 5]</NumpyArray>
v2_array_cuda = <Array [1, 2, 3, 4, 5] type='5 * int64'>
tests-cuda/test_1381_check_errors.py:37: AssertionError
=============================================== short test summary info ===============================================
XFAIL tests-cuda/test_1276_cuda_num.py::test_num_1 - unimplemented CUDA Kernels (awkward_ByteMaskedArray_numnull
XFAIL tests-cuda/test_1276_cuda_num.py::test_num_2 - unimplemented CUDA Kernels (awkward_ByteMaskedArray_numnull
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels6 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels8 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels11 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels13 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels14 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels15 - awkward_ListArray_broadcast_tooffsets is not implemented
FAILED tests-cuda/test_1381_check_errors.py::test - assert "ValueError: index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)\n\nThis error ...
======================================= 1 failed, 60 passed, 8 xfailed in 7.20s =======================================
Do you see the same error?
Yes, this would be better. Thanks!
But in the tests I am checking for that specific error condition only so irrespective of the order in which threads run, it would still give the same error. I have one test for each error condition to make sure that specific condition is working.
I added a few changes later and missed checking the integration tests. I'll run them again and fix the error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested everything with a GPU—all tests pass—and I see that there are pytest.raises
blocks in the generated code:
with pytest.raises(ValueError, match=rf"tags\[i\]\ <\ 0\ in\ compiled\ CUDA\ code\ \(awkward_UnionArray_validity\)"):
ak_cu.synchronize_cuda()
(although it might be a little nicer to have the re.escape
call included in the generated code, rather than generating escaped strings, since it would be more self-documenting).
I think it's ready to go! I'll update to main
and run all tests again (CPU on GitHub and GPU on my computer), and then you can merge it when you're sure you're done.
The changes from Edit: All GPU tests passed. |
I have added the |
Generate error message and add tests for checking if the correct error message is generated or not.
======= 1625 passed in 1.32s =====
======= 5479 passed in 2.62s =====
======= 2042 passed, 1736 skipped in 3.97s =====