Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: generate error message and tests for CUDA and CPU kernels #2989

Merged
merged 8 commits into from
Jan 30, 2024

Conversation

ManasviGoyal
Copy link
Collaborator

@ManasviGoyal ManasviGoyal commented Jan 26, 2024

Generate error message and add tests for checking if the correct error message is generated or not.

  • tests-spec-explicit
    ======= 1625 passed in 1.32s =====
  • tests-cpu-kernels-explicit
    ======= 5479 passed in 2.62s =====
  • tests-cuda-kernels-explicit
    ======= 2042 passed, 1736 skipped in 3.97s =====

@ManasviGoyal ManasviGoyal added the gpu Concerns the GPU implementation (backend = "cuda') label Jan 26, 2024
@ManasviGoyal ManasviGoyal changed the title fix: generate error message and tests fix: generate error message and tests for CUDA kernels Jan 26, 2024
Copy link

codecov bot commented Jan 26, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (9191df2) 81.88% compared to head (a97eb6c) 81.87%.

Additional details and impacted files
Files Coverage Δ
src/awkward/_connect/cuda/__init__.py 0.00% <0.00%> (ø)

@ManasviGoyal ManasviGoyal changed the title fix: generate error message and tests for CUDA kernels fix: generate error message and tests for CUDA and CPU kernels Jan 26, 2024
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I checked out this branch and saw that the generated tests now have checks like

    try:
        ak_cu.synchronize_cuda()

    except ValueError as e:
        assert str(e) == "tags[i] < 0 in compiled CUDA code (awkward_UnionArray_validity)"

That does the right thing, although you may want to use pytest.raises, like this:

    with pytest.raises(ValueError, match=r"tags[i] < 0 in compiled CUDA code (awkward_UnionArray_validity)"):
        ak_cu.synchronize_cuda()

The difference is that the first code block would pass if synchronize does not raise an error or raises an error with the correct string and the second code block would pass only if synchronize raises an error with the correct string.

Some kernels have more than one possible error, and the error message that gets displayed could depend on a race condition between different threads if one thread would hit one error and a different thread would hit a different errors. You might want to consider relaxing the test constraint to only matching the part about which kernel raised the error.

When I ran this on my GPU, all of the unit tests passed, but the integration tests

pytest tests-cuda

failed with

================================================= test session starts =================================================
platform linux -- Python 3.10.13, pytest-7.4.4, pluggy-1.4.0
Matplotlib: 3.8.2
Freetype: 2.12.1
rootdir: /home/jpivarski/irishep/awkward
configfile: pyproject.toml
plugins: forked-1.6.0, xdist-3.5.0, cov-4.1.0, typeguard-4.1.5, mpl-0.16.1, reverse-1.7.0, mock-3.12.0, asyncio-0.23.3, timeout-2.2.0, anyio-4.2.0
asyncio: mode=strict
collected 69 items                                                                                                    

tests-cuda/test_1276_cuda_num.py xx.......                                                                      [ 13%]
tests-cuda/test_1276_cuda_transfers.py ......x.x..x.xxx                                                         [ 36%]
tests-cuda/test_1276_cupy_interop.py .                                                                          [ 37%]
tests-cuda/test_1276_from_cupy.py .....                                                                         [ 44%]
tests-cuda/test_1300_same_for_numba_cuda.py .......................                                             [ 78%]
tests-cuda/test_1381_check_errors.py F                                                                          [ 79%]
tests-cuda/test_1809_array_cuda_jit.py ..............                                                           [100%]

====================================================== FAILURES =======================================================
________________________________________________________ test _________________________________________________________

    def test():
        v2_array = ak.Array([1, 2, 3, 4, 5]).layout
    
        stream = cp.cuda.Stream(non_blocking=True)
    
        v2_array_cuda = ak.to_backend(v2_array, "cuda")
        with cp.cuda.Stream() as stream:
            v2_array_cuda[10,]
            v2_array_cuda[11,]
            v2_array_cuda[12,]
    
        with pytest.raises(ValueError) as err:
            awkward._connect.cuda.synchronize_cuda(stream)
    
        assert isinstance(err.value, ValueError)
    
        message = "".join(traceback.format_exception(err.type, err.value, err.tb))
>       assert (
            "ValueError: index out of range in compiled CUDA code "
            "(awkward_RegularArray_getitem_next_at)\n"
            "\n"
            "This error occurred while attempting to slice\n"
            "\n"
            "    <Array [1, 2, 3, 4, 5] type='5 * int64'>\n"
            "\n"
            "with\n"
            "\n"
            "    (10)\n"
        ) in message
E       assert "ValueError: index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)\n\nThis error occurred while attempting to slice\n\n    <Array [1, 2, 3, 4, 5] type='5 * int64'>\n\nwith\n\n    (10)\n" in 'Traceback (most recent call last):\n  File "/home/jpivarski/irishep/awkward/tests-cuda/test_1381_check_errors.py", li...\n    raise ValueError(\nValueError: index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)\n'

err        = <ExceptionInfo ValueError('index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)') tblen=2>
message    = 'Traceback (most recent call last):\n  File "/home/jpivarski/irishep/awkward/tests-cuda/test_1381_check_errors.py", li...\n    raise ValueError(\nValueError: index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)\n'
stream     = <Stream 101166609746576 (device 0)>
v2_array   = <NumpyArray dtype='int64' len='5'>[1 2 3 4 5]</NumpyArray>
v2_array_cuda = <Array [1, 2, 3, 4, 5] type='5 * int64'>

tests-cuda/test_1381_check_errors.py:37: AssertionError
=============================================== short test summary info ===============================================
XFAIL tests-cuda/test_1276_cuda_num.py::test_num_1 - unimplemented CUDA Kernels (awkward_ByteMaskedArray_numnull
XFAIL tests-cuda/test_1276_cuda_num.py::test_num_2 - unimplemented CUDA Kernels (awkward_ByteMaskedArray_numnull
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels6 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels8 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels11 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels13 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels14 - awkward_ListArray_broadcast_tooffsets is not implemented
XFAIL tests-cuda/test_1276_cuda_transfers.py::test_tocuda_unimplementedkernels15 - awkward_ListArray_broadcast_tooffsets is not implemented
FAILED tests-cuda/test_1381_check_errors.py::test - assert "ValueError: index out of range in compiled CUDA code (awkward_RegularArray_getitem_next_at)\n\nThis error ...
======================================= 1 failed, 60 passed, 8 xfailed in 7.20s =======================================

Do you see the same error?

@ManasviGoyal
Copy link
Collaborator Author

ManasviGoyal commented Jan 26, 2024

That does the right thing, although you may want to use pytest.raises, like this:

    with pytest.raises(ValueError, match=r"tags[i] < 0 in compiled CUDA code (awkward_UnionArray_validity)"):
        ak_cu.synchronize_cuda()

Yes, this would be better. Thanks!

Some kernels have more than one possible error, and the error message that gets displayed could depend on a race condition between different threads if one thread would hit one error and a different thread would hit a different errors. You might want to consider relaxing the test constraint to only matching the part about which kernel raised the error.

But in the tests I am checking for that specific error condition only so irrespective of the order in which threads run, it would still give the same error. I have one test for each error condition to make sure that specific condition is working.

Do you see the same error?

I added a few changes later and missed checking the integration tests. I'll run them again and fix the error.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested everything with a GPU—all tests pass—and I see that there are pytest.raises blocks in the generated code:

with pytest.raises(ValueError, match=rf"tags\[i\]\ <\ 0\ in\ compiled\ CUDA\ code\ \(awkward_UnionArray_validity\)"):
    ak_cu.synchronize_cuda()

(although it might be a little nicer to have the re.escape call included in the generated code, rather than generating escaped strings, since it would be more self-documenting).

I think it's ready to go! I'll update to main and run all tests again (CPU on GitHub and GPU on my computer), and then you can merge it when you're sure you're done.

@jpivarski
Copy link
Member

jpivarski commented Jan 29, 2024

The changes from main were only LayoutBuilder, which couldn't affect this. Oh well, I'm running all tests again just to be formally correct.

Edit: All GPU tests passed.

@ManasviGoyal
Copy link
Collaborator Author

I've tested everything with a GPU—all tests pass—and I see that there are pytest.raises blocks in the generated code:

with pytest.raises(ValueError, match=rf"tags\[i\]\ <\ 0\ in\ compiled\ CUDA\ code\ \(awkward_UnionArray_validity\)"):
    ak_cu.synchronize_cuda()

(although it might be a little nicer to have the re.escape call included in the generated code, rather than generating escaped strings, since it would be more self-documenting).

I think it's ready to go! I'll update to main and run all tests again (CPU on GitHub and GPU on my computer), and then you can merge it when you're sure you're done.

I have added the re.escape() call in the generated code. Thanks!

@ManasviGoyal ManasviGoyal merged commit f2a2340 into main Jan 30, 2024
39 checks passed
@ManasviGoyal ManasviGoyal deleted the ManasviGoyal/generate-error-message-and-tests branch January 30, 2024 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gpu Concerns the GPU implementation (backend = "cuda')
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants