-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: turn on CUDA unit tests for working kernels and add some CUDA kernels #2930
feat: turn on CUDA unit tests for working kernels and add some CUDA kernels #2930
Conversation
35d13e6
to
effcef9
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good start. I just checked it out and tested
python dev/generate-tests.py
pytest tests-cuda-kernels/test_cudaawkward_ListArray64_getitem_next_array_advanced_64.py
which failed. (Is it supposed to be working yet?)
src/awkward/_connect/cuda/cuda_kernels/awkward_ListArray_getitem_next_array_advanced.cu
Show resolved
Hide resolved
It does work on my system. Can you share the error? |
I did a fresh reinstallation, completely within this branch, and it does work. There must be something left over if you use the same git repo between branches, but that's okay. (The end state will be stable.) % python dev/generate-tests.py
% pytest tests-cuda-kernels ================================================= test session starts ================================================= platform linux -- Python 3.10.13, pytest-7.4.4, pluggy-1.3.0 Matplotlib: 3.8.2 Freetype: 2.12.1 rootdir: /home/jpivarski/irishep/awkward configfile: pyproject.toml plugins: forked-1.6.0, xdist-3.5.0, cov-4.1.0, typeguard-4.1.5, mpl-0.16.1, reverse-1.7.0, mock-3.12.0, asyncio-0.23.3, timeout-2.2.0, anyio-4.2.0 asyncio: mode=strict collected 6178 items |
effcef9
to
7098c86
Compare
18c101e
to
94593a1
Compare
@jpivarski For the nested loop, I was taking about cases like awkward_ListArray_getitem_next_range_spreadadvanced and awkward_ListArray_broadcast_tooffsets. Since, the inner loop will vary with each iteration of outer loop we can't use something like this to use threads for both loops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good! Let's merge it!
I ran all tests locally (to make up for the fact that CI doesn't run the CUDA tests; at least we have two different machines testing it).
@jpivarski For the nested loop, I was taking about cases like awkward_ListArray_getitem_next_range_spreadadvanced and awkward_ListArray_broadcast_tooffsets. Since, the inner loop will vary with each iteration of outer loop we can't use something like this to use threads for both loops.
That's true, and the RegularArray kernels are a good example of cases that can be done in a straightforward way in CUDA. The straightforward implementation of the ListArray kernels' nested loops is non-optimal, but if it comes to it, you can temporarily implement them that way to ensure that all kernels exist before making them optimal.
I made issue #2987 so that you're free to write non-optimal kernels and get full coverage before optimizing them. (So that we don't have to worry about forgetting to go back and make them better.) If you can't think of a way to do the ListArray kernels well, make sure they work and add them to the list. You'll probably find that the list has only a few different "reasons for not being optimal," and recognize that you only have a few algorithmic problems to solve.
But on this PR right now, when you think it's ready, update to main and squash-and-merge when the tests pass again.
94593a1
to
52eeefb
Compare
Added the following kernels and made some minor fixes.
tests-cuda-kernels-explicit
=========== 1981 passed, 1721 skipped in 6.09s ===========
tests-cpu-kernels-explicit
=========== 5403 passed in 2.49s ===========
Also some minor fixes in the parsed python code for some kernels.