Simplify filterParticles Kernel #3510
On Summit, generation of this kernel shows compiler issues with `nvcc` 11.3.109, which lead to segmentation faults at runtime. This simplifies the kernel generation, which also solves the issue.
Co-authored-by: AlexanderSinn <64009254+AlexanderSinn@users.noreply.github.com>
For CUDA, the limitation is CUDA only. HIP and SYCL work.
@jmsexton03 asked on Slack
@WeiqunZhang was your comment in response to that or did I miss a connection?
I tried the latest version of this PR with the explicit capture by @AlexanderSinn.
I didn't know about @jmsexton03's comment on Slack. My comment was trying to explain why the first version of this draft PR does not work. It's because
01d63d2 to 010b365
THIS FIXES IT!!! (lol)
010b365 to e2ad16f
no capture:
The
Summary
On Summit, generation of this kernel shows compiler issues with `nvcc` 11.3.109: it compiles without warnings but leads to a segmentation fault at runtime.
The fix for the compiler bug is to implement the trivial lambda that is passed to `copyParticles` without capture: `[]`.
We also add explicit capture to a few other lambdas, to simplify compiler intake complexity.
First commit: wow, this reliably triggers the compiler bug (invalid device function) in step 1. (see reason below)
This simplifies the kernel generation, which also solves the issue seen with WarpX for this line.
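The capture change summarized above can be illustrated with a minimal host-only C++ sketch. All names here (`Particle`, `copy_if_particles`, `copy_all`, `copy_right_of`) are hypothetical and chosen for illustration; this is not the AMReX `filterParticles` or `copyParticles` code, and it does not reproduce the `nvcc` bug. It only shows the two lambda styles: a trivial predicate with an empty capture list `[]`, and a predicate that names exactly what it captures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a particle record; not the AMReX particle type.
struct Particle {
    double x;
    int id;
};

// Hypothetical host-side analogue of a filtered copy: appends every particle
// for which pred(p) is true to dst and returns how many were copied.
template <typename Pred>
std::size_t copy_if_particles(const std::vector<Particle>& src,
                              std::vector<Particle>& dst, Pred&& pred) {
    std::size_t n = 0;
    for (const Particle& p : src) {
        if (pred(p)) {
            dst.push_back(p);
            ++n;
        }
    }
    return n;
}

// Trivial "keep everything" predicate written without capture: [].
// The lambda needs no outside state, so the capture list stays empty.
inline std::size_t copy_all(const std::vector<Particle>& src,
                            std::vector<Particle>& dst) {
    return copy_if_particles(src, dst, [](const Particle&) { return true; });
}

// A predicate that does need outside state names it explicitly, [x_min],
// instead of using a blanket [=] or [&].
inline std::size_t copy_right_of(const std::vector<Particle>& src,
                                 std::vector<Particle>& dst, double x_min) {
    return copy_if_particles(
        src, dst, [x_min](const Particle& p) { return p.x >= x_min; });
}
```

In a device build the lambdas would additionally carry the framework's host/device annotations; the sketch leaves those out so that it compiles with any C++11 compiler.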
Additional background
Checklist
The proposed changes: