Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify filterParticles Kernel #3510

Merged
merged 5 commits into from
Aug 26, 2023

Conversation

ax3l
Copy link
Member

@ax3l ax3l commented Aug 23, 2023

Summary

On Summit, generation of this kernel shows compiler issues with nvcc 11.3.109: it compiles without warnings but leads to a segmentation fault at runtime.

The fix for the compiler bug is to implement the trivial lambda that is passed to copyParticles w/o capture: [].

We also add explicit capture to a few other lambdas, to simplify compiler intake complexity.

First commit: wow, this reliably triggereds the compiler bug (invalid device function) in step 1. (see reason below)

This simplifies the kernel generation, which also solves the issue seen with WarpX for this line.

Additional background

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

On Summit, generation of this kernel shows compiler issues
with `nvcc` 11.3.109, which lead to segmentation faults at runtime.

This simplifies the kernel generation, which also solves the issue.
Co-authored-by: AlexanderSinn <64009254+AlexanderSinn@users.noreply.github.com>
@WeiqunZhang
Copy link
Member

For CUDA, amrex::IsCallable and std::is_invocable work in device code. They also work in host code if the type is a functor (even if its operator() is device only) or a host and device lambda (i.e., AMREX_GPU_HOST_DEVICE). Unfortunately, they do not work in host code on device lambda (i.e., AMREX_GPU_DEVICE). A device lambda has a type of __nv_dl_wrapper_t on the host. It seems that its operator() can take any types such that amrex::IsCallable and std::is_invocable are always true on them. cuda::std::is_invocable has the same limitation.

The limitation is CUDA only. HIP and SYCL work.

@ax3l
Copy link
Member Author

ax3l commented Aug 25, 2023

@jmsexton03 asked on slack

[for amrex::ParallelForRNG] is there a reason to use AMREX_GPU_DEVICE rather than AMREX_GPU_HOST_DEVICE? in terms of consitency, is one of these a better pattern?

@WeiqunZhang was your comment in response to that or did I miss a connection?

@ax3l
Copy link
Member Author

ax3l commented Aug 25, 2023

I tried the latest version of this PR with the explicit capture by @AlexanderSinn .
Unfortunately this still fails on Summit with CUDA 11.3

@WeiqunZhang
Copy link
Member

I didn't know @jmsexton03's comment on slack. My comment was trying to explain why the first version of this draft PR does not work. It's because if constexpr is always true in that case.

First commit: wow, this now reliably triggers the compiler bug (invalid device function) in step 1.

@ax3l ax3l force-pushed the fix-simplify-filterParticles branch from 01d63d2 to 010b365 Compare August 25, 2023 23:11
@ax3l
Copy link
Member Author

ax3l commented Aug 25, 2023

THIS FIXES IT!!! (lol)

@ax3l ax3l changed the title [WIP] Simplify filterParticles Kernel Simplify filterParticles Kernel Aug 25, 2023
@ax3l ax3l marked this pull request as ready for review August 25, 2023 23:50
@WeiqunZhang
Copy link
Member

[] or AMREX_GPU_DEVICE?

@ax3l ax3l force-pushed the fix-simplify-filterParticles branch from 010b365 to e2ad16f Compare August 25, 2023 23:51
@ax3l
Copy link
Member Author

ax3l commented Aug 25, 2023

no capture: []

@ax3l
Copy link
Member Author

ax3l commented Aug 26, 2023

The bug label here stands for: "produces a bug with a third-party compiler" not "is an actual bug in our code".

@ax3l ax3l enabled auto-merge (squash) August 26, 2023 00:28
@ax3l ax3l merged commit d4761e8 into AMReX-Codes:development Aug 26, 2023
66 checks passed
@ax3l ax3l deleted the fix-simplify-filterParticles branch August 27, 2023 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants