Add gpu policies that do not check loop bounds #1778

Open
wants to merge 15 commits into
base: develop
from

Conversation

MrBurmark
Member

@MrBurmark MrBurmark commented Dec 10, 2024

GPU Policies that don't check loop bounds

This is useful when mapping loop iterations to block indices: the loop bounds are already used to size the kernel launch, so they do not need to be checked again inside the kernel.
This improves performance by up to about 5% in some simple kernels and brings us to performance parity with the base variants in some RAJAPerf kernels.
Regarding naming, I changed this from "unchecked" to "direct_unchecked"; hopefully that is easier to understand.
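
As a rough illustration of the difference, here is a host-side sketch in plain C++. It is not the code in this PR and does not use RAJA or CUDA. `simulate_launch`, `scale_direct`, and `scale_direct_unchecked` are hypothetical names, and the loop over `block_idx` stands in for a real 1D grid of GPU blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1D kernel launch: calls body(block_idx)
// once per simulated block. This is not a RAJA or CUDA API.
template <typename Body>
void simulate_launch(std::size_t num_blocks, Body body) {
  for (std::size_t block_idx = 0; block_idx < num_blocks; ++block_idx) {
    body(block_idx);
  }
}

// "direct"-style mapping: every block compares its index with the range
// size, so launching more blocks than elements is still safe.
void scale_direct(std::vector<double>& data, std::size_t num_blocks) {
  const std::size_t n = data.size();
  simulate_launch(num_blocks, [&](std::size_t i) {
    if (i < n) {  // bounds check performed inside the "kernel"
      data[i] *= 2.0;
    }
  });
}

// "direct_unchecked"-style mapping: no comparison inside the "kernel".
// Only valid because exactly data.size() blocks are launched.
void scale_direct_unchecked(std::vector<double>& data) {
  simulate_launch(data.size(), [&](std::size_t i) {
    data[i] *= 2.0;  // i < data.size() is guaranteed by the launch size
  });
}
```

Calling `scale_direct(v, v.size())` and `scale_direct_unchecked(v)` on equal vectors gives the same result. The unchecked version only drops the per-block comparison, which is safe here because the launch size equals the range size.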

These policies may not be safe to use with kernel; see #1733.

  • This PR is a feature
  • It does the following:
    • Adds unchecked policies at my own request

@MrBurmark MrBurmark requested review from artv3, rhornung67 and a team December 10, 2024 17:37
Member

@artv3 artv3 left a comment

Should we cover SYCL too?

@artv3
Member

artv3 commented Dec 16, 2024

I think this is an important PR to get in. I've talked to a few users who could benefit from this capability.

Use a macro to generate the various aliases
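
The macro from this commit is not shown in the conversation. As a minimal sketch of the general technique only, the following stamps out one alias per dimension and mapping. `Dim`, `Mapping`, `block_policy`, `DEFINE_BLOCK_POLICY_ALIASES`, and the resulting alias names are all hypothetical and are not RAJA's actual types or policy names.

```cpp
#include <type_traits>

// Hypothetical policy building blocks; these are NOT RAJA's actual types.
enum class Dim { x, y, z };
enum class Mapping { direct, direct_unchecked };

template <Mapping M, Dim D>
struct block_policy {
  static constexpr Mapping mapping = M;
  static constexpr Dim dim = D;
};

// One invocation per dimension defines every alias for that dimension,
// so a mapping added to the macro is defined for all dimensions at once.
#define DEFINE_BLOCK_POLICY_ALIASES(DIM)                                \
  using block_##DIM##_direct = block_policy<Mapping::direct, Dim::DIM>; \
  using block_##DIM##_direct_unchecked =                                \
      block_policy<Mapping::direct_unchecked, Dim::DIM>;

DEFINE_BLOCK_POLICY_ALIASES(x)
DEFINE_BLOCK_POLICY_ALIASES(y)
DEFINE_BLOCK_POLICY_ALIASES(z)

#undef DEFINE_BLOCK_POLICY_ALIASES
```

In this sketch, generating the aliases from a single macro keeps the per-dimension lists in sync, which is one way to avoid an alias being left out for one dimension or backend.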
This iteration mapping assumes that the number of
iterations is the same as the size of the range
and does no checking.
This is useful when mapping GPU blocks, as we often
launch exactly the number we need and don't need to check
whether we are in range. This can give a ~5% speedup vs. the
direct mapping in this case.
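
The assumption above can be written down as a small check. This is a sketch with hypothetical helper names (`ceil_div`, `iterations_block_mapped`, `iterations_thread_mapped`, `unchecked_is_valid`), not part of RAJA. The thread-mapped case is an added illustration that assumes one thread per iteration with a fixed block size, where the launch is rounded up to whole blocks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of RAJA: ceiling division for sizing a launch.
constexpr std::size_t ceil_div(std::size_t n, std::size_t d) {
  return (n + d - 1) / d;
}

// One block per iteration: the launch produces exactly range_size iterations.
constexpr std::size_t iterations_block_mapped(std::size_t range_size) {
  return range_size;
}

// One thread per iteration with a fixed block size (an assumption of this
// sketch): the launch is rounded up to a whole number of blocks.
constexpr std::size_t iterations_thread_mapped(std::size_t range_size,
                                               std::size_t block_size) {
  return ceil_div(range_size, block_size) * block_size;
}

// An unchecked mapping is only valid when the number of launched
// iterations equals the size of the range.
constexpr bool unchecked_is_valid(std::size_t launched_iterations,
                                  std::size_t range_size) {
  return launched_iterations == range_size;
}
```

Under these assumptions, a range of 1000 mapped one iteration per block satisfies the condition, while the same range mapped to threads in blocks of 256 launches 1024 threads and would still need the bounds check.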
There were a number missing for cuda/hip
This adds testing for unchecked policies with cuda and hip
in kernel and launch.
@MrBurmark MrBurmark force-pushed the feature/burmark1/unchecked_policies branch from 8da4979 to 8680495 Compare December 30, 2024 21:54
2 participants