Generalize reduce-then-scan special handling of first element in the SYCL backend #1958

Open
mmichel11 opened this issue Dec 6, 2024 · 0 comments

Summary:
Reduce-then-scan currently includes an optimization for oneapi::dpl::unique / oneapi::dpl::unique_copy that minimizes the branching caused by special handling of the first index. We should investigate whether this can be generalized to support other algorithms as we hook them into the reduce-then-scan path.

Problem Statement:
The reduce-then-scan implementation in the SYCL backend currently contains an optimization for oneapi::dpl::unique. Its callback functor, __gen_unique_mask, reads the elements at the assigned index and at the previous index from memory. For index 0, we must guard against reading from an invalid location outside of the buffer range. The naive solution is to add an if statement to the callback that special-cases index 0. However, this additional branch is evaluated for every index and hurts performance in all other cases.

To resolve this issue, a special path has been exposed in reduce-then-scan that avoids this additional branching. As we integrate more algorithms into the reduce-then-scan path such as the "by-segment" class of algorithms, we see the need for a similar approach.
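
To make the trade-off concrete, here is a minimal serial sketch in plain C++ (no SYCL). It is not the oneDPL implementation: the function names are hypothetical stand-ins, and it only illustrates one way a guarded callback can be replaced by handling index 0 once outside the hot loop. The actual mechanism used in reduce-then-scan may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-ins; these are NOT the oneDPL kernels
// or the real __gen_unique_mask functor.

// Naive callback: every call pays for the idx == 0 check.
inline bool gen_mask_guarded(const std::vector<int>& in, std::size_t idx)
{
    if (idx == 0)
        return true; // first element is always kept; in[idx - 1] would be out of range
    return in[idx] != in[idx - 1];
}

// Unguarded callback: no check for index 0, so it is only valid for idx >= 1.
inline bool gen_mask_unguarded(const std::vector<int>& in, std::size_t idx)
{
    assert(idx >= 1 && idx < in.size());
    return in[idx] != in[idx - 1];
}

// Calls the guarded callback for every index, including 0.
inline std::vector<bool> unique_mask_naive(const std::vector<int>& in)
{
    std::vector<bool> mask(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        mask[i] = gen_mask_guarded(in, i);
    return mask;
}

// Handles index 0 once, then runs the unguarded callback for idx >= 1.
inline std::vector<bool> unique_mask_peeled(const std::vector<int>& in)
{
    std::vector<bool> mask(in.size());
    if (in.empty())
        return mask;
    mask[0] = true; // special case handled outside the hot loop
    for (std::size_t i = 1; i < in.size(); ++i)
        mask[i] = gen_mask_unguarded(in, i);
    return mask;
}
```

Both variants produce the same mask; the second simply moves the index-0 decision out of the per-element callback.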

Preferred Solution:
The approach used to special-case unique in reduce-then-scan should be generalized to support other algorithms such as reduce_by_segment, inclusive_scan_by_segment, and exclusive_scan_by_segment. One such approach would be to replace the true/false type that currently flags the unique case with an enum, so that each case can be handled individually. However, other approaches to generalization should also be explored.
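
As a rough illustration of the enum idea, the serial sketch below selects the first-element handling at compile time from an enum value. Everything here is an assumption made for illustration: `first_elem_mode`, `elem_op`, and `run` are hypothetical names, not part of oneDPL, and a real enum would likely also need a "no special handling" value. The by-segment case is modeled as a simple serial inclusive scan by segment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the oneDPL interface.
enum class first_elem_mode { unique, scan_by_segment };

// One specialization per mode supplies the index-0 step and the idx >= 1 step.
template <first_elem_mode Mode>
struct elem_op;

template <>
struct elem_op<first_elem_mode::unique>
{
    // Index 0: always kept, no look-back needed.
    static int first(const std::vector<int>&, const std::vector<int>&) { return 1; }
    // Index >= 1: reads keys[i - 1] unconditionally.
    static int rest(const std::vector<int>& keys, const std::vector<int>&, std::size_t i, int)
    {
        return keys[i] != keys[i - 1] ? 1 : 0;
    }
};

template <>
struct elem_op<first_elem_mode::scan_by_segment>
{
    // Index 0: always opens a segment, so the result is the value itself.
    static int first(const std::vector<int>&, const std::vector<int>& vals) { return vals[0]; }
    // Index >= 1: continue the running sum within a segment, restart on a new key.
    static int rest(const std::vector<int>& keys, const std::vector<int>& vals, std::size_t i, int prev)
    {
        return keys[i] == keys[i - 1] ? prev + vals[i] : vals[i];
    }
};

// Generic driver: the index-0 step is chosen by Mode, and the loop has no idx == 0 branch.
template <first_elem_mode Mode>
std::vector<int> run(const std::vector<int>& keys, const std::vector<int>& vals)
{
    assert(keys.size() == vals.size());
    std::vector<int> out(keys.size());
    if (keys.empty())
        return out;
    out[0] = elem_op<Mode>::first(keys, vals);
    for (std::size_t i = 1; i < keys.size(); ++i)
        out[i] = elem_op<Mode>::rest(keys, vals, i, out[i - 1]);
    return out;
}
```

Adding another algorithm would then mean adding an enum value and a matching specialization, without touching the driver loop.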

Performance should be measured before and after these optimizations to quantify how much benefit the specializations bring.
