Summary:
In reduce-then-scan, we currently implement an optimization for oneapi::dpl::unique / oneapi::dpl::unique_copy that minimizes the branching caused by special handling of the first index. We should investigate whether this can be generalized to support other algorithms as we hook them into the reduce-then-scan path.
Problem Statement:
In the reduce-then-scan implementation in the SYCL backend, we currently have an optimization for oneapi::dpl::unique, where the callback functor, __gen_unique_mask, reads the elements at its assigned index and at the previous index from memory. For index 0 there is no previous element, so we must guard against reading from an invalid location outside of the buffer range. The naive solution is to add an if statement in the callback that special-cases index 0. However, this additional branch is evaluated for every index, so it hurts performance in all the other cases, which never need it.
To resolve this issue, a special path has been exposed in reduce-then-scan that avoids this additional branching. As we integrate more algorithms into the reduce-then-scan path, such as the "by-segment" class of algorithms, we see the need for a similar approach.
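The trade-off described above can be illustrated with a small host-side sketch. This is not the oneDPL implementation: it uses plain C++ instead of SYCL, and the names gen_mask_guarded, gen_mask_unguarded, unique_mask_branchy, and unique_mask_peeled are hypothetical stand-ins. It only shows the general idea of handling index 0 once, outside the per-element callback, so that the callback itself needs no branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of a unique-mask callback; not oneDPL code.
// A mask value of 1 means "element i differs from its predecessor and is kept".

// Guarded callback: safe for every index, but pays for the i == 0 check each call.
inline int gen_mask_guarded(const std::vector<int>& in, std::size_t i)
{
    assert(i < in.size());
    if (i == 0)
        return 1; // no predecessor to compare against, always kept
    return in[i] != in[i - 1] ? 1 : 0;
}

// Unguarded callback: the caller promises i >= 1, so in[i - 1] is always valid.
inline int gen_mask_unguarded(const std::vector<int>& in, std::size_t i)
{
    assert(i >= 1 && i < in.size());
    return in[i] != in[i - 1] ? 1 : 0;
}

// Naive driver: every element goes through the guarded callback.
std::vector<int> unique_mask_branchy(const std::vector<int>& in)
{
    std::vector<int> mask(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        mask[i] = gen_mask_guarded(in, i);
    return mask;
}

// Peeled driver: index 0 is handled once, then the loop uses the unguarded callback.
std::vector<int> unique_mask_peeled(const std::vector<int>& in)
{
    std::vector<int> mask(in.size());
    if (in.empty())
        return mask;
    mask[0] = 1; // first element is always kept
    for (std::size_t i = 1; i < in.size(); ++i)
        mask[i] = gen_mask_unguarded(in, i);
    return mask;
}
```

Both drivers produce the same mask; the difference is only where the first-index decision is made. Whether the peeled form is measurably faster on a given device is exactly what the issue asks to be benchmarked.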
Preferred Solution:
The approach used to special-case unique in reduce-then-scan should be generalized to support other algorithms such as reduce_by_segment, inclusive_scan_by_segment, and exclusive_scan_by_segment. One such approach would be to replace the true/false "is unique" type with an enum, so that each case can be special-cased individually. However, other approaches to generalization should also be explored.
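As a rough illustration of the enum idea, the sketch below replaces a boolean tag with a compile-time enum that selects how index 0 is seeded before an unguarded loop runs. Everything here is an assumption made for illustration: the enum first_index_mode, its enumerators, the function scan_generated, and the seed values chosen for each mode are hypothetical and do not reflect the actual oneDPL interfaces or the semantics of the by-segment algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement for a true/false "is unique" tag; not oneDPL code.
enum class first_index_mode
{
    none,      // generator reads only in[i]; no first-index handling needed
    unique,    // running count of kept elements; element 0 is always kept
    by_segment // zero-based segment index; element 0 opens segment 0
};

// Inclusive scan over generated values, with index 0 seeded per mode at compile time.
template <first_index_mode Mode>
std::vector<int> scan_generated(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    if (in.empty())
        return out;

    int running = 0;
    std::size_t start = 0;
    if constexpr (Mode == first_index_mode::unique)
    {
        running = 1; // assumed seed: first element counts as kept
        out[0] = running;
        start = 1;
    }
    else if constexpr (Mode == first_index_mode::by_segment)
    {
        running = 0; // assumed seed: first element belongs to segment 0
        out[0] = running;
        start = 1;
    }

    for (std::size_t i = start; i < in.size(); ++i)
    {
        if constexpr (Mode == first_index_mode::none)
        {
            running += in[i]; // never touches in[i - 1]
        }
        else
        {
            assert(i >= 1); // index 0 was peeled above, so in[i - 1] is valid
            running += (in[i] != in[i - 1]) ? 1 : 0;
        }
        out[i] = running;
    }
    return out;
}
```

Because the mode is a template parameter, each instantiation compiles only its own first-index handling, and the loop body stays free of a run-time index check. Adding another algorithm would mean adding an enumerator and one more `if constexpr` branch, which is one of the costs to weigh against other generalization strategies.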
Performance should be measured before and after these optimizations to quantify the benefit these specializations provide.