This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
Reduce consists of two kernels. The first one reads the input data and accumulates partial sums in temporary storage. The second one reads the temporary storage and writes the final result. Therefore, it's safe to alias the input and output arrays.
Nonetheless, allowing in-place execution would limit our ability to optimize the algorithm later. Since aliasing doesn't provide significant memory savings, I'd rather not allow it.
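To illustrate why the two-kernel structure makes aliasing safe, here is a minimal host-side C++ sketch. It is not CUB code: `two_pass_sum` and `tile_size` are hypothetical names, and the two loops only stand in for the two kernels. Pass 1 consumes every input element before pass 2 writes anything, so pointing `out` into the input range yields the same result as a separate output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a two-kernel sum reduction.
// Pass 1 ("kernel 1"): each tile is reduced into one slot of temporary storage.
// Pass 2 ("kernel 2"): only temporary storage is read, then *out is written.
template <typename T>
void two_pass_sum(const T* in, std::size_t n, T* out, std::size_t tile_size) {
  assert(tile_size > 0);
  std::size_t num_tiles = (n + tile_size - 1) / tile_size;
  std::vector<T> temp_storage(num_tiles, T{});  // one partial sum per tile

  // "Kernel 1": per-tile partial sums; tiles are independent of each other.
  for (std::size_t tile = 0; tile < num_tiles; ++tile) {
    std::size_t begin = tile * tile_size;
    std::size_t end = std::min(n, begin + tile_size);
    T partial{};
    for (std::size_t i = begin; i < end; ++i) partial += in[i];
    temp_storage[tile] = partial;
  }

  // "Kernel 2": reads only temp_storage, so writing through `out`
  // cannot clobber any input value that is still needed.
  T total{};
  for (const T& p : temp_storage) total += p;
  *out = total;
}
```

For example, `two_pass_sum(v.data(), v.size(), v.data(), 3)` on `{1, 2, 3, 4, 5, 6, 7}` stores 28 in `v[0]`, just as it would with a separate output variable.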
Regarding the `ByKey` variant: it relies on decoupled look-back, so it should be safe to alias input and output data as long as the value types of the input and output iterators match exactly. The only limiting factor is `LOAD_LDG`, which makes aliasing in this case undefined behavior (UB), because loads through the read-only data cache assume the data is not modified while the kernel runs. In-place execution here would provide significant memory savings, so if there's a request, I suggest we add an overload that allows in-place execution.

Regarding the `Segmented` version: one block is assigned per segment, and results are written without synchronization between blocks. Therefore, any aliasing with the output array would introduce a data race, since a block writing its result may overwrite input that another block is still reading.
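The `ByKey` argument can be sketched on the host as follows. This is a simplified model under stated assumptions, not CUB's implementation: `reduce_by_key_tiled` is a hypothetical name, tiles run sequentially, and the running `offset`/carry stands in for the prefix each tile would obtain through decoupled look-back. Each tile loads all of its input before writing, and a write to position `offset` only happens once `offset < end`, so it can only overwrite input that has already been loaded. Keys and values use the same types for input and output, mirroring the "types must match exactly" condition. `LOAD_LDG` itself cannot be modeled on the host.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of an in-place-capable tiled reduce-by-key.
// Input and output share element types K and V, so positions line up exactly.
// Returns the number of runs written.
template <typename K, typename V>
std::size_t reduce_by_key_tiled(const K* keys_in, const V* vals_in, std::size_t n,
                                K* keys_out, V* vals_out, std::size_t tile_size) {
  assert(tile_size > 0);
  std::size_t offset = 0;     // runs written so far (stand-in for look-back prefix)
  bool have_pending = false;  // a run that may continue into the next tile
  K pending_key{};
  V pending_sum{};

  for (std::size_t begin = 0; begin < n; begin += tile_size) {
    std::size_t end = std::min(n, begin + tile_size);
    // Load the whole tile before any write (on a GPU: into registers).
    std::vector<K> k(keys_in + begin, keys_in + end);
    std::vector<V> v(vals_in + begin, vals_in + end);

    for (std::size_t i = 0; i < k.size(); ++i) {
      if (have_pending && k[i] == pending_key) {
        pending_sum += v[i];
        continue;
      }
      if (have_pending) {
        // Invariant behind in-place safety: only already-loaded slots are written.
        assert(offset < end);
        keys_out[offset] = pending_key;
        vals_out[offset] = pending_sum;
        ++offset;
      }
      pending_key = k[i];
      pending_sum = v[i];
      have_pending = true;
    }
  }
  if (have_pending) {  // flush the last run
    keys_out[offset] = pending_key;
    vals_out[offset] = pending_sum;
    ++offset;
  }
  return offset;
}
```

Calling it with `keys_out == keys_in` and `vals_out == vals_in` on keys `{1, 1, 2, 2, 2, 3, 1, 1}` and values `{1, ..., 8}` with `tile_size = 3` compacts the first four slots to keys `{1, 2, 3, 1}` and sums `{3, 12, 6, 15}`.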
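The segmented data race can be reproduced deterministically with a small host-side model. This is an assumption-laden sketch, not CUB code: `segmented_sum` and `block_order` are hypothetical names, and `block_order` emulates the unspecified order in which the GPU schedules the per-segment blocks (on real hardware they may also overlap). With a separate output the result does not depend on the order; with `out == in`, it does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one "block" per segment, each writing its result
// without synchronizing with other blocks. Segment s covers
// [offsets[s], offsets[s + 1]), so offsets has num_segments + 1 entries.
template <typename T>
void segmented_sum(const T* in, T* out, const std::vector<std::size_t>& offsets,
                   const std::vector<std::size_t>& block_order) {
  for (std::size_t s : block_order) {
    assert(s + 1 < offsets.size());
    T sum{};
    for (std::size_t i = offsets[s]; i < offsets[s + 1]; ++i) sum += in[i];
    out[s] = sum;  // unordered with respect to other blocks' reads of `in`
  }
}
```

With segments `[0, 2)` and `[2, 4)` over `{1, 2, 3, 4}` and `out == in`, running block 0 first leaves `{3, 7, ...}`, but running block 1 first overwrites `in[1]` with 7 before block 0 reads it, so block 0 produces 8 instead of 3.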