
Distributing data


Situations where having no control over the distribution of data poses performance concerns. For each, we list solutions built on top of Grappa and/or solutions within Grappa's memory allocation.

Computation on an array is mostly data parallel but involves consecutive elements.

  • A forall_local approach will preserve ordering but will not keep consecutive elements on one node.
    • Adjacent elements could be cached during the computation; they are not guaranteed to be local, but in most cases they would be (see the sketch after this list). This is similar to the planned future behavior of forall_local, where we'll cache items so that an object spanning more than one block still works out.
  • e.g., prefix sum
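
The following plain-C++ sketch (not Grappa code; the block-cyclic layout, node count, and block size are all assumptions for illustration) shows why caching adjacent elements is cheap: the left neighbor a[i-1] lives on the same node as a[i] everywhere except at block boundaries, so a forall_local-style loop would need only one cached remote (delegate) read per block.

```cpp
#include <cstdio>

int main() {
  // Assumed block-cyclic layout: element i lives on node (i / block) % nodes.
  const int nodes = 4, block = 8, n = 1 << 10;
  auto home = [&](int i) { return (i / block) % nodes; };

  int local = 0, remote = 0;
  for (int i = 1; i < n; ++i)
    (home(i - 1) == home(i) ? local : remote)++;   // where does a[i-1] live?

  // Every remote neighbor access happens at a block boundary, so one cached
  // delegate read per block covers all of them.
  printf("neighbor accesses: %d local, %d remote (one per block boundary)\n",
         local, remote);
}
```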

Computation on 2+ arrays is data parallel but involves element-wise operations, so element i of every array is used together.

  • The arrays may be distributed differently, ruling out the straightforward forall_local approach.
  • Within-Grappa solutions
    • Provide a construct to malloc an array of the same type T with the same start node as another array. For allocator simplicity, this might be done in a way that wastes space.
  • Atop-Grappa solutions
    • Allocate the second array to be large enough that an effective start pointer within it can begin on the same node as the first array (see the first sketch after this list).
    • Struct-of-arrays to array-of-structs transformation, to force the two arrays to be distributed alike (see the second sketch after this list).
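
A plain-C++ sketch of the over-allocation trick (not Grappa code; the block-cyclic layout, node count, block size, and start nodes are assumptions for illustration). Padding the second array by whole blocks and advancing its effective start pointer makes element i of both arrays land on the same node, at the cost of up to (nodes - 1) * block wasted elements:

```cpp
#include <cstdio>
#include <vector>

int main() {
  // Assumed block-cyclic layout parameters and (known) allocation start nodes.
  const int nodes = 4, block = 8, n = 100;
  const int startA = 2;   // node where a[0] lives
  const int startB = 0;   // node where the raw allocation for b would begin

  // Skip whole blocks at the front of b so its effective start node matches a's.
  int skip_blocks = (startA - startB + nodes) % nodes;
  int offset = skip_blocks * block;                 // wasted elements

  std::vector<long> raw_b(n + offset);              // over-allocate
  long* b = raw_b.data() + offset;                  // effective start pointer

  // Home node of element i in each array under the assumed layout.
  auto homeA = [&](int i) { return (startA + i / block) % nodes; };
  auto homeB = [&](int i) { return (startB + (i + offset) / block) % nodes; };

  bool aligned = true;
  for (int i = 0; i < n; ++i) aligned = aligned && (homeA(i) == homeB(i));
  printf("aligned: %s, wasted elements at front: %d\n",
         aligned ? "yes" : "no", offset);
  (void)b;
}
```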
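And a minimal sketch of the struct-of-arrays to array-of-structs transformation: interleaving the two arrays into one array of structs means x[i] and y[i] always sit in the same block, and hence on the same node, no matter how the allocator places blocks. The element types here are made up for illustration:

```cpp
#include <cstdio>
#include <vector>

struct Pair { double x, y; };   // assumed element types for the two arrays

int main() {
  const int n = 8;
  // One distributed allocation instead of two that might not line up.
  std::vector<Pair> xy(n, Pair{1.0, 2.0});

  // The element-wise operation now touches a single, co-located element.
  for (int i = 0; i < n; ++i) xy[i].x += xy[i].y;

  printf("xy[0].x = %g\n", xy[0].x);
}
```

The trade-off is that a scan over just one field now strides through both, which can hurt cache and vectorization behavior for single-array phases.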