
Distributing data


Situations where having no control over the distribution of data poses performance concerns. For each, we list solutions built on top of Grappa and/or solutions within Grappa's memory allocation.

Computation on an array is mostly data parallel but involves consecutive elements.

  • A forall_local approach will preserve ordering but will not keep consecutive elements on one node.
    • Adjacent elements could be cached during the computation; they are not guaranteed to be local, but in most cases they would be (see the sketch after this list). This is similar to the planned future behavior of forall_local, where we'll cache items so that an object spanning more than one block still works out.
  • e.g., prefix sum
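
The following plain-C++ sketch (not Grappa code; the block-cyclic layout, node count, and block size are all assumptions for illustration) shows why caching adjacent elements is cheap: the left neighbor a[i-1] lives on the same node as a[i] everywhere except at block boundaries, so a forall_local-style loop would need only one cached remote (delegate) read per block.

```cpp
#include <cstdio>

int main() {
  // Assumed block-cyclic layout: element i lives on node (i / block) % nodes.
  const int nodes = 4, block = 8, n = 1 << 10;
  auto home = [&](int i) { return (i / block) % nodes; };

  int local = 0, remote = 0;
  for (int i = 1; i < n; ++i)
    (home(i - 1) == home(i) ? local : remote)++;   // where does a[i-1] live?

  // Every remote neighbor access happens at a block boundary, so one cached
  // delegate read per block covers all of them.
  printf("neighbor accesses: %d local, %d remote (one per block boundary)\n",
         local, remote);
}
```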

Computation on 2+ arrays is data parallel but involves element-wise operations, so element i of every array is used together.

  • The arrays may be distributed differently, ruling out the straightforward forall_local approach.
  • Within-Grappa solutions
    • Provide a construct to malloc an array of the same type T with the same start node as another array. For allocator simplicity, this might be done in a way that wastes space.
  • Atop-Grappa solutions
    • Allocate the second array to be large enough that an effective start pointer within it can begin on the same node as the first array (see the first sketch after this list).
    • Struct-of-arrays to array-of-structs transformation, to force the two arrays to be distributed alike (see the second sketch after this list).
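
A plain-C++ sketch of the over-allocation trick (not Grappa code; the block-cyclic layout, node count, block size, and start nodes are assumptions for illustration). Padding the second array by whole blocks and advancing its effective start pointer makes element i of both arrays land on the same node, at the cost of up to (nodes - 1) * block wasted elements:

```cpp
#include <cstdio>
#include <vector>

int main() {
  // Assumed block-cyclic layout parameters and (known) allocation start nodes.
  const int nodes = 4, block = 8, n = 100;
  const int startA = 2;   // node where a[0] lives
  const int startB = 0;   // node where the raw allocation for b would begin

  // Skip whole blocks at the front of b so its effective start node matches a's.
  int skip_blocks = (startA - startB + nodes) % nodes;
  int offset = skip_blocks * block;                 // wasted elements

  std::vector<long> raw_b(n + offset);              // over-allocate
  long* b = raw_b.data() + offset;                  // effective start pointer

  // Home node of element i in each array under the assumed layout.
  auto homeA = [&](int i) { return (startA + i / block) % nodes; };
  auto homeB = [&](int i) { return (startB + (i + offset) / block) % nodes; };

  bool aligned = true;
  for (int i = 0; i < n; ++i) aligned = aligned && (homeA(i) == homeB(i));
  printf("aligned: %s, wasted elements at front: %d\n",
         aligned ? "yes" : "no", offset);
  (void)b;
}
```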
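And a minimal sketch of the struct-of-arrays to array-of-structs transformation: interleaving the two arrays into one array of structs means x[i] and y[i] always sit in the same block, and hence on the same node, no matter how the allocator places blocks. The element types here are made up for illustration:

```cpp
#include <cstdio>
#include <vector>

struct Pair { double x, y; };   // assumed element types for the two arrays

int main() {
  const int n = 8;
  // One distributed allocation instead of two that might not line up.
  std::vector<Pair> xy(n, Pair{1.0, 2.0});

  // The element-wise operation now touches a single, co-located element.
  for (int i = 0; i < n; ++i) xy[i].x += xy[i].y;

  printf("xy[0].x = %g\n", xy[0].x);
}
```

The trade-off is that a scan over just one field now strides through both, which can hurt cache and vectorization behavior for single-array phases.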