Support extents with arbitrary value types #488
Merged
Currently, LLAMA uses `size_t` in all index calculations, as is the default in the STL. We already knew from previous experience in picongpu that this increases register usage for CUDA, since `size_t` is 64 bits in CUDA and CUDA registers are 32 bits in size, so each index variable uses 2 registers. Registers are a scarce resource, so it would be beneficial to have control over the type used in index calculations. This PR allows this.

We saw a similar development in `std::mdspan` (https://wg21.link/P0009) with the proposal “Make mdspan size_type controllable” (https://wg21.link/P2553). The gist was adding an additional template parameter to `std::extents`, which gives the type used in all index calculations.

For LLAMA, the main change is that `llama::ArrayExtents` gains an additional template parameter defining the type to use for storing extents and performing index calculations.

This change is propagated downstream into `llama::ArrayExtentsDynamic<IndexType, 3>`, which is a breaking change. `llama::NrAndOffset` is now templated on the index type as well. Furthermore, `ArrayExtentsStatic` is renamed to `ArrayExtentsNCube`, since this is a more accurate description of what it does.
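
To illustrate the general idea of an extents type parameterized on its index type, here is a minimal, self-contained sketch. It is not LLAMA's implementation: `SketchExtents` and `linearize` are hypothetical names invented for this example, and the row-major linearization is only an assumed stand-in for a real mapping.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for an extents type (NOT llama::ArrayExtents).
// The first template parameter selects the type used to store the extents.
template <typename IndexType, std::size_t Rank>
struct SketchExtents {
    static_assert(std::is_integral_v<IndexType>, "index type must be integral");
    using value_type = IndexType;
    std::array<IndexType, Rank> values;
};

// Hypothetical row-major linearization. The accumulator and the result are
// IndexType, so a 32-bit index type avoids 64-bit index variables.
// (Types narrower than int are still promoted by the language during arithmetic.)
template <typename IndexType, std::size_t Rank>
constexpr IndexType linearize(const SketchExtents<IndexType, Rank>& extents,
                              const std::array<IndexType, Rank>& index) {
    IndexType linear = 0;
    for (std::size_t d = 0; d < Rank; ++d)
        linear = linear * extents.values[d] + index[d];
    return linear;
}

// The caller picks the index type; the result type follows from it.
static_assert(std::is_same_v<
    decltype(linearize(SketchExtents<std::uint32_t, 3>{{4, 5, 6}},
                       std::array<std::uint32_t, 3>{1, 2, 3})),
    std::uint32_t>);
```

In this sketch, switching between `std::uint32_t` and `std::size_t` only requires changing the first template argument, which mirrors the kind of control over the index type that this PR describes.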