Merge #2982
2982: Reduce excessive loop unrolling in lbgpu velocity interpolation r=KaiSzuttor a=mkuron

This caused excessive register usage, especially when combined with thrust. Issue discovered by @fweik in #2878.

It turns out that this is a problem for CUDA too; it just exhibits different behavior. Instead of crashing as on HIP, CUDA just produces a large binary and slower code. In a perfect world, the compiler would display a warning, but I guess neither AMD nor Nvidia operates in a perfect world.

Co-authored-by: Michael Kuron <mkuron@users.noreply.github.com>
bors[bot] and mkuron committed Jul 10, 2019
2 parents d404075 + f6acc47 commit 326c261
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion src/core/grid_based_algorithms/lbgpu_cuda.cu
@@ -1418,9 +1418,11 @@ velocity_interpolation(LB_nodes_gpu n_a, float *particle_position,

   int cnt = 0;
   float3 interpolated_u{0.0f, 0.0f, 0.0f};
-#pragma unroll
+#pragma unroll 1
   for (int i = 0; i < 3; ++i) {
+#pragma unroll 1
     for (int j = 0; j < 3; ++j) {
+#pragma unroll 3
       for (int k = 0; k < 3; ++k) {
         auto const x =
             fold_if_necessary(center_node_index[0] - 1 + i, para->dim_x);
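
For readers who want to experiment with the unroll pattern from this change outside the ESPResSo code base, here is a minimal host-side C++ sketch. It is not the kernel from `lbgpu_cuda.cu`: `fold`, `StencilResult`, `visit_stencil` and `USE_PRAGMA_UNROLL` are hypothetical names introduced for illustration, and the loop body only counts and sums node indices instead of interpolating velocities. It keeps the two outer loops rolled (`#pragma unroll 1`) and asks for the innermost loop to be unrolled three times, so the body would be emitted 3 times rather than the 27 times a fully unrolled 3x3x3 nest would need. The hints are guarded because not every host compiler accepts this pragma spelling.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the fold_if_necessary call in the diff: wraps an
// index that lies at most one box length outside [0, dim) back into range.
inline int fold(int x, int dim) {
  return x < 0 ? x + dim : (x >= dim ? x - dim : x);
}

struct StencilResult {
  int visited;    // number of loop-body executions
  long index_sum; // sum of the linear indices of all visited nodes
};

// Assumption: only nvcc and clang are asked to honor `#pragma unroll` here;
// other compilers simply see plain loops.
#if defined(__CUDACC__) || defined(__clang__)
#define USE_PRAGMA_UNROLL 1
#endif

// Visit the 3x3x3 block of nodes around `center` on a periodic grid `dim`.
StencilResult visit_stencil(std::array<int, 3> center,
                            std::array<int, 3> dim) {
  for (int d = 0; d < 3; ++d)
    assert(dim[d] > 0 && center[d] >= 0 && center[d] < dim[d]);

  StencilResult result{0, 0};
  // Outer loop: factor 1 keeps it rolled.
#ifdef USE_PRAGMA_UNROLL
#pragma unroll 1
#endif
  for (int i = 0; i < 3; ++i) {
    // Middle loop: also kept rolled.
#ifdef USE_PRAGMA_UNROLL
#pragma unroll 1
#endif
    for (int j = 0; j < 3; ++j) {
      // Innermost loop: unrolled three times, i.e. completely.
#ifdef USE_PRAGMA_UNROLL
#pragma unroll 3
#endif
      for (int k = 0; k < 3; ++k) {
        int const x = fold(center[0] - 1 + i, dim[0]);
        int const y = fold(center[1] - 1 + j, dim[1]);
        int const z = fold(center[2] - 1 + k, dim[2]);
        result.index_sum += x + dim[0] * (y + dim[1] * z);
        ++result.visited;
      }
    }
  }
  return result;
}
```

Calling `visit_stencil({0, 0, 0}, {4, 4, 4})` visits 27 nodes regardless of the unroll factors; the pragmas change only how much code the compiler emits, not the result. To compare register usage of a real kernel before and after such a change, nvcc's `-Xptxas -v` option prints per-kernel register counts.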
