Reduce excessive loop unrolling in lbgpu velocity interpolation #2982

mkuron · 2019-07-09T15:58:17Z

The unrolling caused excessive register usage, especially when combined with Thrust. Issue discovered by @fweik in #2878.

It turns out that this is a problem for CUDA too; it just exhibits different behavior. Instead of crashing as it does on HIP, CUDA just produces a large binary and slower code. In a perfect world, the compiler would display a warning, but I guess neither AMD nor Nvidia operates in a perfect world.

codecov · 2019-07-09T16:10:39Z

Codecov Report

Merging #2982 into python will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           python   #2982   +/-   ##
======================================
  Coverage      82%     82%           
======================================
  Files         525     525           
  Lines       26807   26807           
======================================
  Hits        22015   22015           
  Misses       4792    4792

Last update a211031...f6acc47.

src/core/grid_based_algorithms/lbgpu_cuda.cu

KaiSzuttor · 2019-07-10T07:54:13Z

bors r+

@fweik

2982: Reduce excessive loop unrolling in lbgpu velocity interpolation r=KaiSzuttor a=mkuron

This caused excessive register usage, especially when combined with thrust. Issue discovered by @fweik in #2878.

It turns out that this is a problem for CUDA too, it just exhibits a different behavior. Instead of crashing like on HIP, CUDA just produces a large binary and slower code. In a perfect world, the compiler should display a warning, but I guess neither AMD nor Nvidia operate in a perfect world.

Co-authored-by: Michael Kuron <mkuron@users.noreply.github.com>

bors · 2019-07-10T09:19:27Z

Build succeeded

ICP GitLab CI

2984: Don't unroll loops with nontrivial loop bodies r=KaiSzuttor a=mkuron

Simple assignments and arithmetic are fine to unroll, but if there's a function call or a nested loop, unrolling is probably not useful

Follow-up to #2982.

2993: Update Doxygen landing page r=RudolfWeeber a=jngrad

Fixes #2600, closes #294

Update Doxygen landing page and add a link to the Related Pages where detailed instructions are provided for developing new features in the C++ core.

2998: Benchmarks reliability r=RudolfWeeber a=jngrad

Since 4.0 a few things changed in the CMake workflow (either in espresso CMake files, or in newer CMake releases) that broke the benchmark functionality:
- changing myconfig.hpp doesn't always cause a full project recompilation (rare)
- ctest sometimes runs benchmarks in parallel

The benchmark bash scripts were also brittle. They were both refactored based on `build_cmake.sh`:
- trap mechanism with clear error message
- another trap with cleanup action
- simplified CMake logic
- ctest runs benchmarks in serial
- erase /build/src before each compilation

Other changes:
- don't print debug energy values in the CSV output
- some column names were changed
- the [wiki documentation](https://github.com/espressomd/espresso/wiki/Development#benchmarking) was updated accordingly
- and it now discusses condor on ICP machines
- reduce risk of LJ simulation crashing with a longer energy minimization
- add a benchmark for bonded interactions
- print more info in stdout (config file being processed, energy during LJ minimization and simulation)

Co-authored-by: Michael Kuron <mkuron@icp.uni-stuttgart.de>
Co-authored-by: Kai Szuttor <2150555+kaiszuttor@users.noreply.github.com>
Co-authored-by: Jean-Noël Grad <jgrad@icp.uni-stuttgart.de>

Reduce excessive loop unrolling in lbgpu velocity interpolation

f6acc47

This caused excessive register usage

mkuron mentioned this pull request Jul 9, 2019

Lbgpu node vel #2878

Merged

KaiSzuttor reviewed Jul 10, 2019

src/core/grid_based_algorithms/lbgpu_cuda.cu

KaiSzuttor approved these changes Jul 10, 2019

bors bot merged commit f6acc47 into espressomd:python Jul 10, 2019

mkuron mentioned this pull request Jul 10, 2019

Don't unroll loops with nontrivial loop bodies #2984

Merged

mkuron mentioned this pull request Jul 23, 2019

HIP issue list as discussed in the offline meeting #2973

Closed

mkuron deleted the patch-14 branch September 13, 2019 12:00
