Commit
2982: Reduce excessive loop unrolling in lbgpu velocity interpolation r=KaiSzuttor a=mkuron

The unrolling caused excessive register usage, especially when combined with thrust. Issue discovered by @fweik in #2878.

It turns out that this is a problem for CUDA too; it just exhibits a different behavior. Instead of crashing like on HIP, CUDA just produces a large binary and slower code. In a perfect world, the compiler should display a warning, but I guess neither AMD nor Nvidia operate in a perfect world.

Co-authored-by: Michael Kuron <mkuron@users.noreply.github.com>