When compiling CUDA code with the nvcc compiler and the -G flag (generate device debug symbols), the behavior of the Barnes-Hut algorithm changes. The octree permutation vector is wrong, which causes variables n and i to be assigned incorrect values. This affects the distance calculation in the following loop:
for (int l = 0; l < 3; l++) {
  dr[l] = -bhpara->r[3 * n + l] + bhpara->r[3 * i + l];
  tmp += dr[l] * dr[l];
}
In the end, the computed forces, torques and energies are wrong. The deviation from the correct value is random, and the forces on individual particles can differ by an order of magnitude. The total energy can be quite close to the real value, and a small fraction of the particles will have the correct forces (up to machine precision), so this bug is easy to overlook when tracking the total energy or the force of a lucky particle.
Something might be fundamentally broken here; undefined behavior is probably being invoked. None of the tools from the NVIDIA Compute Sanitizer suite (memcheck, racecheck, initcheck, synccheck) reported any error. The bug isn't reproducible with the nvcc -g flag (generate host debug symbols), nor with the clang --cuda-noopt-device-debug flag (generate device debug info).
(The loop above is from espresso/src/core/magnetostatics/barnes_hut_gpu_cuda.cu, lines 790 to 794 at commit a770f49.)
Here is a MWE adapted from dawaanr-and-bh-gpu.py:

ESPResSo was compiled with maxset and these CMake options:
CC=gcc-10 CXX=g++-10 CUDACXX=/usr/local/cuda-11.5/bin/nvcc /usr/bin/cmake .. \
  -D ESPRESSO_BUILD_WITH_CUDA=ON -D CUDAToolkit_ROOT=/usr/local/cuda-11.5 \
  -D ESPRESSO_BUILD_WITH_CCACHE=ON -D ESPRESSO_BUILD_WITH_STOKESIAN_DYNAMICS=ON -D ESPRESSO_BUILD_WITH_WALBERLA=ON \
  -D ESPRESSO_BUILD_WITH_WALBERLA_FFT=ON -D ESPRESSO_BUILD_WITH_WALBERLA_AVX=ON \
  -D ESPRESSO_BUILD_WITH_SCAFACOS=OFF -D ESPRESSO_BUILD_WITH_HDF5=OFF -D ESPRESSO_BUILD_WITH_GSL=ON \
  -D CMAKE_CUDA_FLAGS="--compiler-bindir=/usr/bin/g++-10" -D CMAKE_BUILD_TYPE=Debug