Using GCC and Clang with CUDA 11 on Ubuntu 22.04 #4630

jngrad · 2022-12-14T18:37:26Z

Since the beginning of December 2022, it has no longer been possible to build a CUDA-enabled ESPResSo project on Ubuntu 22.04 with either NVCC or Clang. Below we share our experience and the solution that worked for us.

Problem statement

Error 1: NVCC cannot compile a simple CUDA file

```
CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:726 (message):
  Compiling the CUDA compiler identification source file
  "CMakeCUDACompilerId.cu" failed.

  Compiler: /usr/local/cuda-11.2/bin/nvcc

    139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag
        | '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported
        | host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
```

Error 2: Clang cannot find C++ standard headers such as `cmath` or `iostream`

```
/usr/lib/llvm-14/lib/clang/14.0.0/include/__clang_cuda_runtime_wrapper.h:41:10: fatal error: 'cmath' file not found
#include <cmath>
         ^~~~~~~
1 error generated when compiling for sm_61
```

Error 3: CMake fails to detect MPI libraries when Clang is the compiler.

```
-- Found MPI_C: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so (found suitable version "3.1")
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS) (Required is at least version "3.0")
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_CXX_FOUND) (found suitable version "3.1",
  minimum required is "3.0")
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.22/Modules/FindMPI.cmake:1830 (find_package_handle_standard_args)
  CMakeLists.txt:323 (find_package)
```

Error 4: The nvidia-cuda-toolkit cannot be installed together with nvidia-driver-515.

```
$ sudo apt install nvidia-cuda-toolkit
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be REMOVED:
  libnvidia-compute-515 libnvidia-compute-515:i386 libnvidia-decode-515 libnvidia-decode-515:i386
  libnvidia-encode-515 libnvidia-encode-515:i386 nvidia-compute-utils-515 nvidia-driver-515 nvidia-utils-515
The following NEW packages will be installed:
  libnvidia-compute-495 libnvidia-compute-510 nvidia-cuda-dev nvidia-cuda-gdb nvidia-cuda-toolkit [...]
Do you want to continue? [Y/n] n

$ sudo apt install nvidia-cuda-toolkit nvidia-driver-515
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
nvidia-driver-515 is already the newest version (515.86.01-0ubuntu0.22.04.1).
Some packages could not be installed. This may mean that you have requested an impossible situation
or if you are using the unstable distribution that some required packages have not yet been created
or been moved out of Incoming. The following information may help to resolve the situation:
The following packages have unmet dependencies:
 libnvidia-decode-515 : Depends: libnvidia-compute-515 (= 515.86.01-0ubuntu0.22.04.1) but it is not installable
 nvidia-compute-utils-515 : Depends: libnvidia-compute-515 but it is not installable
 nvidia-driver-515 : Depends: libnvidia-compute-515 (= 515.86.01-0ubuntu0.22.04.1) but it is not installable
                     Recommends: libnvidia-compute-515:i386 (= 515.86.01-0ubuntu0.22.04.1)
                     Recommends: libnvidia-decode-515:i386 (= 515.86.01-0ubuntu0.22.04.1)
                     Recommends: libnvidia-encode-515:i386 (= 515.86.01-0ubuntu0.22.04.1)
 nvidia-utils-515 : Depends: libnvidia-compute-515 but it is not installable
E: Unable to correct problems, you have held broken packages.
```

Things to know before attempting to solve these issues

- CUDA 11 requires GCC <= 10.
- When Clang 14 compiles CUDA sources, it automatically selects the most recent GCC installation, i.e. GCC 12 on Ubuntu 22.04.
- GCC 12 cannot be removed: on Ubuntu 22.04, apt must satisfy the following chain of hard dependencies: `nvidia-driver-515` -> `nvidia-dkms-515` -> `dkms` -> `gcc-12`. Although the dkms package page lists a dependency on any gcc or c-compiler version, only gcc-12 satisfies that dependency from the point of view of apt (as of December 13, 2022), even though nvidia-driver-515 is in reality compatible with nvidia-cuda-toolkit.
- Inside an Ubuntu 22.04 Docker container with nvidia-cuda-toolkit installed and no NVIDIA driver, the NVIDIA driver of the host is used as long as the container was started with `docker run --runtime=nvidia`, and the toolkit works as expected (no issues with CMake, NVCC, GCC or Clang).
- Posts on StackOverflow suggest creating a new folder containing symlinks to the GCC 10 include and lib directories and passing that folder via `--gcc-toolchain=my_folder` to restrict Clang's GCC detection to GCC 10. Unfortunately, the C++ header files are split across two separate directories (`/usr/include/c++/10` and `/usr/include/x86_64-linux-gnu/c++/10`), so that workaround no longer works.
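The first two notes can be checked on a given machine with a small shell sketch. Both functions are hypothetical helpers, not part of ESPResSo: `cuda11_host_gxx` looks for the newest `g++-N` with N <= 10 in a bin directory (defaulting to `/usr/bin`, assumed to be where Ubuntu installs versioned compilers), and `clang_selected_gcc_major` extracts the GCC version from the "Selected GCC installation:" line that `clang++-14 -v` prints on stderr.

```sh
#!/bin/sh
# Hypothetical helpers illustrating the compiler-version notes above.

# cuda11_host_gxx [BINDIR]
# Print the path of the newest executable g++-N in BINDIR (default /usr/bin)
# with N <= 10, i.e. a host compiler accepted by CUDA 11 per the rule above.
# Prints nothing and returns 1 when no such compiler exists.
cuda11_host_gxx() {
  bindir=${1:-/usr/bin}
  best=""
  best_major=0
  for path in "$bindir"/g++-[0-9]*; do
    # an unmatched glob stays literal and fails this test
    [ -x "$path" ] || continue
    major=${path##*/g++-}
    # skip names whose suffix is not a plain major version number
    case $major in
      *[!0-9]*) continue ;;
    esac
    if [ "$major" -le 10 ] && [ "$major" -gt "$best_major" ]; then
      best=$path
      best_major=$major
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}

# clang_selected_gcc_major
# Read `clang++ -v` output on stdin and print the last path component of the
# "Selected GCC installation:" line, e.g. 12 for .../gcc/x86_64-linux-gnu/12.
clang_selected_gcc_major() {
  sed -n 's|^Selected GCC installation: .*/||p'
}
```

For example, `clang++-14 -v 2>&1 | clang_selected_gcc_major` would be expected to print 12 on the setup described above, and a path printed by `cuda11_host_gxx` is the kind of value passed to NVCC via `--compiler-bindir`.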

How we solved these issues

To use Clang 14, a complete GCC 10 toolchain is needed:

- install it with `apt install gcc-10 g++-10 libstdc++-10-dev`
- pass extra flags so that Clang searches the GCC 10 header files before the GCC 12 header files:

```
CC=clang-14 CXX=clang++-14 CUDACXX=clang++-14 cmake .. \
  -D CMAKE_CXX_FLAGS="-I/usr/include/x86_64-linux-gnu/c++/10 -I/usr/include/c++/10" \
  -D CMAKE_CUDA_FLAGS="-I/usr/include/x86_64-linux-gnu/c++/10 -I/usr/include/c++/10"
```
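Because the libstdc++ headers of a given GCC version live in two directories on Ubuntu, the pair of `-I` flags in the command above can be generated by a small helper. `libstdcxx_include_flags` is a hypothetical name; its default `x86_64-linux-gnu` triplet matches the paths used in this issue, and other triplets are an untested assumption. It only warns on stderr when a directory is missing (e.g. when `libstdc++-10-dev` is not installed).

```sh
#!/bin/sh
# libstdcxx_include_flags VERSION [TRIPLET]
# Hypothetical helper: print the two -I flags that make Clang search the
# libstdc++ headers of GCC VERSION first. The target-specific directory
# comes first, followed by the generic one, as in the command above.
libstdcxx_include_flags() {
  version=$1
  triplet=${2:-x86_64-linux-gnu}
  for dir in "/usr/include/${triplet}/c++/${version}" "/usr/include/c++/${version}"; do
    [ -d "$dir" ] || echo "warning: $dir not found (is libstdc++-${version}-dev installed?)" >&2
  done
  printf '%s\n' "-I/usr/include/${triplet}/c++/${version} -I/usr/include/c++/${version}"
}
```

It could then replace the hard-coded paths, e.g. `-D CMAKE_CXX_FLAGS="$(libstdcxx_include_flags 10)"` and likewise for `CMAKE_CUDA_FLAGS`; the Clang-only case with a manually installed toolkit would still need `--cuda-path=...` appended.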

To use the toolkit with the 515 driver:
- remove nvidia-cuda-toolkit
- install nvidia-driver-515
- manually install a CUDA 11.x toolkit in a folder like /usr/local/cuda-11.5
  - download the desired runfile (e.g. CUDA 11.5.2), even if it is only officially supported on Ubuntu 20.04
  - run `sh cuda_11.5.2_495.29.05_linux.run` as a regular user
  - select Continue and accept the EULA to reach the CUDA Installer menu
  - untick everything
  - tick CUDA Toolkit 11.5
  - open Options
  - open Toolkit Options
  - untick all
  - open Change Toolkit Install Path
  - type /usr/local/cuda-11.5/
  - go back three levels with Done
  - hit Install
- adapt the compiler commands via CMake options:
```
# case 1: GCC + NVCC
CC=gcc-12 CXX=g++-12 CUDACXX=/usr/local/cuda-11.5/bin/nvcc cmake .. \
  -D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.5 \
  -D CMAKE_CUDA_FLAGS="--compiler-bindir=/usr/bin/g++-10"

# case 2: Clang only
CC=clang-14 CXX=clang++-14 CUDACXX=clang++-14 cmake .. \
  -D CMAKE_CXX_FLAGS="-I/usr/include/x86_64-linux-gnu/c++/10/ -I/usr/include/c++/10 --cuda-path=/usr/local/cuda-11.5" \
  -D CMAKE_CUDA_FLAGS="-I/usr/include/x86_64-linux-gnu/c++/10/ -I/usr/include/c++/10 --cuda-path=/usr/local/cuda-11.5"
```
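The interactive installer steps can probably also be scripted. The CUDA runfile has a non-interactive mode in which `--silent` skips the menus, `--toolkit` installs only the CUDA Toolkit (not the bundled driver) and `--toolkitpath` sets the install folder. The wrapper below is a hypothetical sketch: it has not been verified that it reproduces every option unticked in the menus (e.g. the `/usr/local/cuda` symlink), so compare with `sh <runfile> --help` for the runfile at hand. The user running it needs write access to the target folder.

```sh
#!/bin/sh
# install_cuda_toolkit_only RUNFILE PREFIX
# Hypothetical wrapper around the runfile's silent mode: install only the
# CUDA Toolkit (no driver) into PREFIX, without the interactive menus.
install_cuda_toolkit_only() {
  runfile=$1
  prefix=$2
  if [ ! -f "$runfile" ]; then
    echo "error: runfile '$runfile' not found" >&2
    return 1
  fi
  sh "$runfile" --silent --toolkit --toolkitpath="$prefix"
}
```

For the toolkit used above, this would be invoked as `install_cuda_toolkit_only cuda_11.5.2_495.29.05_linux.run /usr/local/cuda-11.5`.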

Many thanks to our IT administrator for helping us troubleshoot the compiler errors and CUDA packages.


jngrad mentioned this issue on Dec 28, 2022, in PR #4642 "Improve support for CUDA libraries and FetchContent" (merged). Description of changes:
- restrict Clang-Tidy checks to the main project
- external libraries obtained via FetchContent and their consumer targets in ESPResSo no longer emit diagnostics
- use native CUDA support in CMake 3.22
- project option `ESPRESSO_CUDA_COMPILER` was removed
- the waLBerla library obtained via FetchContent can now be compiled with `WALBERLA_BUILD_WITH_CUDA=ON`
- the CUDA 11 circular dependency in Ubuntu 22.04 packages is now documented (closes #4630)

kodiakhq bot closed this as completed in #4642 on Jan 9, 2023.
