Using GCC and Clang with CUDA 11 on Ubuntu 22.04 #4630

Closed
jngrad opened this issue Dec 14, 2022 · 0 comments · Fixed by #4642

jngrad commented Dec 14, 2022

Since the beginning of December 2022, it is no longer possible to build a CUDA-enabled ESPResSo project on Ubuntu 22.04, either via NVCC or Clang. Below we share our experience and the solution that worked for us.

Problem statement

Error 1: NVCC cannot compile a simple CUDA file

CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:726 (message):
  Compiling the CUDA compiler identification source file
  "CMakeCUDACompilerId.cu" failed.

  Compiler: /usr/local/cuda-11.2/bin/nvcc

    139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag 
        | '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported
        | host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
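
A quick way to check whether this error applies to a given machine is to compare the major version of the default host compiler against the limit quoted in the message. This is a minimal sketch, assuming the limit of 10 stated in the error above; the function name `gcc_major_ok` is hypothetical and not part of any tool:

```shell
# Hypothetical helper: succeed if a GCC major version ($1) does not exceed
# the maximum accepted by NVCC ($2, defaulting to 10 as in the error above).
gcc_major_ok() {
  [ "$1" -le "${2:-10}" ]
}

# Query the default compiler, if one is installed.
if command -v gcc >/dev/null 2>&1; then
  major="$(gcc -dumpversion | cut -d. -f1)"
  if gcc_major_ok "$major"; then
    echo "gcc $major is within the NVCC limit"
  else
    echo "gcc $major is too recent for NVCC"
  fi
fi
```

The limit is passed as an optional second argument because it depends on the CUDA release, so the value of 10 should be treated as an assumption tied to the toolkit shown here.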

Error 2: Clang cannot find C++ standard headers like cmath or iostream

/usr/lib/llvm-14/lib/clang/14.0.0/include/__clang_cuda_runtime_wrapper.h:41:10: fatal error: 'cmath' file not found
#include <cmath>
         ^~~~~~~
1 error generated when compiling for sm_61

Error 3: CMake fails to detect MPI libraries when Clang is the compiler

-- Found MPI_C: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so (found suitable version "3.1") 
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS) (Required is at least version "3.0")
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_CXX_FOUND) (found suitable version "3.1",
  minimum required is "3.0")
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.22/Modules/FindMPI.cmake:1830 (find_package_handle_standard_args)
  CMakeLists.txt:323 (find_package)

Error 4: The nvidia-cuda-toolkit cannot be installed together with nvidia-driver-515

$ sudo apt install nvidia-cuda-toolkit
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be REMOVED:
  libnvidia-compute-515 libnvidia-compute-515:i386 libnvidia-decode-515 libnvidia-decode-515:i386
  libnvidia-encode-515 libnvidia-encode-515:i386 nvidia-compute-utils-515 nvidia-driver-515 nvidia-utils-515
The following NEW packages will be installed:
  libnvidia-compute-495 libnvidia-compute-510 nvidia-cuda-dev nvidia-cuda-gdb nvidia-cuda-toolkit [...]
Do you want to continue? [Y/n] n

$ sudo apt install nvidia-cuda-toolkit nvidia-driver-515
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
nvidia-driver-515 is already the newest version (515.86.01-0ubuntu0.22.04.1).
Some packages could not be installed. This may mean that you have requested an impossible situation
or if you are using the unstable distribution that some required packages have not yet been created
or been moved out of Incoming. The following information may help to resolve the situation:
The following packages have unmet dependencies:
 libnvidia-decode-515 : Depends: libnvidia-compute-515 (= 515.86.01-0ubuntu0.22.04.1) but it is not installable
 nvidia-compute-utils-515 : Depends: libnvidia-compute-515 but it is not installable
 nvidia-driver-515 : Depends: libnvidia-compute-515 (= 515.86.01-0ubuntu0.22.04.1) but it is not installable
                     Recommends: libnvidia-compute-515:i386 (= 515.86.01-0ubuntu0.22.04.1)
                     Recommends: libnvidia-decode-515:i386 (= 515.86.01-0ubuntu0.22.04.1)
                     Recommends: libnvidia-encode-515:i386 (= 515.86.01-0ubuntu0.22.04.1)
 nvidia-utils-515 : Depends: libnvidia-compute-515 but it is not installable
E: Unable to correct problems, you have held broken packages.
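
Such a transaction can be previewed without touching the system by using the simulation mode of apt-get (`-s`) and checking the removal list for driver packages. This is a minimal sketch; the function name `list_removed` is hypothetical, and the parsing assumes the English output layout shown above:

```shell
# Hypothetical helper: read apt output on stdin and print the packages
# listed under "The following packages will be REMOVED:", one per line.
list_removed() {
  awk '
    /^The following packages will be REMOVED:/ { inside = 1; next }
    /^[^ ]/ { inside = 0 }   # any unindented line ends the list
    inside { for (i = 1; i <= NF; i++) print $i }
  '
}

# Usage (simulation only, nothing gets installed or removed):
#   apt-get install -s nvidia-cuda-toolkit | list_removed | grep nvidia-driver
```

If the final `grep` prints a driver package, the real installation would remove the driver, as in the first transcript above.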

Things to know before attempting to solve these issues

  • CUDA 11 requires GCC <= 10.
  • When Clang 14 compiles CUDA sources, it automatically selects the most recent GCC version, i.e. GCC 12 on Ubuntu 22.04.
  • GCC 12 cannot be removed: on Ubuntu 22.04, apt must satisfy the chain of hard dependencies nvidia-driver-515 -> nvidia-dkms-515 -> dkms -> gcc-12. Although the dkms package page lists a dependency on any gcc or c-compiler version, only gcc-12 satisfies that dependency from the point of view of apt (as of December 13, 2022), even though nvidia-driver-515 is in reality compatible with nvidia-cuda-toolkit.
  • Inside an Ubuntu 22.04 Docker container with nvidia-cuda-toolkit installed and no NVIDIA driver installed, as long as the container was started with docker run --runtime=nvidia, the NVIDIA driver of the host will be used and the NVIDIA toolkit will work as expected (no issues with CMake, NVCC, GCC or Clang).
  • Posts on StackOverflow suggest creating a new folder containing symlinks to the GCC 10 include and lib directories and passing that folder via --gcc-toolchain=my_folder to restrict GCC version detection to GCC 10. Unfortunately, the C++ header files are actually split across two separate directories, so that workaround no longer works.
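
To see which GCC installation Clang actually picks up, one can inspect its verbose output, which lists the candidate installations and the selected one. This is a minimal sketch; the function name `selected_gcc` is hypothetical, and `clang++-14` is assumed to be the binary name on Ubuntu 22.04:

```shell
# Hypothetical helper: read `clang -v` output on stdin and print the path
# of the GCC installation that Clang selected.
selected_gcc() {
  sed -n 's/^Selected GCC installation: //p'
}

# Usage, if Clang 14 is installed (the information is printed on stderr).
if command -v clang++-14 >/dev/null 2>&1; then
  clang++-14 -v 2>&1 | selected_gcc
fi
```

On a system affected by Error 2, the printed path would be expected to end in 12 rather than 10.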

How we solved these issues

  • To use Clang 14, it is necessary to install a complete GCC 10 toolchain.
    • run apt install gcc-10 g++-10 libstdc++-10-dev
    • pass extra flags to make Clang traverse the GCC 10 header files before traversing the GCC 12 header files:
    CC=clang-14 CXX=clang++-14 CUDACXX=clang++-14 cmake .. \
      -D CMAKE_CXX_FLAGS="-I/usr/include/x86_64-linux-gnu/c++/10 -I/usr/include/c++/10" \
      -D CMAKE_CUDA_FLAGS="-I/usr/include/x86_64-linux-gnu/c++/10 -I/usr/include/c++/10"
  • To use the toolkit with the 515 driver:
    • remove nvidia-cuda-toolkit
    • install nvidia-driver-515
    • manually install a CUDA 11.x toolkit in a folder like /usr/local/cuda-11.5
      • download the desired runfile, even if it is only officially supported on Ubuntu 20.04, e.g. CUDA 11.5.2
      • run sh cuda_11.5.2_495.29.05_linux.run as a regular user
      • tick Continue and accept the EULA to go to CUDA Installer
      • untick everything
      • tick CUDA Toolkit 11.5
      • open Options
      • open Toolkit Options
      • untick all
      • open Change Toolkit Install Path
      • type /usr/local/cuda-11.5/
      • go back three levels with Done
      • hit Install
    • adapt the compiler commands via CMake options:
    # case 1: GCC + NVCC
    CC=gcc-12 CXX=g++-12 CUDACXX=/usr/local/cuda-11.5/bin/nvcc cmake .. \
      -D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.5 \
      -D CMAKE_CUDA_FLAGS="--compiler-bindir=/usr/bin/g++-10"
    
    # case 2: Clang only
    CC=clang-14 CXX=clang++-14 CUDACXX=clang++-14 cmake .. \
      -D CMAKE_CXX_FLAGS="-I/usr/include/x86_64-linux-gnu/c++/10/ -I/usr/include/c++/10 --cuda-path=/usr/local/cuda-11.5" \
      -D CMAKE_CUDA_FLAGS="-I/usr/include/x86_64-linux-gnu/c++/10/ -I/usr/include/c++/10 --cuda-path=/usr/local/cuda-11.5"
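
The include flags in the Clang commands above follow a fixed pattern, so they can be generated from the GCC major version and the toolkit location instead of being typed twice. This is a minimal sketch, assuming the Ubuntu x86_64 header layout used above; the function name `gcc_include_flags` and the variables `CUDA_ROOT` and `FLAGS` are hypothetical:

```shell
# Hypothetical helper: print the -I flags for the libstdc++ headers of a
# given GCC major version (Ubuntu x86_64 layout, as in the commands above).
gcc_include_flags() {
  printf '%s %s\n' "-I/usr/include/x86_64-linux-gnu/c++/$1" "-I/usr/include/c++/$1"
}

CUDA_ROOT=/usr/local/cuda-11.5   # assumed install path from the steps above
FLAGS="$(gcc_include_flags 10) --cuda-path=${CUDA_ROOT}"

# Print the configure command instead of running it, so it can be reviewed.
printf '%s\n' 'CC=clang-14 CXX=clang++-14 CUDACXX=clang++-14 cmake .. \'
printf '  -D CMAKE_CXX_FLAGS="%s" \\\n' "$FLAGS"
printf '  -D CMAKE_CUDA_FLAGS="%s"\n' "$FLAGS"
```

Printing the command rather than executing it keeps the sketch free of side effects; the output can be pasted into a shell from the build directory once it looks right.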

Many thanks to our IT administrator for helping us troubleshoot the compiler errors and CUDA packages.

@kodiakhq kodiakhq bot closed this as completed in #4642 Jan 9, 2023
kodiakhq bot added a commit that referenced this issue Jan 9, 2023
Description of changes:
- restrict Clang-Tidy checks to the main project
   - external libraries obtained via FetchContent and their consumer targets in ESPResSo no longer emit diagnostics
- use native CUDA support in CMake 3.22
   - project option `ESPRESSO_CUDA_COMPILER` was removed
   - the waLBerla library obtained via FetchContent can now be compiled with `WALBERLA_BUILD_WITH_CUDA=ON`
   - the CUDA 11 circular dependency in Ubuntu 22.04 packages is now documented (closes #4630)