[BUG]: Build Error with CUDA 11.4-11.8.0 & Operators #4193

ax3l · 2022-09-21T00:28:12Z

Required prerequisites

Make sure you've read the documentation. Your issue may be addressed there.
Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
Consider asking first in the Gitter chat room or in a Discussion.

Problem description

I am compiling pybind11 v2.10.0-38-g424ac4fe on Perlmutter at NERSC.

I use the following software modules:

module load cmake/3.22.0
module load PrgEnv-gnu
module load cudatoolkit/11.7
module load cray-python/3.9.7.1

# compiler environment hints
export CRAY_ACCEL_TARGET=nvidia80
export CC=cc #$(which gcc)
export CXX=CC #$(which g++)
export FC=ftn # $(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}

export CFLAGS="${CFLAGS} -O3 -ffast-math"
export CXXFLAGS="${CXXFLAGS} -O3 -ffast-math"
export FCFLAGS="${FCFLAGS} -O3 -ffast-math"

$ CC --version
g++ (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cc --version
gcc (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

Reproducible example code

cmake -S . -B build -DPYBIND11_CUDA_TESTS=ON -DPYBIND11_WERROR=ON -DDOWNLOAD_CATCH=ON

-- The CXX compiler identification is GNU 11.2.0
-- Cray Programming Environment 2.7.16 CXX
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/cray/pe/craype/2.7.16/bin/CC - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- pybind11 v2.11.0 dev1
-- CMake 3.22.0
-- Found PythonInterp: /usr/bin/python3.6 (found suitable version "3.6.15", minimum required is "3.6") 
-- Found PythonLibs: /usr/lib64/libpython3.6m.so
-- PYTHON 3.6.15
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- pybind11::lto enabled
-- pybind11::thin_lto enabled
-- Setting tests build type to MinSizeRel as none was specified
-- The CUDA compiler identification is NVIDIA 11.7.64
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Skipping test_constants_and_functions due to incompatible exception specifications
-- Building tests WITHOUT Eigen, use -DDOWNLOAD_EIGEN=ON on CMake 3.11+ to download
-- Found Boost: /usr/include (found suitable version "1.66.0", minimum required is "1.56")  
CMake Warning at tools/pybind11Common.cmake:227 (message):
  Missing: pytest 3.1

  Try: /usr/bin/python3.6 -m pip install pytest
Call Stack (most recent call first):
  tests/CMakeLists.txt:476 (pybind11_find_import)


-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    DOWNLOAD_CATCH


-- Build files have been written to: /global/homes/a/ahuebl/src/pybind11/build

cmake --build build

[  2%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/pybind11_tests.cpp.o
[  4%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_async.cpp.o
[  6%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_buffers.cpp.o
[  8%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_builtin_casters.cpp.o
[ 10%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_call_policies.cpp.o
[ 13%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_callbacks.cpp.o
[ 15%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_chrono.cpp.o
[ 17%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_class.cpp.o
[ 19%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_const_name.cpp.o
[ 21%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_copy_move.cpp.o
[ 23%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_custom_type_casters.cpp.o
[ 26%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_custom_type_setup.cpp.o
[ 28%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_docstring_options.cpp.o
[ 30%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_enum.cpp.o
[ 32%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_eval.cpp.o
[ 34%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_exceptions.cpp.o
[ 36%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_factory_constructors.cpp.o
[ 39%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_gil_scoped.cpp.o
[ 41%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_iostream.cpp.o
[ 43%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_kwargs_and_defaults.cpp.o
[ 45%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_local_bindings.cpp.o
[ 47%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_methods_and_attributes.cpp.o
[ 50%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_modules.cpp.o
[ 52%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_multiple_inheritance.cpp.o
[ 54%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_array.cpp.o
[ 56%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_dtypes.cpp.o
[ 58%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_vectorize.cpp.o
[ 60%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_opaque_types.cpp.o
[ 63%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp: In function 'void test_submodule_operators(pybind11::module_&)':
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: error: no matching function for call to 'pybind11::class_<Vector2>::def(pybind11::detail::op_<pybind11::detail::op_add, pybind11::detail::op_l, pybind11::detail::self_t, pybind11::detail::self_t>)'
  157 |     py::class_<Vector2>(m, "Vector2")
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        ^                      
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1557:1: note: candidate: 'template<class Func, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const char*, Func&&, const Extra& ...) [with Func = Func; Extra = {Extra ...}; type_ = Vector2; options = {}]'
 1557 |     class_ &def(const char *name_, Func &&f, const Extra &...extra) {
      | ^  
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1557:1: note:   template argument deduction/substitution failed:
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: note:   candidate expects at least 2 arguments, 1 provided
  157 |     py::class_<Vector2>(m, "Vector2")
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        ^                      
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1582:1: note: candidate: 'template<pybind11::detail::op_id id, pybind11::detail::op_type ot, class L, class R, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const pybind11::detail::op_<(pybind11::detail::op_id)(id), (pybind11::detail::op_type)(ot), L, R>&, const Extra& ...) [with pybind11::detail::op_id id = id; pybind11::detail::op_type ot = ot; L = L; R = R; Extra = {Extra ...}; type_ = Vector2; options = {}]'
 1582 |     class_ &def(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
      | ^  
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1582:1: note:   template argument deduction/substitution failed:
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: note:   couldn't deduce template parameter 'id'
  157 |     py::class_<Vector2>(m, "Vector2")
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        ^                      
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1594:1: note: candidate: 'template<class ... Args, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const pybind11::detail::initimpl::constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = Vector2; options = {}]'
 1594 |     class_ &def(const detail::initimpl::constructor<Args...> &init, const Extra &...extra) {
      | ^  
...

error.txt

More details

Failing compile line:

/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/opt/cray/pe/craype/2.7.16/bin/CC -DPYBIND11_TEST_BOOST -Dpybind11_tests_EXPORTS -I/global/homes/a/ahuebl/src/pybind11/include -isystem=/usr/include/python3.6m -O1 -DNDEBUG --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler=-fPIC -Xcompiler=-fvisibility=hidden -Werror all-warnings -std=c++17 -MD -MT tests/CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o -MF CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o.d -x cu -c /global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp -o CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o

Pre-processed file from -E: test_operator_overloading.cpp.txt

Cross-References

NERSC ticket: INC0191398
Nvidia ticket: 3820295


ax3l · 2022-09-21T01:18:22Z

Vanilla Nvidia Linux Docker with CTK 11.7.1:

$ docker run -it nvidia/cuda:11.7.1-devel-ubuntu20.04

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ apt update && apt install -y git cmake python3 python3-dev python3-setuptools

$ git clone https://github.com/pybind/pybind11.git
$ cmake -S pybind11 -B build -DPYBIND11_CUDA_TESTS=ON -DPYBIND11_WERROR=ON
$ cmake --build build

-> same issue.

Looks like it's not HPE-specific but a general Nvidia NVCC issue.

henryiii · 2022-09-21T18:40:58Z

We have CI run for CUDA, right? Maybe we could add 11.7 & show it breaking in CI?

henryiii · 2022-09-21T18:51:04Z

Ahh, yes: #3968

ax3l · 2022-09-22T23:09:16Z

I repeated the compilation with the following Docker containers, but with -DPYBIND11_WERROR=OFF:
Legend: ✔️ ok - ❌ fail

✔️ nvidia/cuda:11.2.2-devel-ubuntu20.04
✔️ nvidia/cuda:11.3.1-devel-ubuntu20.04
❌ nvidia/cuda:11.4.3-devel-ubuntu20.04
  - -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
  - type deduction errors for self & operators in pybind11_cross_module_tests.cpp (see issue description)
  - type deduction errors for self & operators in test_operator_overloading.cpp (see issue description)
❌ nvidia/cuda:11.5.1-devel-ubuntu20.04
  - -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
  - type deduction errors in pybind11_cross_module_tests.cpp (see issue description)
  - type deduction errors for self & operators in test_operator_overloading.cpp (see issue description)
❌ nvidia/cuda:11.6.1-devel-ubuntu20.04
  - -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
  - type deduction errors for self & operators in pybind11_cross_module_tests.cpp (see issue description)
  - type deduction errors for self & operators in test_operator_overloading.cpp (see issue description)
❌ nvidia/cuda:11.7.1-devel-ubuntu20.04
  - type deduction errors for self & operators in test_operator_overloading.cpp (see issue description)
  - ...
❌ CUDA 11.8.0

ax3l · 2022-10-04T01:37:26Z

Uff, tried again today and still cannot find a simple work-around.

ax3l · 2022-10-06T18:13:02Z

Happy to report we made great progress on this with the help of Nvidia developers 🎉

✔️ fix landed in their development branches, just after the 11.8.0 CUDA Toolkit (CTK) release
✔️ Nvidia found a work-around that we can use in the meantime for CTK 11.4-11.8, e.g., as patch in package managers
🤞 due to its popularity, from SciPy to RAPIDS AI projects, we are trying to get pybind11 into the internal Nvidia compiler regression suite for nvcc

Issue Description from Nvidia

NVCC parses the input and regenerates host-side C++ to send to the host compiler. There's a bug in this host C++ generation: the def function (and def_cast) get unnecessary casts added in the declaration of the op parameter, i.e., the code sent to gcc is broken due to the extra (...) casts inserted for the first two template arguments:

template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
class_ &def(const detail::op_<( detail::op_id )id, (detail::op_type)ot, L, R> &op, const Extra &...extra)

Work-Around for CTK 11.4-11.8

Replace the logic in
https://github.com/pybind/pybind11/blob/v2.10.0/include/pybind11/pybind11.h#L1581-L1591
with a more general pattern, such as:

template <typename T, typename... Extra>
class_ &def(const T  &op, const Extra &...extra)

For example:

diff --git a/include/pybind11/pybind11.h b/include/pybind11/pybind11.h
index c889dc41..43f4abc3 100644
--- a/include/pybind11/pybind11.h
+++ b/include/pybind11/pybind11.h
@@ -1578,14 +1578,14 @@ public:
         return *this;
     }
 
-    template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
-    class_ &def(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
+    template <typename T, typename... Extra>
+    class_ &def(const T  &op, const Extra &...extra) {
         op.execute(*this, extra...);
         return *this;
     }
 
-    template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
-    class_ &def_cast(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
+    template <typename T, typename... Extra>
+    class_ &def_cast(const T  &op, const Extra &...extra) {
         op.execute_cast(*this, extra...);
         return *this;
     }

This unbreaks the test suite for me 🎉 all runtime tests pass as well.

Because the pattern is so broad, this is probably not suitable for mainline, @henryiii @Skylion007?

But I think it is good enough to patch in package managers. Should we add an enable_if or similar to make the template matching safer? (I think compile time does not matter for targeted patches in package managers, as long as it unbreaks the affected CTK compiler versions behind narrow #ifdefs.)

henryiii · 2022-10-06T18:38:00Z

I think it's fine to patch it for a restricted range of compilers. nvcc 11.4 - 11.8.0? I'd like to avoid package managers patching pybind11 if possible.

Is this something that might land in 11.8.1 or is it 11.9+ only?

ax3l · 2022-10-06T18:49:41Z

Ok, sounds good.
Proposed in #4220

Is this something that might land in 11.8.1 or is it 11.9+ only?

I don't know, since these are internal roadmap details. I assume it will be in all CUDA Toolkit releases after 11.8.0.

ax3l added the bug and triage (New bug, unverified) labels Sep 21, 2022

ax3l changed the title from "[BUG]: HPE CUDA 11.7 & Operators" to "[BUG]: HPE CUDA 11.4-11.7 & Operators" Sep 22, 2022

ax3l mentioned this issue Sep 22, 2022

ci: try CUDA 11.7 #3968

Closed

ax3l changed the title from "[BUG]: HPE CUDA 11.4-11.7 & Operators" to "[BUG]: Build Error with CUDA 11.4-11.7 & Operators" Sep 23, 2022

ax3l changed the title from "[BUG]: Build Error with CUDA 11.4-11.7 & Operators" to "[BUG]: Build Error with CUDA 11.4-11.8.0 & Operators" Oct 6, 2022

This was referenced Oct 6, 2022

Work-Around: NVCC 11.4.0 - 11.8.0 #4220

Merged

Array4: __cuda_array_interface__ v3 AMReX-Codes/pyamrex#30

Merged

henryiii closed this as completed in #4220 Oct 7, 2022

This was referenced Nov 1, 2022

pybind11: v2.10.1+ openPMD/openPMD-api#1322

Merged

pybind11: v2.10.1 AMReX-Codes/pyamrex#89

Merged

rwgk mentioned this issue Feb 11, 2023

FWD pybind11 google/pybind11clif#4193

Closed

ax3l mentioned this issue May 17, 2023

Add reduced diagnostics for beam ECP-WarpX/impactx#336

Merged
