[BUG]: Build Error with CUDA 11.4-11.8.0 & Operators #4193

Closed
3 tasks done
ax3l opened this issue Sep 21, 2022 · 8 comments · Fixed by #4220
Labels
bug triage New bug, unverified


@ax3l
Collaborator

ax3l commented Sep 21, 2022

Required prerequisites

Problem description

I am compiling pybind11 v2.10.0-38-g424ac4fe on Perlmutter at NERSC.

I use the following software modules:

module load cmake/3.22.0
module load PrgEnv-gnu
module load cudatoolkit/11.7
module load cray-python/3.9.7.1

# compiler environment hints
export CRAY_ACCEL_TARGET=nvidia80
export CC=cc #$(which gcc)
export CXX=CC #$(which g++)
export FC=ftn # $(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}

export CFLAGS="${CFLAGS} -O3 -ffast-math"
export CXXFLAGS="${CXXFLAGS} -O3 -ffast-math"
export FCFLAGS="${FCFLAGS} -O3 -ffast-math"
$ CC --version
g++ (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cc --version
gcc (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

Reproducible example code

cmake -S . -B build -DPYBIND11_CUDA_TESTS=ON -DPYBIND11_WERROR=ON -DDOWNLOAD_CATCH=ON
-- The CXX compiler identification is GNU 11.2.0
-- Cray Programming Environment 2.7.16 CXX
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/cray/pe/craype/2.7.16/bin/CC - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- pybind11 v2.11.0 dev1
-- CMake 3.22.0
-- Found PythonInterp: /usr/bin/python3.6 (found suitable version "3.6.15", minimum required is "3.6") 
-- Found PythonLibs: /usr/lib64/libpython3.6m.so
-- PYTHON 3.6.15
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- pybind11::lto enabled
-- pybind11::thin_lto enabled
-- Setting tests build type to MinSizeRel as none was specified
-- The CUDA compiler identification is NVIDIA 11.7.64
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Skipping test_constants_and_functions due to incompatible exception specifications
-- Building tests WITHOUT Eigen, use -DDOWNLOAD_EIGEN=ON on CMake 3.11+ to download
-- Found Boost: /usr/include (found suitable version "1.66.0", minimum required is "1.56")  
CMake Warning at tools/pybind11Common.cmake:227 (message):
  Missing: pytest 3.1

  Try: /usr/bin/python3.6 -m pip install pytest
Call Stack (most recent call first):
  tests/CMakeLists.txt:476 (pybind11_find_import)


-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    DOWNLOAD_CATCH


-- Build files have been written to: /global/homes/a/ahuebl/src/pybind11/build
cmake --build build
[  2%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/pybind11_tests.cpp.o
[  4%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_async.cpp.o
[  6%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_buffers.cpp.o
[  8%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_builtin_casters.cpp.o
[ 10%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_call_policies.cpp.o
[ 13%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_callbacks.cpp.o
[ 15%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_chrono.cpp.o
[ 17%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_class.cpp.o
[ 19%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_const_name.cpp.o
[ 21%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_copy_move.cpp.o
[ 23%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_custom_type_casters.cpp.o
[ 26%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_custom_type_setup.cpp.o
[ 28%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_docstring_options.cpp.o
[ 30%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_enum.cpp.o
[ 32%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_eval.cpp.o
[ 34%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_exceptions.cpp.o
[ 36%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_factory_constructors.cpp.o
[ 39%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_gil_scoped.cpp.o
[ 41%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_iostream.cpp.o
[ 43%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_kwargs_and_defaults.cpp.o
[ 45%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_local_bindings.cpp.o
[ 47%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_methods_and_attributes.cpp.o
[ 50%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_modules.cpp.o
[ 52%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_multiple_inheritance.cpp.o
[ 54%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_array.cpp.o
[ 56%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_dtypes.cpp.o
[ 58%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_vectorize.cpp.o
[ 60%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_opaque_types.cpp.o
[ 63%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp: In function 'void test_submodule_operators(pybind11::module_&)':
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: error: no matching function for call to 'pybind11::class_<Vector2>::def(pybind11::detail::op_<pybind11::detail::op_add, pybind11::detail::op_l, pybind11::detail::self_t, pybind11::detail::self_t>)'
  157 |     py::class_<Vector2>(m, "Vector2")
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        ^                      
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1557:1: note: candidate: 'template<class Func, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const char*, Func&&, const Extra& ...) [with Func = Func; Extra = {Extra ...}; type_ = Vector2; options = {}]'
 1557 |     class_ &def(const char *name_, Func &&f, const Extra &...extra) {
      | ^  
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1557:1: note:   template argument deduction/substitution failed:
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: note:   candidate expects at least 2 arguments, 1 provided
  157 |     py::class_<Vector2>(m, "Vector2")
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        ^                      
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1582:1: note: candidate: 'template<pybind11::detail::op_id id, pybind11::detail::op_type ot, class L, class R, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const pybind11::detail::op_<(pybind11::detail::op_id)(id), (pybind11::detail::op_type)(ot), L, R>&, const Extra& ...) [with pybind11::detail::op_id id = id; pybind11::detail::op_type ot = ot; L = L; R = R; Extra = {Extra ...}; type_ = Vector2; options = {}]'
 1582 |     class_ &def(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
      | ^  
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1582:1: note:   template argument deduction/substitution failed:
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: note:   couldn't deduce template parameter 'id'
  157 |     py::class_<Vector2>(m, "Vector2")
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        ^                      
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1594:1: note: candidate: 'template<class ... Args, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const pybind11::detail::initimpl::constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = Vector2; options = {}]'
 1594 |     class_ &def(const detail::initimpl::constructor<Args...> &init, const Extra &...extra) {
      | ^  
...

error.txt

More details

Failing compile line:

/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/opt/cray/pe/craype/2.7.16/bin/CC -DPYBIND11_TEST_BOOST -Dpybind11_tests_EXPORTS -I/global/homes/a/ahuebl/src/pybind11/include -isystem=/usr/include/python3.6m -O1 -DNDEBUG --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler=-fPIC -Xcompiler=-fvisibility=hidden -Werror all-warnings -std=c++17 -MD -MT tests/CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o -MF CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o.d -x cu -c /global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp -o CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o

Pre-processed file from -E: test_operator_overloading.cpp.txt

Cross-References

NERSC ticket: INC0191398
Nvidia ticket: 3820295

@ax3l ax3l added bug triage New bug, unverified labels Sep 21, 2022
@ax3l
Collaborator Author

ax3l commented Sep 21, 2022

Vanilla Nvidia Linux Docker with CTK 11.7.1:

$ docker run -it nvidia/cuda:11.7.1-devel-ubuntu20.04
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ apt update && apt install -y git cmake python3 python3-dev python3-setuptools

$ git clone https://github.com/pybind/pybind11.git
$ cmake -S pybind11 -B build -DPYBIND11_CUDA_TESTS=ON -DPYBIND11_WERROR=ON
$ cmake --build build

-> same issue.

Looks like it's not HPE specific but a general Nvidia NVCC issue.

@henryiii
Collaborator

We have a CI run for CUDA, right? Maybe we could add 11.7 and show it breaking in CI?

@henryiii
Collaborator

Ahh, yes: #3968

@ax3l
Collaborator Author

ax3l commented Sep 22, 2022

I repeated the compilation with the following docker containers - but with -DPYBIND11_WERROR=OFF.
Legend: ✔️ ok - ❌ fail

  • ✔️ nvidia/cuda:11.2.2-devel-ubuntu20.04
  • ✔️ nvidia/cuda:11.3.1-devel-ubuntu20.04
  • ❌ nvidia/cuda:11.4.3-devel-ubuntu20.04
    • -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
    • type deduction errors for self & operators in pybind11_cross_module_tests.cpp (see PR description)
    • type deduction errors for self & operators in test_operator_overloading.cpp (see PR description)
  • ❌ nvidia/cuda:11.5.1-devel-ubuntu20.04
    • -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
    • type deduction errors in pybind11_cross_module_tests.cpp (see PR description)
    • type deduction errors for self & operators in test_operator_overloading.cpp (see PR description)
  • ❌ nvidia/cuda:11.6.1-devel-ubuntu20.04
    • -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
    • type deduction errors for self & operators in pybind11_cross_module_tests.cpp (see PR description)
    • type deduction errors for self & operators in test_operator_overloading.cpp (see PR description)
  • ❌ nvidia/cuda:11.7.1-devel-ubuntu20.04
    • type deduction errors for self & operators in test_operator_overloading.cpp (see PR description)
    • ...
  • ❌ CUDA 11.8.0

@ax3l ax3l changed the title [BUG]: HPE CUDA 11.7 & Operators [BUG]: HPE CUDA 11.4-11.7 & Operators Sep 22, 2022
@ax3l ax3l changed the title [BUG]: HPE CUDA 11.4-11.7 & Operators [BUG]: Build Error with CUDA 11.4-11.7 & Operators Sep 23, 2022
@ax3l
Collaborator Author

ax3l commented Oct 4, 2022

Uff, tried again today and still cannot find a simple work-around.

@ax3l
Collaborator Author

ax3l commented Oct 6, 2022

Happy to report we made great progress on this with the help of Nvidia developers 🎉

  • ✔️ fix landed in their development branches, just after the 11.8.0 CUDA Toolkit (CTK) release
  • ✔️ Nvidia found a work-around that we can use in the meantime for CTK 11.4-11.8, e.g., as patch in package managers
  • 🤞 because pybind11 is so widely used, from SciPy to RAPIDS AI projects, we are trying to get it into the internal Nvidia compiler regression suite for nvcc

Issue Description from Nvidia

NVCC parses the input and regenerates host-side C++ to send to the host compiler. There’s a bug in the host C++ generation, where the def function (and def_cast) gets unnecessary casts added to the declaration of its op parameter. As a result, the code sent to gcc is broken by the extra casts ( ... ) inserted around the first two template arguments:

template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
class_ &def(const detail::op_<( detail::op_id )id, (detail::op_type)ot, L, R> &op, const Extra &...extra)
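To make the failure mode concrete, here is a minimal sketch of the overload pattern involved. Everything in the `demo_deduce` namespace is a simplified stand-in written for illustration, not pybind11's actual definitions. With a conforming host compiler, `id` and `ot` are deduced from the argument type. Once the parameter type is rewritten with the casts quoted above, the enum arguments become expressions, which would explain the "couldn't deduce template parameter 'id'" note in the build log.

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for illustration only; not pybind11's real definitions.
namespace demo_deduce {

enum op_id { op_add, op_sub };
enum op_type { op_l, op_r };

struct self_t {};

template <op_id id, op_type ot, typename L, typename R>
struct op_ {
    // Report which operator this tag describes.
    std::string name() const { return id == op_add ? "__add__" : "__sub__"; }
};

struct class_ {
    std::string last_defined;

    // Same shape as the overload in the issue: the two enum values are
    // non-type template parameters deduced from the argument's type.
    template <op_id id, op_type ot, typename L, typename R>
    class_ &def(const op_<id, ot, L, R> &op) {
        last_defined = op.name();
        return *this;
    }
};

// Binds an addition operator tag and returns the name that was registered.
inline std::string define_add() {
    class_ c;
    c.def(op_<op_add, op_l, self_t, self_t>{});
    assert(!c.last_defined.empty());
    return c.last_defined;
}

} // namespace demo_deduce
```

Calling `demo_deduce::define_add()` exercises the deduction path that fails under the affected nvcc versions when the regenerated host code contains the extra casts.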

Work-Around for CTK 11.4-11.8

Replace the logic in
https://github.com/pybind/pybind11/blob/v2.10.0/include/pybind11/pybind11.h#L1581-L1591
with a more general pattern, such as:

template <typename T, typename... Extra>
class_ &def(const T  &op, const Extra &...extra)

For example:

diff --git a/include/pybind11/pybind11.h b/include/pybind11/pybind11.h
index c889dc41..43f4abc3 100644
--- a/include/pybind11/pybind11.h
+++ b/include/pybind11/pybind11.h
@@ -1578,14 +1578,14 @@ public:
         return *this;
     }
 
-    template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
-    class_ &def(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
+    template <typename T, typename... Extra>
+    class_ &def(const T  &op, const Extra &...extra) {
         op.execute(*this, extra...);
         return *this;
     }
 
-    template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
-    class_ &def_cast(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
+    template <typename T, typename... Extra>
+    class_ &def_cast(const T  &op, const Extra &...extra) {
         op.execute_cast(*this, extra...);
         return *this;
     }

This unbreaks the test suite for me 🎉 all runtime tests pass as well.

Due to the broad pattern, this is probably not suitable for mainline @henryiii @Skylion007?

But I think it is good enough to patch in package managers. Should we add an enable_if or similar to make the template matching a bit safer? (I think that compile time does not matter for targeted patches in package managers, as long as they unbreak the affected CTK compiler versions with narrow #ifdefs.)
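One way the `enable_if` idea could look is sketched below. This is only an assumption about a possible shape, not necessarily what any pybind11 patch does: the marker name `op_marker`, the trait `is_operator_tag`, and the `demo_constrain` namespace are all hypothetical. The point is that the broad `const T &` overload only participates in overload resolution for types that opt in, so unrelated arguments still reach their own overloads.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical names throughout; a sketch, not pybind11 code.
namespace demo_constrain {

enum op_id { op_add, op_sub };
enum op_type { op_l, op_r };
struct self_t {};

template <op_id id, op_type ot, typename L, typename R>
struct op_ {
    // Opt-in marker that the constrained overload tests for.
    static constexpr bool op_marker = true;
};

// Detection trait: false unless T exposes a true `op_marker`.
template <typename T, typename = void>
struct is_operator_tag : std::false_type {};

template <typename T>
struct is_operator_tag<T, typename std::enable_if<T::op_marker>::type> : std::true_type {};

struct class_ {
    int op_calls = 0;
    int other_calls = 0;

    // Broad workaround-style overload, restricted to operator tags.
    template <typename T,
              typename std::enable_if<is_operator_tag<T>::value, int>::type = 0>
    class_ &def(const T &) {
        ++op_calls;
        return *this;
    }

    // Unrelated overload; without the constraint, def(42) would prefer the
    // template above (exact match) over this one (integral conversion).
    class_ &def(long) {
        ++other_calls;
        return *this;
    }
};

// Returns true when operator tags and plain integers reach different overloads.
inline bool overloads_stay_separate() {
    class_ c;
    c.def(op_<op_add, op_l, self_t, self_t>{}); // constrained template
    assert(c.op_calls == 1 && c.other_calls == 0);
    c.def(42);                                  // falls through to def(long)
    return c.op_calls == 1 && c.other_calls == 1;
}

} // namespace demo_constrain
```

The design choice here is to test for a member the operator tag declares itself, which avoids naming the non-type template parameters in the constrained signature at all.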

@ax3l ax3l changed the title [BUG]: Build Error with CUDA 11.4-11.7 & Operators [BUG]: Build Error with CUDA 11.4-11.8.0 & Operators Oct 6, 2022
@henryiii
Collaborator

henryiii commented Oct 6, 2022

I think it's fine to patch it for a restricted range of compilers. nvcc 11.4 - 11.8.0? I'd like to avoid package managers patching pybind11 if possible.

Is this something that might land in 11.8.1 or is it 11.9+ only?
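A guard for a restricted compiler range could be sketched roughly as follows. The macros `__CUDACC__`, `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__` are predefined by nvcc; the helper `nvcc_needs_op_workaround`, the constant `workaround_active` and the `demo_version` namespace are hypothetical names for this example. The 11.4 to 11.8 range is taken from the reports in this thread.

```cpp
#include <cassert>

// Hypothetical helper names; only the __CUDACC* macros come from nvcc itself.
namespace demo_version {

// True for nvcc 11.4 through 11.8, the range reported as affected in this thread.
constexpr bool nvcc_needs_op_workaround(int major, int minor) {
    return major == 11 && minor >= 4 && minor <= 8;
}

#if defined(__CUDACC__) && defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
// Compiling with nvcc: decide from the compiler's own version macros.
constexpr bool workaround_active
    = nvcc_needs_op_workaround(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__);
#else
// Plain host compiler: keep the strict overload.
constexpr bool workaround_active = false;
#endif

static_assert(nvcc_needs_op_workaround(11, 7), "11.7 is in the affected range");
static_assert(!nvcc_needs_op_workaround(12, 0), "12.0 is outside the range");

// Runtime spot check of the range boundaries, handy in a unit test.
inline bool check_boundaries() {
    assert(!nvcc_needs_op_workaround(11, 3));
    assert(nvcc_needs_op_workaround(11, 4));
    assert(nvcc_needs_op_workaround(11, 8));
    assert(!nvcc_needs_op_workaround(11, 9));
    return true;
}

} // namespace demo_version
```

Note that a major/minor check cannot tell 11.8.0 apart from a later 11.8.x build; if that distinction matters, `__CUDACC_VER_BUILD__` would have to be consulted as well.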

@ax3l
Collaborator Author

ax3l commented Oct 6, 2022

Ok, sounds good.
Proposed in #4220

> Is this something that might land in 11.8.1 or is it 11.9+ only?

I don't know, since these are internal roadmap details. I assume the fix will be in all CUDA Toolkit releases after 11.8.0.
