Rotary Emb compile fail under Ubuntu 22.04 with gcc/g++ v12 installed #484

Qubitium · 2023-08-24T23:11:56Z

flash_attn core was compiled correctly but runtime error asks to compile rotary module for llama2. However, the compilation fails on Ubuntu 22.0 with cuda 12.1, pytorch nightly for 12.1 and gcc/g++ 12.

Thanks for any pointers. I am scratching my heads on this one.

Collecting git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
  Cloning https://github.com/HazyResearch/flash-attention.git to /tmp/pip-req-build-jtrc0_ne
  Running command git clone --filter=blob:none --quiet https://github.com/HazyResearch/flash-attention.git /tmp/pip-req-build-jtrc0_ne
  Resolved https://github.com/HazyResearch/flash-attention.git to commit 6711b3bc40073e7ced2a4c7d8266feec7e6e137f
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rotary-emb
  Building wheel for rotary-emb (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [87 lines of output]


      torch.__version__  = 2.1.0.dev20230824+cu121


      running bdist_wheel
      running build
      running build_ext
      /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.1
        warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
      building 'rotary_emb' extension
      creating /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build
      creating /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10
      Emitting ninja build file /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/2] c++ -MMD -MF /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-jtrc0_ne/csrc/rotary/rotary.cpp -o /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary.o -g -march=native -funroll-loops -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=rotary_emb -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      [2/2] /usr/local/cuda/bin/nvcc  -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-jtrc0_ne/csrc/rotary/rotary_cuda.cu -o /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=rotary_emb -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      FAILED: /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary_cuda.o
      /usr/local/cuda/bin/nvcc  -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-jtrc0_ne/csrc/rotary/rotary_cuda.cu -o /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=rotary_emb -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
         45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
            |                                                                                                                        ^
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected identifier before ‘<’ token
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h:45:123: error: expected primary-expression before ‘>’ token
         45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
            |                                                                                                                           ^
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h:45:126: error: expected primary-expression before ‘)’ token
         45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
            |                                                                                                                              ^
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
          subprocess.run(
        File "/usr/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-jtrc0_ne/csrc/rotary/setup.py", line 120, in <module>
          setup(
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
          self.run_command('build')
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
          _build_ext.run(self)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
          self.build_extensions()
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
          build_ext.build_extensions(self)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
          self._build_extensions_serial()
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
          self.build_extension(ext)
        File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
          _build_ext.build_extension(self, ext)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
          objects = self.compiler.compile(sources,
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

The text was updated successfully, but these errors were encountered:

tridao · 2023-08-24T23:14:15Z

Yeah idk, maybe the compiler version is too new. It's erroring on some pybind11 code :D
You can try gcc 11 or lower.

Saimshaikh8297 · 2023-08-25T11:28:21Z

$ sudo apt install gcc-10 g++-10
$ export CC=/usr/bin/gcc-10
$ export CXX=/usr/bin/g++-10
$ export CUDA_ROOT=/usr/local/cuda
$ ln -s /usr/bin/gcc-10 $CUDA_ROOT/bin/gcc
$ ln -s /usr/bin/g++-10 $CUDA_ROOT/bin/g++

I installed gcc-10 as per NVlabs/instant-ngp#119.

This worked for me

andersonbcdefg · 2023-09-05T04:13:23Z

FYI, this happens on Lambda stack which means anyone trying to use rotary on Lambda H100s is in for a bad time. I tried the above ^ and it didn't fix the problem for me (the ln commands didn't work because stuff wasn't where they expected it to be).

Birch-san · 2023-09-05T15:30:30Z

Yes, CUDA 12.0 and 12.1's nvcc compiler, cannot compile pybind11 2.11.1:
pybind/pybind11#4606
Note: fixed in CUDA 12.2.

Specifically, it cannot compile pybind11/cast.h#L45.

The fix is simple (thanks @archibate):

-    return caster.operator typename make_caster<T>::template cast_op_type<T>();
+    return caster;

So, you just need to find the pybind11/cast.h used by your current Python environment, and modify it as above.
Check which file is cited in your compile error:

/home/birch/anaconda3/envs/p311-cu121-bnb-opt/lib/python3.11/site-packages/torch/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/birch/anaconda3/envs/p311-cu121-bnb-opt/lib/python3.11/site-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();

In this example it's:
/home/birch/anaconda3/envs/p311-cu121-bnb-opt/lib/python3.11/site-packages/torch/include/pybind11/cast.h.

So, we modify this cast.h as above.

Then try compiling rotary-emb again:

MAX_JOBS=2 pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary

And voilà:

Successfully built rotary-emb

andersonbcdefg · 2023-09-05T17:48:08Z

Thanks for that! Is this something you'd expect one of pybind11 or CUDA to fix at some point?

brettbj · 2023-11-21T21:28:51Z

@Birch-san thank you! fixed it for me as well, really appreciate it

Birch-san · 2023-11-21T21:45:25Z

Thanks for that! Is this something you'd expect one of pybind11 or CUDA to fix at some point?

@andersonbcdefg yes, CUDA 12.2's nvcc compiler can now compile pybind11.

Riyme mentioned this issue Oct 29, 2024

command '/usr/local/cuda-12.1/bin/nvcc' failed with exit code 1 kakaxi314/BP-Net#11

Open

wuky2000 mentioned this issue Jan 22, 2025

someing wrong in running python setup.py install of "pybind11/detail/../cast.h...... error" kakaxi314/BP-Net#20

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rotary Emb compile fail under Ubuntu 22.04 with gcc/g++ v12 installed #484

Rotary Emb compile fail under Ubuntu 22.04 with gcc/g++ v12 installed #484

Qubitium commented Aug 24, 2023

tridao commented Aug 24, 2023

Saimshaikh8297 commented Aug 25, 2023

andersonbcdefg commented Sep 5, 2023

Birch-san commented Sep 5, 2023 •

edited

Loading

andersonbcdefg commented Sep 5, 2023

brettbj commented Nov 21, 2023

Birch-san commented Nov 21, 2023

Rotary Emb compile fail under Ubuntu 22.04 with gcc/g++ v12 installed #484

Rotary Emb compile fail under Ubuntu 22.04 with gcc/g++ v12 installed #484

Comments

Qubitium commented Aug 24, 2023

tridao commented Aug 24, 2023

Saimshaikh8297 commented Aug 25, 2023

andersonbcdefg commented Sep 5, 2023

Birch-san commented Sep 5, 2023 • edited Loading

andersonbcdefg commented Sep 5, 2023

brettbj commented Nov 21, 2023

Birch-san commented Nov 21, 2023

Birch-san commented Sep 5, 2023 •

edited

Loading