[Flashv2] Windows support #345

Open · danthe3rd opened this issue Jul 19, 2023 · 11 comments

danthe3rd commented Jul 19, 2023

Flashv2 is based on CUTLASS v3, which does not support Windows at the moment.
I'm hitting a bunch of errors - I'll try to dig a bit further; it might be related to the MSVC version. I'll report back if I manage to get it to work.

Would be curious if anyone got it to work there (I don't have a Windows machine to test things out)

mnicely commented Jul 19, 2023

CUTLASS is planning to have official Windows support later this year

drisspg commented Aug 8, 2023

Quick FYI: Cutlass 3.2 has a bullet point on support for Windows builds. I haven't had a chance to try building flash with 3.2, but at least on 3.1 I'm still seeing errors around some of the copy utils.

tridao commented Aug 8, 2023

> Quick FYI: Cutlass 3.2 has a bullet point on support for Windows builds. I haven't had a chance to try building flash with 3.2, but at least on 3.1 I'm still seeing errors around some of the copy utils.

Awesome! I'm hoping to find time to try Cutlass 3.2 soon!

Panchovix commented Sep 16, 2023

I have tried CUDA 12.1 on Windows 11 with VS 2022, and no luck so far building FAv2.

@danthe3rd have you found anything?

mnicely commented Sep 18, 2023

@Panchovix do you know what version of CUTLASS you used?

Panchovix commented

> @Panchovix do you know what version of CUTLASS you used?

Sadly I don't know how to check. I just installed CUDA 11.8/12.1 independently, set the env variables for each torch version, and when trying to install it fails as follows, depending on whether I compile from source or install directly.
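For reference: flash-attention vendors CUTLASS as a git submodule at csrc/cutlass (that is where the -I flags in the log below point), so "git -C csrc/cutlass log -1" shows the pinned commit. Below is a minimal standalone sketch of a programmatic check; it assumes CUTLASS ships include/cutlass/version.h with CUTLASS_MAJOR/MINOR/PATCH macros, and the file name cutlass_version.cpp is hypothetical, not part of this repo.

// cutlass_version.cpp - a hypothetical standalone check, not part of the
// flash-attention build. Assumes the submodule layout seen in the -I flags
// below (csrc\cutlass\include) and that CUTLASS provides cutlass/version.h
// with CUTLASS_MAJOR/MINOR/PATCH macros.
// Build from the flash-attention root, e.g.:
//   cl /EHsc /I csrc\cutlass\include cutlass_version.cpp
#include <cstdio>
#include <cutlass/version.h>

int main() {
    std::printf("CUTLASS %d.%d.%d\n", CUTLASS_MAJOR, CUTLASS_MINOR, CUTLASS_PATCH);
    return 0;
}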

From source:

Compiling from source error
[2/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj
flash_bwd_hdim160_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_00003160_00000000-7_flash_bwd_hdim160_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim160<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[3/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj
flash_bwd_hdim160_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_00005ccc_00000000-7_flash_bwd_hdim160_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim160<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[4/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj
flash_bwd_hdim192_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_000038c0_00000000-7_flash_bwd_hdim192_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim192<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[5/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj
flash_bwd_hdim192_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_00002c68_00000000-7_flash_bwd_hdim192_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim192<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[6/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj
flash_bwd_hdim128_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_000030a8_00000000-7_flash_bwd_hdim128_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim128<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[7/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_0000556c_00000000-7_flash_bwd_hdim128_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim128<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
ninja: build stopped: subcommand failed.
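
Two notes on the log above. The cute warning #226-D is cosmetic: the format string "0x%016 %lli" is missing the conversion specifier after %016 (presumably 0x%016llx was intended). The build actually stops on MSVC error C2975: at each failing instantiation, the argument supplied for the kHeadDim_ template parameter of Flash_bwd_kernel_traits is not treated as a compile-time constant. The fragment below is a minimal sketch of the kind of pattern that trips affected MSVC versions, assuming the launch templates follow flash-attention's BOOL_SWITCH style (a constexpr local used as a template argument inside a generic lambda); it is an illustration of the suspected failure mode, not the actual source.

// Sketch only: standard C++17 that GCC/Clang accept, but that some MSVC
// versions reject with C2975 when the captured constexpr local is used
// as a template argument inside the lambda.
#include <type_traits>

template <int kHeadDim_>
struct Flash_bwd_kernel_traits {};      // stand-in for the real traits class

template <typename F>
void bool_switch(bool cond, F&& f) {    // stand-in for the BOOL_SWITCH macro
    if (cond) { f(std::true_type{}); } else { f(std::false_type{}); }
}

void run_mha_bwd_hdim160_like(bool is_dropout) {
    constexpr int Headdim = 160;        // a compile-time constant...
    bool_switch(is_dropout, [&](auto Is_dropout) {
        // ...which affected MSVC versions stop treating as a constant
        // expression inside the generic lambda, so the template argument
        // below draws error C2975; GCC and Clang compile this fine.
        Flash_bwd_kernel_traits<Headdim> traits{};
        (void)traits; (void)Is_dropout;
    });
}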
Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '6']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\flash-attention\setup.py", line 287, in <module>
    setup(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
    return run_commands(dist)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
    dist.run_commands()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install.py", line 74, in run
    self.do_egg_install()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
    self.build()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\install_lib.py", line 112, in build
    self.run_command('build_ext')
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
    _build_ext.run(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
    self.build_extensions()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 466, in build_extensions
    self._build_extensions_serial()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 492, in _build_extensions_serial
    self.build_extension(ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 547, in build_extension
    objects = self.compiler.compile(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 845, in win_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Install from pip error
ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 257, in run
          urllib.request.urlretrieve(wheel_url, wheel_filename)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 241, in urlretrieve
          with contextlib.closing(urlopen(url, data)) as fp:
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 216, in urlopen
          return opener.open(url, data, timeout)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 525, in open
          response = meth(req, response)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 634, in http_response
          response = self.parent.error(
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 563, in error
          return self._call_chain(*args)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 496, in _call_chain
          result = func(*args)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 643, in http_error_default
          raise HTTPError(req.full_url, code, msg, hdrs, fp)
      urllib.error.HTTPError: HTTP Error 404: Not Found

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2100, in _run_ninja_build
          subprocess.run(
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '6']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 277, in <module>
          setup(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
          return run_commands(dist)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
          dist.run_commands()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 968, in run_commands
          self.run_command(cmd)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 274, in run
          super().run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\wheel\bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build.py", line 132, in run
          self.run_command(cmd_name)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
          _build_ext.run(self)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
          self.build_extensions()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 873, in build_extensions
          build_ext.build_extensions(self)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 466, in build_extensions
          self._build_extensions_serial()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 492, in _build_extensions_serial
          self.build_extension(ext)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 547, in build_extension
          objects = self.compiler.compile(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 845, in win_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2116, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
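
For what it's worth, the HTTP 404 in the pip log just seems to be setup.py failing to find a prebuilt wheel to download (none appear to be published for Windows) before it falls back to compiling from source. The real failure in both paths is the C2975 error at flash_bwd_launch_template.h(235): MSVC won't accept the head-dimension value as a compile-time-constant template argument. That looks like the known MSVC quirk where a constexpr local used inside a lambda isn't treated as a constant expression. A minimal, hypothetical sketch of that pattern (not the actual flash-attn source):

    // Hypothetical repro of the pattern behind MSVC error C2975.
    // GCC/Clang accept a constexpr local used as a template argument
    // inside a lambda body; some MSVC versions reject it.
    template <int kHeadDim_>
    struct Kernel_traits {
        static constexpr int kHeadDim = kHeadDim_;
    };

    void run() {
        constexpr int Headdim = 128;
        auto launch = [&] {
            // MSVC: error C2975: 'kHeadDim_': invalid template argument,
            // expected compile-time constant expression
            Kernel_traits<Headdim> traits;
            static_cast<void>(traits);
        };
        launch();
    }

    int main() { run(); }

If that's the cause, the usual workarounds are passing the literal directly (Kernel_traits<128>) or re-declaring the constant inside the lambda, though I haven't verified either against this codebase.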

@grimulkan

> Sadly I don't know how to check. I just installed CUDA 11.8/12.1 independently, assigned each env variable for each torch version, and when trying to install, it fail with this, based if compiling from source or installing.

Technically this repo should pull cutlass @ 34fd980 if you cloned it recently, which I think should be CUTLASS 3.2.
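
If you want to confirm which commit you actually have, `git submodule status csrc/cutlass` from the repo root should print the checked-out submodule commit (assuming you cloned with submodules).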

@Panchovix

> > Sadly I don't know how to check. I just installed CUDA 11.8/12.1 independently, assigned each env variable for each torch version, and when trying to install, it fail with this, based if compiling from source or installing.
>
> Technically this repo should pull cutlass @ 34fd980 if you cloned it recently, which I think should be CUTLASS 3.2.

I've re-cloned it pretty much every day, but the issue persists, so I'm not sure what's happening. The error log isn't conclusive to me.

@BadisG

BadisG commented Sep 23, 2023

If anyone has found a fix, please tell us how to make it work :(

@grimulkan

Interestingly, flash-attn v1 does build on Windows, as pointed out on Reddit.

@fenglui

fenglui commented Oct 8, 2023

https://github.com/NVIDIA/cutlass/blob/ff02da266713bd3365aed65c552412e126c040cb/media/docs/build/building_in_windows_with_visual_studio.md?plain=1#L4
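
That page documents building CUTLASS on Windows with Visual Studio, which should be the prerequisite for getting FlashAttention-2 to compile with MSVC.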
