[Flashv2] Windows support #345

Open · danthe3rd opened this issue Jul 19, 2023 · 11 comments

danthe3rd commented Jul 19, 2023

Flashv2 is based on CUTLASS v3, which does not support Windows at the moment.
I'm hitting a bunch of errors - I'll try to dig a bit further; it might be related to the MSVC version. I'll report back if I manage to get it to work.

Would be curious if anyone got it to work there (I don't have a Windows machine to test things out)

mnicely commented Jul 19, 2023

CUTLASS is planning to have official Windows support later this year

drisspg commented Aug 8, 2023

Quick FYI: Cutlass 3.2 has a bullet point on support for Windows builds. I haven't had a chance to try building flash with 3.2, but at least on 3.1 I'm still seeing errors around some of the copy utils.

tridao commented Aug 8, 2023

> Quick FYI: Cutlass 3.2 has a bullet point on support for Windows builds. I haven't had a chance to try building flash with 3.2, but at least on 3.1 I'm still seeing errors around some of the copy utils.

Awesome! I'm hoping to find time to try Cutlass 3.2 soon!

Panchovix commented Sep 16, 2023

I have tried CUDA 12.1 on Windows 11 with VS 2022, and no luck so far building FAv2.

@danthe3rd have you found anything?

mnicely commented Sep 18, 2023

@Panchovix do you know what version of CUTLASS you used?

Panchovix commented

> @Panchovix do you know what version of CUTLASS you used?

Sadly I don't know how to check. I just installed CUDA 11.8/12.1 independently, set the env variables for each torch version, and when trying to install it fails as follows, depending on whether I compile from source or install directly.
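For reference: flash-attention vendors CUTLASS as a git submodule at csrc/cutlass (that is where the -I flags in the log below point), so "git -C csrc/cutlass log -1" shows the pinned commit. Below is a minimal standalone sketch of a programmatic check; it assumes CUTLASS ships include/cutlass/version.h with CUTLASS_MAJOR/MINOR/PATCH macros, and the file name cutlass_version.cpp is hypothetical, not part of this repo.

// cutlass_version.cpp - a hypothetical standalone check, not part of the
// flash-attention build. Assumes the submodule layout seen in the -I flags
// below (csrc\cutlass\include) and that CUTLASS provides cutlass/version.h
// with CUTLASS_MAJOR/MINOR/PATCH macros.
// Build from the flash-attention root, e.g.:
//   cl /EHsc /I csrc\cutlass\include cutlass_version.cpp
#include <cstdio>
#include <cutlass/version.h>

int main() {
    std::printf("CUTLASS %d.%d.%d\n", CUTLASS_MAJOR, CUTLASS_MINOR, CUTLASS_PATCH);
    return 0;
}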

From source:

Compiling from source error
[2/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj
flash_bwd_hdim160_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_00003160_00000000-7_flash_bwd_hdim160_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim160<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[3/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj
flash_bwd_hdim160_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_00005ccc_00000000-7_flash_bwd_hdim160_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim160<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[4/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj
flash_bwd_hdim192_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_000038c0_00000000-7_flash_bwd_hdim192_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim192<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[5/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj
flash_bwd_hdim192_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_00002c68_00000000-7_flash_bwd_hdim192_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim192<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[6/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj
flash_bwd_hdim128_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_000030a8_00000000-7_flash_bwd_hdim128_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim128<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[7/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_0000556c_00000000-7_flash_bwd_hdim128_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu(9): note: see reference to function template instantiation "void run_mha_bwd_hdim128<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)" being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
ninja: build stopped: subcommand failed.
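
Two notes on the log above. The cute warning #226-D is cosmetic: the format string "0x%016 %lli" is missing the conversion specifier after %016 (presumably 0x%016llx was intended). The build actually stops on MSVC error C2975: at each failing instantiation, the argument supplied for the kHeadDim_ template parameter of Flash_bwd_kernel_traits is not treated as a compile-time constant. The fragment below is a minimal sketch of the kind of pattern that trips affected MSVC versions, assuming the launch templates follow flash-attention's BOOL_SWITCH style (a constexpr local used as a template argument inside a generic lambda); it is an illustration of the suspected failure mode, not the actual source.

// Sketch only: standard C++17 that GCC/Clang accept, but that some MSVC
// versions reject with C2975 when the captured constexpr local is used
// as a template argument inside the lambda.
#include <type_traits>

template <int kHeadDim_>
struct Flash_bwd_kernel_traits {};      // stand-in for the real traits class

template <typename F>
void bool_switch(bool cond, F&& f) {    // stand-in for the BOOL_SWITCH macro
    if (cond) { f(std::true_type{}); } else { f(std::false_type{}); }
}

void run_mha_bwd_hdim160_like(bool is_dropout) {
    constexpr int Headdim = 160;        // a compile-time constant...
    bool_switch(is_dropout, [&](auto Is_dropout) {
        // ...which affected MSVC versions stop treating as a constant
        // expression inside the generic lambda, so the template argument
        // below draws error C2975; GCC and Clang compile this fine.
        Flash_bwd_kernel_traits<Headdim> traits{};
        (void)traits; (void)Is_dropout;
    });
}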
Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '6']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\flash-attention\setup.py", line 287, in <module>
    setup(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
    return run_commands(dist)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
    dist.run_commands()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install.py", line 74, in run
    self.do_egg_install()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
    self.build()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\install_lib.py", line 112, in build
    self.run_command('build_ext')
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
    _build_ext.run(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
    self.build_extensions()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 466, in build_extensions
    self._build_extensions_serial()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 492, in _build_extensions_serial
    self.build_extension(ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 547, in build_extension
    objects = self.compiler.compile(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 845, in win_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Install from pip error
ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 257, in run
          urllib.request.urlretrieve(wheel_url, wheel_filename)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 241, in urlretrieve
          with contextlib.closing(urlopen(url, data)) as fp:
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 216, in urlopen
          return opener.open(url, data, timeout)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 525, in open
          response = meth(req, response)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 634, in http_response
          response = self.parent.error(
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 563, in error
          return self._call_chain(*args)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 496, in _call_chain
          result = func(*args)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 643, in http_error_default
          raise HTTPError(req.full_url, code, msg, hdrs, fp)
      urllib.error.HTTPError: HTTP Error 404: Not Found

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2100, in _run_ninja_build
          subprocess.run(
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '6']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 277, in <module>
          setup(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
          return run_commands(dist)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
          dist.run_commands()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 968, in run_commands
          self.run_command(cmd)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 274, in run
          super().run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\wheel\bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build.py", line 132, in run
          self.run_command(cmd_name)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
          _build_ext.run(self)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
          self.build_extensions()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 873, in build_extensions
          build_ext.build_extensions(self)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 466, in build_extensions
          self._build_extensions_serial()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 492, in _build_extensions_serial
          self.build_extension(ext)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 547, in build_extension
          objects = self.compiler.compile(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 845, in win_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2116, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
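
For what it's worth, the HTTP 404 in the pip log just seems to be setup.py failing to find a prebuilt wheel to download (none appear to be published for Windows) before it falls back to compiling from source. The real failure in both paths is the C2975 error at flash_bwd_launch_template.h(235): MSVC won't accept the head-dimension value as a compile-time-constant template argument. That looks like the known MSVC quirk where a constexpr local used inside a lambda isn't treated as a constant expression. A minimal, hypothetical sketch of that pattern (not the actual flash-attn source):

    // Hypothetical repro of the pattern behind MSVC error C2975.
    // GCC/Clang accept a constexpr local used as a template argument
    // inside a lambda body; some MSVC versions reject it.
    template <int kHeadDim_>
    struct Kernel_traits {
        static constexpr int kHeadDim = kHeadDim_;
    };

    void run() {
        constexpr int Headdim = 128;
        auto launch = [&] {
            // MSVC: error C2975: 'kHeadDim_': invalid template argument,
            // expected compile-time constant expression
            Kernel_traits<Headdim> traits;
            static_cast<void>(traits);
        };
        launch();
    }

    int main() { run(); }

If that's the cause, the usual workarounds are passing the literal directly (Kernel_traits<128>) or re-declaring the constant inside the lambda, though I haven't verified either against this codebase.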

@grimulkan

> Sadly I don't know how to check. I just installed CUDA 11.8/12.1 independently, assigned each env variable for each torch version, and when trying to install, it fail with this, based if compiling from source or installing.

Technically this repo should pull cutlass @ 34fd980 if you cloned it recently, which I think should be CUTLASS 3.2.
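
If you want to confirm which commit you actually have, `git submodule status csrc/cutlass` from the repo root should print the checked-out submodule commit (assuming you cloned with submodules).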

@Panchovix

> > Sadly I don't know how to check. I just installed CUDA 11.8/12.1 independently, assigned each env variable for each torch version, and when trying to install, it fail with this, based if compiling from source or installing.
>
> Technically this repo should pull cutlass @ 34fd980 if you cloned it recently, which I think should be CUTLASS 3.2.

I've re-cloned it pretty much every day, but the issue persists, so I'm not sure what's happening. The error log isn't conclusive to me.

@BadisG

BadisG commented Sep 23, 2023

If anyone has found a fix, please tell us how to make it work :(

@grimulkan

Interestingly, flash-attn v1 does build on Windows, as pointed out on Reddit.

@fenglui

fenglui commented Oct 8, 2023

https://github.com/NVIDIA/cutlass/blob/ff02da266713bd3365aed65c552412e126c040cb/media/docs/build/building_in_windows_with_visual_studio.md?plain=1#L4
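
That page documents building CUTLASS on Windows with Visual Studio, which should be the prerequisite for getting FlashAttention-2 to compile with MSVC.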
