Flash Attention 2 doesn't get built/compiled on Windows. #553
Comments
Install from pip error (collapsed error log)
Is there any additional requirement, besides those mentioned, to install flash-attn on Windows? |
I've no idea since it's only been tested on Linux, and I don't have access to a Windows machine. If you figure out how to build on Windows (or what we need to change to support Windows), please lmk. |
Closing as 5a83425 fixes it. |
@Panchovix are you saying we can now compile flash-attn on Windows somehow? I couldn't with the latest pull, unless I'm missing something. |
Yes, now it is possible. Latest pull should work. You do need CUDA 12.x though, since CUDA 11.8 and lower don't support it. I've uploaded a wheel here https://huggingface.co/Panchovix/flash-attn-2-windows-test-wheel More discussion here: #595 |
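For anyone picking up a pre-built wheel like the one linked above, installation is just pip pointed at the downloaded file. The filename below is a placeholder, not the real artifact name; make sure the cuXXX / torch / cpXXX tags in the actual filename match your CUDA, PyTorch, and Python versions.

```powershell
# Placeholder filename -- substitute the actual wheel you downloaded
pip install flash_attn-2.x.x+cu122-cp310-cp310-win_amd64.whl
```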
Thanks, 11.8 was my error. Woohoo! |
The link gives a 404 now |
There are binaries here. I can't build anything beyond 2.4.2 from source myself and can't find Windows binaries beyond that anywhere. 2.4.2 works fine with current packages though. |
With some untraceable magic I've built 2.5.6 on Windows 10 with CUDA 12.4. |
For anyone looking to use Flash Attention on Windows, I got it working after some tweaking. You have to make sure that CUDA 12.4 is installed, and PyTorch should be 2.2.2+cu121. I used pip and it took about 2 hours to finish the setup. Hope this helps anyone who wants to use flash-attn on Windows. BTW I am using Windows 11 Pro; mileage may vary on Windows 10. |
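A minimal sketch of that setup, assuming a clean environment with the CUDA 12.4 toolkit already installed and the cu121 PyTorch build taken from the official wheel index (the index URL and the --no-build-isolation flag come from common flash-attn install instructions, not from this comment):

```powershell
# PyTorch 2.2.2 built against CUDA 12.1 (works alongside a CUDA 12.4 toolkit)
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121

# Build prerequisites used by flash-attn's setup
pip install ninja packaging wheel

# Compile flash-attn from source; this is the step that can take a couple of hours
pip install flash-attn --no-build-isolation
```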
Have you seen significant improvements after using Flash Attention? How much? |
I was able to get it working. The problem seems to be that many ML frameworks don't support Flash Attention on Windows. You would have to run tests for yourself, but it seems like ctransformers does use it. Since I didn't check the performance before installing Flash Attention, I couldn't say what the improvements were. |
Got it working on Windows 10 as well on Torch 2.2.2 (with CUDA 12.4 installed). Took around 15-20 min to compile on a 64-core Threadripper with Ninja, so it does scale well with compute. |
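The parallelism is what makes the difference here: the build respects the MAX_JOBS environment variable (via ninja), so on a many-core machine you can raise it, and on a RAM-limited one you should lower it. A sketch in PowerShell; the specific job count is just an example:

```powershell
# Let ninja run many parallel compile jobs on a high-core-count machine
$env:MAX_JOBS = "32"
pip install flash-attn --no-build-isolation

# On a RAM-limited machine, use a small value instead, e.g. $env:MAX_JOBS = "4"
```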
Version 2.5.7 working on my Windows 10, building took around 2h:
|
Any luck getting it to work with cuda 11.8? |
A package that needs 2 hours to install? Sorry, but that's a no-go for me. |
Well it doesn't take that long if you have a multi-core processor (it's the compile time). In general you're right, someone should maintain pre-built wheels, and someone usually does, but it's not consistent for Windows builds right now and you have to search GitHub for someone who has uploaded a recent build. The good news is FA2 is a pretty stable product right now I think, and you can grab an older wheel and it'll probably work just as well, as long as it supports the CUDA version you're using.
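As a rough guide for picking an older wheel: check which CUDA build your torch uses, choose a wheel whose cuXXX tag is the same or older, and smoke-test the import afterwards. For example:

```powershell
# Which CUDA build is torch using? Pick a flash-attn wheel with a matching (or older) cuXXX tag
python -c "import torch; print(torch.__version__, torch.version.cuda)"

# After installing the wheel, a minimal smoke test
python -c "import flash_attn; print(flash_attn.__version__)"
```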
I tried, but it would not compile. It might be that one of the dependencies (CUTLASS?) needs 12.0. |
Are there more recent builds for Windows? I get the same error. And with the 2.4.2 binaries I get this error:
|
Thanks for the quick reply. Unfortunately the same error persists with these builds. I have CUDA 12.4, btw, and these say cu123... hmm. |
That should be fine, technically. CUDA libs are generally backwards compatible, as long as your torch also has a compatible CUDA build. Does the latest pre-built wheel work? I do get the error you're getting if I use a newer package with an older flash-attn wheel, or build an older version of flash-attn. Maybe it's some incompatible change that was never reported on Windows, but the most recent build or wheel of flash-attn removes that error for me. |
I finally found a root cause for the build failure: https://stackoverflow.com/a/78576792/13305027. I don't understand the VS 2022 version part, because that's what I have installed, but apparently it is related to some minor version. It wasn't entirely clear how to downgrade to another 2022 minor version, so perhaps installing <2022 would suffice; alternatively, upgrade to CUDA 12.4, preferably 12.5 it seems. I am now testing a different approach to fix support without reinstalling. PS: perhaps worth mentioning that on WSL it's pretty much hassle free, except there I had an error related to flash-attn, so on that other project I could simply bypass flash-attn by finding and setting |
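For context, that Stack Overflow answer points at a mismatch between a newer MSVC minor version and the nvcc release doing the build. One commonly suggested workaround, which I haven't verified for every CUDA/VS combination, is to tell nvcc to accept the unsupported host compiler for the duration of the build; treat the environment-variable route below as an assumption about your setup:

```powershell
# Check which host compiler and nvcc are actually being picked up
where.exe cl
nvcc --version

# Ask nvcc to tolerate the newer MSVC minor version for this build only
$env:NVCC_APPEND_FLAGS = "-allow-unsupported-compiler"
pip install flash-attn --no-build-isolation
```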
That makes sense; in the cases where I got that error I was probably linking against CUDA 12.1, and in my recent builds I had switched to 12.5. I also have a very early version of VS 2022 and have never updated it since it was first released. What was the issue with WSL? It seems to work fine for me. |
I had this error: [error details collapsed] I ended up using the same type of solution they proposed, which is to bypass flash-attn altogether. Perhaps this is an option for anyone reading this thread, but it's not so helpful if you actually need flash-attn. I'm not sure what the implications of this would be, but that repo seemed to work without it. Maybe you have some insights? |
I suspect that this is caused by version differences and by how absurdly easily the import paths get messed up on Windows; ultimately, unless you're using Conda, on Windows you really need to figure out for yourself which versions are compatible, and even then you need to install things in the right order. What worked for me (unintuitive things in bold):
TL;DR: Pay super close attention to which versions are installed all over your system, and consider doing a clean re-install of the CUDA stack. As for easing this going forward, I think adding some sanity checks to the build process, to verify which versions are installed and whether the include paths are sensible, would be a good step. As a 'crash early' mitigation, maybe we could do a quick build of some CUDA hello-world before kicking off the main process? As long as the program isn't too trivial, I think it's highly likely to catch build misconfigurations. |
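In that spirit, here is a cheap pre-flight check you can already run by hand before committing to a multi-hour build. Nothing in it is flash-attn-specific; it just surfaces the version and path mismatches described above:

```powershell
# Pre-flight sanity check (PowerShell) before a long flash-attn build
nvcc --version          # CUDA toolkit the build will use
where.exe cl            # MSVC compiler on PATH (run from a VS developer prompt if missing)
echo $env:CUDA_PATH     # should point at the same toolkit version as nvcc reports
python -c "import sys; print(sys.version)"
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```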
I followed steps 1-4 (made sure to remove all CUDA / cuDNN entries from Add/Remove Programs; only the GeForce drivers & GeForce Experience remained). Installed the latest Microsoft Visual C++ Redistributable after step 8 to fix "OSError: [WinError 126] error loading fbgemm.dll or dependencies" (occurred when running `import torch`). Installed CUDA 12.4.1. Windows 11: cuDNN 9.2 was installed from the tarball:
I created a new venv and installed PyTorch 2.4 by modifying Step 7:
Finally:
It's currently building (with a lot of warnings in the process, such as |
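The commands referenced above were collapsed in the original comment, so purely as a hedged reconstruction: a fresh venv, the cu124 PyTorch 2.4 build, and the usual flash-attn source build would look roughly like this (the index URL and the final install command are assumptions, not quoted from the comment):

```powershell
# Fresh virtual environment
python -m venv flashattn-env
.\flashattn-env\Scripts\Activate.ps1

# PyTorch 2.4 built against CUDA 12.4 (assumed index, matching the CUDA 12.4.1 install above)
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# Assumed final step: build flash-attn inside that environment
pip install ninja packaging
pip install flash-attn --no-build-isolation
```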
@sunsetcoder Thank you!, I've been trying to get flash_attn installed for days, these instructions are the first ones that worked. |
@evilalmus You're welcome. Make sure to use Python 3.10. 3.12 no bueno |
3.11.9 worked for me. |
@sunsetcoder @the-xentropy Thank you for the provided instructions. I tried to install it for several days, but nothing worked. I tried using Docker. In the end, I came across your instructions, followed them, and it worked. It installed on the latest Python 3.12.5. But my graphics card is not suitable for Flash Attention 2 😭😭😭 |
```bash
# First remove any existing flash-attention
pip uninstall flash-attn -y

# Install build requirements
pip install ninja packaging

# Try with specific compiler settings
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn==2.3.2 --no-build-isolation

# Alternative installation with explicit CUDA path
CUDA_HOME=/usr/local/cuda pip install flash-attn==2.3.2 --no-build-isolation
```
|
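A note on that last command: CUDA_HOME=/usr/local/cuda is a Linux-style path; on Windows you would point CUDA_HOME (or rely on CUDA_PATH, which the CUDA installer sets) at something like C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4. FLASH_ATTENTION_FORCE_BUILD=TRUE, as I understand it, forces a local source build instead of letting setup.py fetch a pre-built wheel.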
Hi there, impressive work. Tested it on Linux, and the VRAM usage and speeds with higher context are impressive (tested on exllamav2).
I've tried to do the same on Windows for exllamav2, but I have issues when either compiling from source or installing via pip.
I tried with:
Torch 2.0.1+cu118 and CUDA 11.8
Torch 2.2+cu121 and CUDA 12.1
Visual Studio 2022
The errors are these, depending on whether I run `python setup.py install` from source or install it via pip.
Compiling from source error (collapsed log)
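For readers landing here, the two install paths being compared are roughly the following; the repository URL is the project's, while the --no-build-isolation flag reflects the commonly recommended pip invocation rather than anything quoted in this issue:

```powershell
# Path 1: build from a clone of the repository
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention
python setup.py install

# Path 2: build via pip from the source distribution
pip install flash-attn --no-build-isolation
```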