Flash Attention 2 doesn't build/compile on Windows #553

Closed
Panchovix opened this issue Sep 18, 2023 · 32 comments

Comments

@Panchovix

Hi there, impressive work. I tested it on Linux, and the VRAM usage and speeds at higher context are impressive (tested on exllamav2).

I've tried to do the same on Windows for exllamav2, but I run into issues both when building from source and when installing via pip.

I tried with:

- Torch 2.0.1+cu118 and CUDA 11.8
- Torch 2.2+cu121 and CUDA 12.1
- Visual Studio 2022

The errors are below, depending on whether I build from source with `python setup.py install` or install via pip.

Error when compiling from source
[2/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
flash_bwd_hdim160_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
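
This #226-D warning, which repeats for every translation unit below, points at a genuinely malformed format string in the bundled CUTLASS header: `%016` has no conversion specifier. It is only a warning and not what breaks the build, but for reference, here is a minimal sketch of the broken call next to a well-formed alternative, assuming the intent was a zero-padded 16-digit hex dump of the 64-bit descriptor (an assumption, not the actual upstream patch):

```cpp
#include <cstdio>

int main() {
    unsigned long long desc = 0x0123456789abcdefULL;

    // What the header does today: "%016" lacks a conversion specifier,
    // so the whole format string is invalid (hence nvcc's warning #226-D).
    //   printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(desc));

    // A well-formed alternative with zero-padded 16-digit hex output:
    printf("GmmaDescriptor: 0x%016llx\n", desc);
    return 0;
}
```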

tmpxft_00003160_00000000-7_flash_bwd_hdim160_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim160<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
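
The repeated error C2975 (translated above) is what actually stops the build: MSVC refuses to treat `kHeadDim_` as a compile-time constant at the point where `Flash_bwd_kernel_traits` is instantiated, even though GCC and Clang accept the same code on Linux. Reduced to a hypothetical sketch (the names mirror the log, but this is an illustration, not the actual flash-attention source), the shape of the problem looks like this:

```cpp
// Hypothetical reduction of the C2975 failure mode, for illustration only.
// GCC and Clang compile this; some MSVC versions reject using the
// enclosing-scope constexpr as a template argument inside the lambda.
template <int kHeadDim_>
struct Flash_bwd_kernel_traits_sketch {
    static constexpr int kHeadDim = kHeadDim_;
};

template <typename F>
void dispatch(F&& f) { f(); }

void run_mha_bwd_hdim160_sketch() {
    constexpr int Headdim = 160;  // compile-time constant in the enclosing scope
    dispatch([&] {
        // MSVC may report C2975 here: "invalid template argument for
        // 'Flash_bwd_kernel_traits'; expected a compile-time constant expression".
        using Traits = Flash_bwd_kernel_traits_sketch<Headdim>;
        static_assert(Traits::kHeadDim == 160, "head dim should propagate");
    });
}

int main() { run_mha_bwd_hdim160_sketch(); }
```

If that is indeed the failure mode, the usual workarounds are to pass the constant through a template parameter or re-declare it as a `constexpr` local inside the lambda, rather than relying on the enclosing scope.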
[3/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
flash_bwd_hdim160_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim160_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_00005ccc_00000000-7_flash_bwd_hdim160_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim160<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[4/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
flash_bwd_hdim192_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_000038c0_00000000-7_flash_bwd_hdim192_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim192<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[5/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
flash_bwd_hdim192_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim192_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_00002c68_00000000-7_flash_bwd_hdim192_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim192<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[6/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
flash_bwd_hdim128_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_000030a8_00000000-7_flash_bwd_hdim128_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim128<cutlass::half_t>(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
[7/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
      printf("GmmaDescriptor: 0x%016 %lli\n", static_cast<long long>(t.desc_));
                                              ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

tmpxft_0000556c_00000000-7_flash_bwd_hdim128_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim128<cutlass::bfloat16_t>(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '6']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\flash-attention\setup.py", line 287, in <module>
    setup(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
    return run_commands(dist)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
    dist.run_commands()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install.py", line 74, in run
    self.do_egg_install()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
    self.build()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\install_lib.py", line 112, in build
    self.run_command('build_ext')
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
    _build_ext.run(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
    self.build_extensions()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 466, in build_extensions
    self._build_extensions_serial()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 492, in _build_extensions_serial
    self.build_extension(ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 547, in build_extension
    objects = self.compiler.compile(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 845, in win_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Install from pip error
ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 257, in run
          urllib.request.urlretrieve(wheel_url, wheel_filename)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 241, in urlretrieve
          with contextlib.closing(urlopen(url, data)) as fp:
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 216, in urlopen
          return opener.open(url, data, timeout)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 525, in open
          response = meth(req, response)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 634, in http_response
          response = self.parent.error(
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 563, in error
          return self._call_chain(*args)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 496, in _call_chain
          result = func(*args)
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 643, in http_error_default
          raise HTTPError(req.full_url, code, msg, hdrs, fp)
      urllib.error.HTTPError: HTTP Error 404: Not Found

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2100, in _run_ninja_build
          subprocess.run(
        File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '6']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 277, in <module>
          setup(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
          return run_commands(dist)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
          dist.run_commands()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 968, in run_commands
          self.run_command(cmd)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 274, in run
          super().run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\wheel\bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build.py", line 132, in run
          self.run_command(cmd_name)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
          super().run_command(command)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
          cmd_obj.run()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
          _build_ext.run(self)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
          self.build_extensions()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 873, in build_extensions
          build_ext.build_extensions(self)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 466, in build_extensions
          self._build_extensions_serial()
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 492, in _build_extensions_serial
          self.build_extension(ext)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 547, in build_extension
          objects = self.compiler.compile(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 845, in win_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2116, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

Is there any additional prerequisite, besides those mentioned, to install flash-attn on Windows?

Panchovix referenced this issue in oobabooga/text-generation-webui Sep 18, 2023
@tridao

tridao commented Sep 19, 2023

I've no idea since it's only been tested on Linux, and I don't have access to a Windows machine. If you figure out how to build on Windows (or what we need to change to support Windows), please lmk.

@Panchovix

Closing as 5a83425 fixes it.

@grimulkan

@Panchovix are you saying we can now compile flash-attn on Windows somehow? I couldn't with the latest pull, unless I'm missing something.

@Panchovix

Panchovix commented Oct 10, 2023

@Panchovix are you saying we can now compile flash-attn on Windows somehow? I couldn't with the latest pull, unless I'm missing something.

Yes, it's now possible. The latest pull should work. You do need CUDA 12.x, though, since CUDA 11.8 and lower aren't supported.

I've uploaded a wheel here https://huggingface.co/Panchovix/flash-attn-2-windows-test-wheel

More discussion here: #595
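
For anyone checking prerequisites before attempting a build, here is a minimal sketch (assuming a standard PyTorch install) that prints what the compile will link against:

import torch

print(torch.__version__)          # e.g. 2.2.2+cu121; a +cu12x build is what you want here
print(torch.version.cuda)         # the CUDA version this torch build was compiled against
print(torch.cuda.is_available())  # True means the driver side works too

If torch.version.cuda reports 11.x, the flash-attn build will fail on Windows as described above.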

@grimulkan

Thanks, CUDA 11.8 was the cause of my error. Woohoo!

@rocketpoweryul

@Panchovix are you saying we can now compile flash-attn on Windows somehow? I couldn't with the latest pull, unless I'm missing something.

Yes, it's now possible. The latest pull should work. You do need CUDA 12.x, though, since CUDA 11.8 and lower aren't supported.

I've uploaded a wheel here https://huggingface.co/Panchovix/flash-attn-2-windows-test-wheel

More discussion here: #595

The link gives a 404 now

@grimulkan

There are binaries here. I can't build anything beyond 2.4.2 from source myself and can't find Windows binaries beyond that anywhere. 2.4.2 works fine with current packages though.

@Adlinga

Adlinga commented Apr 1, 2024

With some untraceable magic I've built 2.5.6 on Windows 10.
It took ~2.5 hours to compile.

CUDA 12.4
Torch 2.2.2+cu121
ninja 1.11.1

@SavorSauc3

For anyone looking to use Flash Attention on Windows, I got it working after some tweaking. You have to make sure that CUDA 12.4 is installed, and PyTorch should be 2.2.2+cu121. I used pip and it took about 2 hours to finish the setup. Hope this helps anyone who wants to use flash-attn on Windows. BTW, I am using Windows 11 Pro; mileage may vary on Windows 10.
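
In command form, the recipe above corresponds roughly to this sketch (the cu121 index URL is PyTorch's standard wheel channel; exact torchvision/torchaudio pins are left to pip's resolver):

pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121
pip install ninja packaging
pip install flash-attn --no-build-isolation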

@sadimoodi

For anyone looking to use Flash Attention on Windows, I got it working after some tweaking. You have to make sure that CUDA 12.4 is installed, and PyTorch should be 2.2.2+cu121. I used pip and it took about 2 hours to finish the setup. Hope this helps anyone who wants to use flash-attn on Windows. BTW, I am using Windows 11 Pro; mileage may vary on Windows 10.

Have you seen significant improvements after using flash attention? How much?

@SavorSauc3

SavorSauc3 commented Apr 9, 2024 via email

@grimulkan

Got it working on Windows 10 as well on Torch 2.2.2 (with CUDA 12.4 installed). Took around 15-20 min to compile on a 64-core Threadripper with Ninja, so it does scale well with compute.
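
The parallelism comes from ninja: PyTorch's extension builder reads the MAX_JOBS environment variable and passes it to ninja as the job count (that's where the -j 6 in the tracebacks above comes from). A sketch for a Windows command prompt - raise it on a many-core machine, or lower it if the build runs out of RAM:

set MAX_JOBS=8
pip install flash-attn --no-build-isolation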

@dr4gos-pop

Version 2.5.7 is working on my Windows 10; building took around 2h:

pip install flash-attn --no-build-isolation
Collecting flash-attn
Using cached flash_attn-2.5.7.tar.gz (2.5 MB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in (from flash-attn) (2.2.2+cu121)
Requirement already satisfied: einops in (from flash-attn) (0.7.0)
Requirement already satisfied: packaging in (from flash-attn) (24.0)
Requirement already satisfied: ninja in (from flash-attn) (1.11.1.1)
Requirement already satisfied: filelock in (from torch->flash-attn) (3.13.3)
Requirement already satisfied: typing-extensions>=4.8.0 in (from torch->flash-attn) (4.11.0)
Requirement already satisfied: sympy in (from torch->flash-attn) (1.12)
Requirement already satisfied: networkx in (from torch->flash-attn) (2.8.8)
Requirement already satisfied: jinja2 in (from torch->flash-attn) (3.1.3)
Requirement already satisfied: fsspec in (from torch->flash-attn) (2024.3.1)
Requirement already satisfied: MarkupSafe>=2.0 in (from jinja2->torch->flash-attn) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in (from sympy->torch->flash-attn) (1.3.0)
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... done
Created wheel for flash-attn: filename=flash_attn-2.5.7-cp311-cp311-win_amd64.whl size=117462147
Stored in directory: c:\users\appdata\local\pip\cache\wheels\94\a7\df\cf319d566d2bb53c7f3dd1b15ab2736cabca3e6410c75bd206
Successfully built flash-attn
Installing collected packages: flash-attn
Successfully installed flash-attn-2.5.7
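
Once installed, a quick smoke test looks something like this (a sketch: flash_attn_func takes half-precision (batch, seqlen, nheads, headdim) tensors on the GPU, and the kernels need an Ampere-or-newer card):

import torch
from flash_attn import flash_attn_func

# tiny self-attention call: batch 1, seqlen 64, 8 heads, head dim 64
q = torch.randn(1, 64, 8, 64, device="cuda", dtype=torch.float16)
out = flash_attn_func(q, q, q, causal=True)
print(out.shape)  # torch.Size([1, 64, 8, 64])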

@LostRuins

Any luck getting it to work with cuda 11.8?

@d-kleine

(...) building took around 2h:

A package that needs 2 hours to install? Sorry, but that's a no-go for me.
Any way to speed this up in the future? Maybe an installer instead of a package?

@grimulkan

A package that needs 2 hours to install? Sorry, but that's a no-go for me.
Any way to speed this up in the future? Maybe an installer instead of a package?

Well, it doesn't take that long if you have a multi-core processor (it's the compile time). In general you're right: someone should maintain pre-built wheels, and someone usually does, but it's not consistent for Windows builds right now, and you have to search GitHub for someone who has uploaded a recent build.

The good news is FA2 is a pretty stable product right now I think, and you can grab an older wheel and it'll probably work just as well, as long as it supports the CUDA version you're using.

Any luck getting it to work with cuda 11.8?

I tried, but it would not compile. It might be that one of the dependencies (CUTLASS?) needs 12.0.

@hananbeer

Are there more recent builds for Windows? I get the same error.

And for the 2.4.2 binaries I get this error:

ImportError: DLL load failed while importing flash_attn_2_cuda: The specified procedure could not be found."

@grimulkan

@hananbeer

https://github.com/bdashore3/flash-attention/releases

Thanks for the quick reply. Unfortunately, the same error persists with these builds.
Maybe something in my PATH is missing, as I did get strange C++ build tools errors that I managed to work around but perhaps not completely fix... prebuilt is better, of course.

I have CUDA 12.4, BTW, and these say cu123... hmm.

@grimulkan

That should be fine, technically. CUDA libs are generally backwards compatible, as long as your torch also has a compatible CUDA build. Does the latest pre-built wheel work? I do get the error you're getting if I use a newer package with an older flash-attn wheel, or if I build an older version of flash-attn. Maybe some incompatible change that was never reported on Windows. But the most recent build or wheel of flash-attn removes that error for me.
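
One way to see which side is mismatched (a sketch; flash_attn_2_cuda is the compiled extension module named in the error above):

import torch
print(torch.__version__, torch.version.cuda)  # what the installed wheel has to match

import flash_attn
print(flash_attn.__version__)                 # version of the flash-attn wheel

# the DLL error comes from this module; importing it directly
# reproduces the failure without the rest of the package
import flash_attn_2_cuda

If the last import fails, the wheel was built against a different torch/CUDA combination than the one installed.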

@hananbeer

hananbeer commented Aug 6, 2024

I finally found the root cause of the build failure.

https://stackoverflow.com/a/78576792/13305027

I don't understand the VS 2022 version thing, because that's what I have installed, but apparently it's related to some minor version. It wasn't entirely clear how to downgrade to another 2022 version, so perhaps installing something older than 2022 would suffice.

Alternatively, upgrade to CUDA 12.4, or preferably 12.5, it seems.

I am now testing a different approach to fix support without reinstallation.
pip calls the setup.py script, which calls PyTorch's cpp_extension.py, which builds using ninja, which calls nvcc... and passing --allow-unsupported-compiler to nvcc should work around the issue.
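
One way to inject that flag without editing setup.py is a sketch like this, assuming NVCC_APPEND_FLAGS (an environment variable that CUDA toolkits since 11.5 read and append to every nvcc invocation):

set NVCC_APPEND_FLAGS=--allow-unsupported-compiler
pip install flash-attn --no-build-isolation

Note this only silences nvcc's host-compiler version check; a genuinely incompatible MSVC can still miscompile.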

PS: perhaps worth mentioning that on WSL it's pretty much hassle-free, except there I had an error related to flash-attn, so on that other project I could simply bypass flash-attn by finding and setting use_flash_attn = False in the code. (There might be something similar for you.)

@grimulkan

grimulkan commented Aug 6, 2024

That makes sense; in the cases when I got that error I was probably linking with CUDA 12.1, and in my recent builds I had switched to 12.5. I also have a very early version of VS 2022 and have never updated since it was first released.

What was the issue with WSL? It seems to work fine for me.

@hananbeer

I had this error:
facebookresearch/sam2#100

I ended up using the same type of solution they proposed, which is to bypass flash-attn altogether.

Perhaps that works for others reading this thread, but it's not so helpful if you actually need flash-attn.

I'm not sure what the implications of this would be, but that repo seemed to work without it. Maybe you have some insights?

@the-xentropy

the-xentropy commented Aug 26, 2024

I suspect this is caused by version differences and by how absurdly easily the import paths get messed up on Windows; ultimately, on Windows, unless you're using Conda, you really need to figure out for yourself which versions are compatible, and even then you need to install things in the right order.

What worked for me (the unintuitive steps are the ones to watch):

  1. Uninstall pytorch, torchvision, xformers & torchaudio
  2. Uninstall all MSVC C++ build tools
  3. Uninstall all CUDA, CUDA Toolkit, cuDNN, and other NVIDIA SDKs (read: type 'nvidia', 'cudnn' and 'cuda' into the Add/Remove Programs feature and remove anything that isn't GeForce Experience or drivers)
  4. Restart
  5. Install MSVC C++ build tools (I have Visual Studio Community 2022, 17.11.1, the most recent one, and I also added MSVC v143 build tools for v17.9)
  6. Install all the CUDA things. I went for CUDA 12.4.1 and cuDNN 9.2.1. Do NOT install this first. The CUDA toolkit HAS to configure the MSVC setup!
  7. Install pytorch (2.4.1, torchaudio 2.4.1 and torchvision 0.19)
  8. Restart (a lot of guides say you have to restart; this time you actually do. It will not work otherwise. I tried)

TL;DR: Pay super close attention to which versions are installed all over your system, and consider doing a clean re-install of CUDA stuff.

As for easing this going forward, I think adding some sanity checks to the build process - which versions are installed, whether the include paths are sensible - would be a good step. As a 'crash early' mitigation, maybe we could do a quick build of some CUDA hello world before kicking off the main process? As long as the program isn't too trivial, I think it's highly likely to catch build misconfigurations.
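
Something like that crash-early check can be done from Python today. Here is a sketch using torch.utils.cpp_extension.load_inline, which drives the same nvcc + MSVC toolchain the flash-attn build uses; add_one is just a throwaway test function:

import torch
from torch.utils.cpp_extension import load_inline

# the declaration goes in cpp_sources, the definition in cuda_sources;
# the cuda_sources file is compiled by nvcc, so a broken toolchain fails here
cpp_src = "torch::Tensor add_one(torch::Tensor x);"
cuda_src = "torch::Tensor add_one(torch::Tensor x) { return x + 1; }"

smoke = load_inline(name="toolchain_smoke", cpp_sources=cpp_src,
                    cuda_sources=cuda_src, functions=["add_one"], verbose=True)
print(smoke.add_one(torch.zeros(1, device="cuda")))  # expect tensor([1.], device='cuda:0')

If this one-function build fails, the two-hour flash-attn build will fail too, for the same reason.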

@sunsetcoder

sunsetcoder commented Aug 27, 2024

I followed steps 1-4 (made sure to remove all CUDA / cuDNN from Add/Remove Programs - only the GeForce drivers & GeForce Experience remained).

Installed the latest Microsoft Visual C++ Redistributable after step 8 to fix "OSError: [WinError 126] error loading fbgemm.dll or dependencies" (occurred when running import torch).

Installed CUDA 12.4.1

Windows 11: cuDNN 9.2 was installed from the zip archive:

  • Extract the downloaded zip file to a temporary location.

  • Copy the extracted files into the cuDNN install directory:

    • Copy bin\cudnn*.dll to C:\Program Files\NVIDIA\CUDNN\v9.2\bin
    • Copy include\cudnn*.h to C:\Program Files\NVIDIA\CUDNN\v9.2\include
    • Copy lib\x64\cudnn*.lib to C:\Program Files\NVIDIA\CUDNN\v9.2\lib
  • Set up environment variables:

  • Open the Start menu and type "Environment Variables"

  • Click on "Edit the system environment variables"

  • Click the "Environment Variables" button

  • Under "System variables", find and edit the "Path" variable

  • Add the following path: C:\Program Files\NVIDIA\CUDNN\v9.2\bin

  • Verify the installation:

  • Open a new command prompt

  • Run the following command to check if cuDNN is properly installed: where cudnn*.dll
    This should display the path to the cuDNN DLL files.

I created a new venv and installed PyTorch 2.4 by modifying Step 7:

  • The latest PyTorch is 2.4.0 as of 08-27-2024.
  • I installed it with this command: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Finally:

  • updated pip (pip3 install --upgrade pip)
  • installed packaging: pip install packaging and pip install wheel
  • cloned the repo
  • ran python setup.py install

It's currently building (with a lot of warnings in the process, such as \flash_bwd_kernel.h(483): warning #177-D: variable "dtanh" was declared but never referenced)

@sunsetcoder

Build completed. Created a .whl file with python setup.py bdist_wheel.

@evilalmus

@sunsetcoder Thank you!, I've been trying to get flash_attn installed for days, these instructions are the first ones that worked.

@sunsetcoder

@evilalmus You're welcome. Make sure to use Python 3.10; 3.12 is no bueno.

@evilalmus

3.11.9 worked for me.

@sidrez

sidrez commented Sep 27, 2024

@sunsetcoder @the-xentropy Thank you for the provided instructions. I tried to install it for several days, but nothing worked. I tried using Docker. In the end, I came across your instructions, followed them, and it worked. It installed on the latest Python 3.12.5. But my graphics card is not suitable for Flash Attention 2 😭😭😭

@Nivitus

Nivitus commented Oct 25, 2024

First, remove any existing flash-attention:

pip uninstall flash-attn -y

Install build requirements:

pip install ninja packaging

Try with specific compiler settings:

FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn==2.3.2 --no-build-isolation

Alternative installation with an explicit CUDA path:

CUDA_HOME=/usr/local/cuda pip install flash-attn==2.3.2 --no-build-isolation
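
Note the env-var prefixes above are Unix shell syntax; in a Windows command prompt the equivalent would be something like this (the CUDA path is only an example - point it at your installed toolkit version):

set FLASH_ATTENTION_FORCE_BUILD=TRUE
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
pip install flash-attn==2.3.2 --no-build-isolation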
