Commit
[Bugfix] Flash attention arches not getting set properly (vllm-project#9062)

Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
LucasWilkinson authored and sumitd2 committed Nov 14, 2024
1 parent a633194 commit d2662e7
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions CMakeLists.txt
@@ -482,6 +482,17 @@ if (NOT VLLM_TARGET_DEVICE STREQUAL "cuda")
return()
endif ()

# vLLM flash attention requires VLLM_GPU_ARCHES to contain the set of target
# arches in CMake syntax (75-real, 89-virtual, etc.). Since we clear the
# arches in the CUDA case (and instead set the gencodes on a per-file basis),
# we need to manually set VLLM_GPU_ARCHES here.
if(VLLM_GPU_LANG STREQUAL "CUDA")
  foreach(_ARCH ${CUDA_ARCHS})
    string(REPLACE "." "" _ARCH "${_ARCH}")
    list(APPEND VLLM_GPU_ARCHES "${_ARCH}-real")
  endforeach()
endif()
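As a concrete illustration of what the loop above does (this sketch is not part of the commit, and the example arch values are hypothetical), a dotted arch string such as `7.5` has its dot stripped and `-real` appended, so a `CUDA_ARCHS` list of `7.5;8.0;9.0` becomes a `VLLM_GPU_ARCHES` list of `75-real;80-real;90-real`:

```cmake
# Standalone sketch (run with `cmake -P example.cmake`); CUDA_ARCHS values
# are made up for illustration.
set(CUDA_ARCHS "7.5;8.0;9.0")
set(VLLM_GPU_ARCHES "")
foreach(_ARCH ${CUDA_ARCHS})
  string(REPLACE "." "" _ARCH "${_ARCH}")          # "7.5" -> "75"
  list(APPEND VLLM_GPU_ARCHES "${_ARCH}-real")     # "75" -> "75-real"
endforeach()
message(STATUS "${VLLM_GPU_ARCHES}")  # -- 75-real;80-real;90-real
```

The `-real` suffix requests SASS (machine code) for that exact architecture only, with no embedded PTX, matching the per-file gencode handling the comment describes.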

#
# Build vLLM flash attention from source
#
