Issues: Dao-AILab/flash-attention
#1351: Unable to cast Python instance of type <class 'torch._subclasses.fake_tensor.FakeTensor'> to C++ type (opened Nov 21, 2024 by zwhe99)
#1350: How can I use a single query to compute attention against multiple k-v pairs? (opened Nov 21, 2024 by DongyuXu77)
#1348: Issue with installing flash attention: `import flash_attn_2_cuda as flash_attn_cuda` fails (opened Nov 20, 2024 by hahmad2008)
#1341: [Bug]: Performance slump after updating to flash-attn 2.7.0 (when using torch.compile) (opened Nov 16, 2024 by Mnb66)
#1340: Building a wheel for torch 2.5.0-2.5.1 with Python 3.10 and CUDA 12.4 on Windows fails (opened Nov 16, 2024 by lldacing)
#1338: v2.6.3's flash_attn_varlen_func runs faster than v2.7.0.post2's flash_attn_varlen_func on H100 (opened Nov 16, 2024 by complexfilter)
#1319: Is try_wait on barrier_Q similar to barrier_O? Is an additional wait needed? (opened Nov 6, 2024 by ziyuhuang123)
#1317: In the FlashAttention 3 (FA3) code, where is the barrier_O phase specified as 1 or 0? (opened Nov 5, 2024 by ziyuhuang123)