Issues: Dao-AILab/flash-attention
#1351: Unable to cast Python instance of type <class 'torch._subclasses.fake_tensor.FakeTensor'> to C++ type (opened Nov 21, 2024 by zwhe99)
#1350: How can I use a single query to compute attention against multiple k-v pairs? (opened Nov 21, 2024 by DongyuXu77)
#1348: Issue with installing flash attention: `import flash_attn_2_cuda as flash_attn_cuda` fails (opened Nov 20, 2024 by hahmad2008)
#1341: [Bug]: Performance slump after updating to flash-attn 2.7.0 (when using torch.compile) (opened Nov 16, 2024 by Mnb66)
#1340: Building a wheel for torch 2.5.0-2.5.1 with Python 3.10 and CUDA 12.4 on Windows fails (opened Nov 16, 2024 by lldacing)
#1338: v2.6.3's flash_attn_varlen_func runs faster than v2.7.0.post2's flash_attn_varlen_func on H100 (opened Nov 16, 2024 by complexfilter)
#1319: Is try_wait on barrier_Q similar to barrier_O? Is an additional wait needed? (opened Nov 6, 2024 by ziyuhuang123)
#1317: In the FlashAttention 3 (FA3) code, where is the barrier_O phase specified as 1 or 0? (opened Nov 5, 2024 by ziyuhuang123)