
qwen2-audio doesn't support flash-attn? RuntimeError: cu_seqlens_q must have shape (batch_size + 1) #51

Open
kindaQ opened this issue Aug 27, 2024 · 3 comments

Comments

@kindaQ

kindaQ commented Aug 27, 2024

When I tried to train with flash-attn, I hit RuntimeError: cu_seqlens_q must have shape (batch_size + 1).
After some digging: the attention_mask inside Qwen2AudioEncoderLayer (transformers/modeling_qwen2_audio.py) has shape (batch, 1, tgt_len, src_len),
while transformers/modeling_flash_attention_utils.py expects an attention_mask of shape (batch_size, seq_len). The dimensions are therefore wrong when the sequence lengths are computed: a cu_seqlens of length batch_size + 1 is expected, but what actually comes out has shape (batch_size, 1, tgt_len + 1).
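For reference, here is a minimal shape-only sketch of the mismatch, assuming the unpad logic follows the usual sum → cumsum → pad pattern of `_get_unpad_data` (the tensor values are placeholders, only the shapes matter):

```python
import torch
import torch.nn.functional as F

batch_size, tgt_len, src_len = 2, 10, 10

# 4D additive mask as built for the encoder layer: (batch, 1, tgt_len, src_len)
mask_4d = torch.zeros(batch_size, 1, tgt_len, src_len, dtype=torch.int32)

# 2D padding mask that the flash-attn utils expect: (batch_size, seq_len)
mask_2d = torch.ones(batch_size, src_len, dtype=torch.int32)

# With the 2D mask, the reduction yields cu_seqlens of shape (batch_size + 1,):
seqlens = mask_2d.sum(dim=-1, dtype=torch.int32)           # (batch_size,)
cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0), (1, 0))   # (batch_size + 1,)
print(cu_seqlens.shape)  # torch.Size([3])

# Feeding the 4D mask through the same reduction gives the wrong shape,
# which is exactly what the error message complains about:
bad = F.pad(torch.cumsum(mask_4d.sum(dim=-1, dtype=torch.int32), dim=0), (1, 0))
print(bad.shape)  # torch.Size([2, 1, 11]) -- (batch, 1, tgt_len + 1), not (batch_size + 1,)
```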

@0ohadeso0

+1

@SixGoodX

> When I tried to train with flash-attn, I hit RuntimeError: cu_seqlens_q must have shape (batch_size + 1). After some digging: the attention_mask inside Qwen2AudioEncoderLayer (transformers/modeling_qwen2_audio.py) has shape (batch, 1, tgt_len, src_len) ...

Hi, have you solved this problem?

@Lollipop

I don't quite understand why qwen2audio requires a specific audio attention mask as input. Looking at the Whisper source code, this mask is never actually used.
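If the mask really is unused, one possible workaround is to drop it before it reaches the flash-attn path. This is an untested sketch that monkey-patches the layer named above; the forward signature is assumed from the Whisper-style encoder layers, so verify it against your transformers version, and only use this if every audio sequence in the batch has the same length (no padding to unpad):

```python
# Untested sketch: pass attention_mask=None so the flash-attn utils skip the
# (batch_size, seq_len) unpadding logic that raises the RuntimeError.
from transformers.models.qwen2_audio import modeling_qwen2_audio

_orig_forward = modeling_qwen2_audio.Qwen2AudioEncoderLayer.forward

def forward_without_mask(self, hidden_states, attention_mask=None, *args, **kwargs):
    # Drop the 4D mask; assumes no padded frames in the batch.
    return _orig_forward(self, hidden_states, None, *args, **kwargs)

modeling_qwen2_audio.Qwen2AudioEncoderLayer.forward = forward_without_mask
```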
