When I tried to train with flash-attn, I got the error RuntimeError: cu_seqlens_q must have shape (batch_size + 1). After investigating: in transformers/modeling_qwen2_audio.py, the attention_mask passed to Qwen2AudioEncoderLayer has shape (batch, 1, tgt_len, src_len), while transformers/modeling_flash_attention_utils.py expects an attention_mask of shape (batch_size, seq_len). This causes a dimension mismatch when computing seq_lens: a mask yielding batch_size + 1 cumulative lengths is needed, but instead the result has shape (batch_size, 1, tgt_len + 1).
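To illustrate the shape mismatch described above, here is a minimal sketch of how cumulative sequence lengths are typically derived from a 2-D padding mask for flash-attn's varlen path. The function name and pure-Python representation are illustrative, not the actual library code; the real utility operates on tensors.

```python
def cu_seqlens_from_mask(attention_mask):
    """attention_mask: nested lists of 0/1 padding flags, shape (batch_size, seq_len).
    Returns cumulative sequence lengths, which must have batch_size + 1 entries."""
    cu = [0]
    for row in attention_mask:
        # each row contributes one cumulative offset: total valid tokens so far
        cu.append(cu[-1] + sum(row))
    return cu

# 2-D mask (batch_size=2, seq_len=4): two sequences of length 3 and 2
mask_2d = [[1, 1, 1, 0],
           [1, 1, 0, 0]]
print(cu_seqlens_from_mask(mask_2d))  # [0, 3, 5] -> batch_size + 1 = 3 entries
```

If a 4-D mask of shape (batch, 1, tgt_len, src_len) is fed into this kind of reduction instead, the leading dimensions are wrong and the resulting cu_seqlens no longer has length batch_size + 1, which is consistent with the RuntimeError reported above.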
+1
Hi, have you managed to solve this issue?
I don't quite understand why qwen2audio requires a specific audio attention mask. Looking at the Whisper source code, this mask is actually unused there.