Remove `seq_len` argument in `self.rotary_emb()` in the flash-attn and xformers patches. Solves #1423.

## Description
Issue #1423 mentioned that the error `TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'` occurs when training Llama with flash-attention. I found that it only happens when the installed version of transformers is greater than 4.38. The error occurs because in transformers version 4.39 the `seq_len` argument of `LlamaRotaryEmbedding.forward()` was deprecated. You can see the change at huggingface/transformers@ffe60fd#diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51.
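If you want to confirm which signature your installed transformers exposes, a quick introspection works (a diagnostic sketch, not part of this PR; it only inspects the function and patches nothing):

```python
# Print the current forward() signature of LlamaRotaryEmbedding and whether
# it still accepts `seq_len`. On transformers >= 4.39 the parameter is gone.
import inspect
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

sig = inspect.signature(LlamaRotaryEmbedding.forward)
print(sig)                            # e.g. (self, x, position_ids) on 4.39
print("seq_len" in sig.parameters)    # False once the argument was removed
```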
In the file `axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py`, line 290 calls `LlamaRotaryEmbedding.forward()` with the argument `seq_len=kv_seq_len`, which is no longer supported in the updated version of transformers.
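For illustration, here is a minimal sketch of the kind of call-site change involved (not the verbatim patch: `rotary_emb`, `value_states`, `position_ids`, and `kv_seq_len` stand in for the variables inside the patched attention forward, and the tensor shapes are made up):

```python
import torch
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

head_dim, kv_seq_len = 32, 16
# Stands in for self.rotary_emb; the constructor signature also varies
# across transformers releases.
rotary_emb = LlamaRotaryEmbedding(head_dim)
value_states = torch.randn(1, 4, kv_seq_len, head_dim)
position_ids = torch.arange(kv_seq_len).unsqueeze(0)

# Old call (raises the TypeError from #1423 on transformers >= 4.39):
# cos, sin = rotary_emb(value_states, seq_len=kv_seq_len)

# Call without the removed kwarg; post-4.39 forward() takes position_ids:
cos, sin = rotary_emb(value_states, position_ids)
```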
## Motivation and Context

This change solves issue #1423, allowing flash-attn to be enabled when training with Llama.
## How has this been tested?
The changes are small. I re-ran my training script after removing the `seq_len`-related arguments and observed the same behavior as when downgrading transformers to version 4.38.