Remove `seq_len` argument in `self.rotary_emb()` in the flash-attn and xformers patches. Solves #1423.

## Description
Issue #1423 mentioned that the error `TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'` occurs when training Llama with flash-attention. I found that it only happens when the installed version of transformers is greater than 4.38. The error occurs because in transformers version 4.39 the `seq_len` argument of `LlamaRotaryEmbedding.forward()` was deprecated. You can see the change at huggingface/transformers@ffe60fd#diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51.
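If you want to confirm which signature your installed transformers exposes, a quick introspection works (a diagnostic sketch, not part of this PR; it only inspects the function and patches nothing):

```python
# Print the current forward() signature of LlamaRotaryEmbedding and whether
# it still accepts `seq_len`. On transformers >= 4.39 the parameter is gone.
import inspect
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

sig = inspect.signature(LlamaRotaryEmbedding.forward)
print(sig)                            # e.g. (self, x, position_ids) on 4.39
print("seq_len" in sig.parameters)    # False once the argument was removed
```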
In the file `axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py`, line 290 calls `LlamaRotaryEmbedding.forward()` with the argument `seq_len=kv_seq_len`, which is no longer supported in the updated version of transformers.
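For illustration, here is a minimal sketch of the kind of call-site change involved (not the verbatim patch: `rotary_emb`, `value_states`, `position_ids`, and `kv_seq_len` stand in for the variables inside the patched attention forward, and the tensor shapes are made up):

```python
import torch
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

head_dim, kv_seq_len = 32, 16
# Stands in for self.rotary_emb; the constructor signature also varies
# across transformers releases.
rotary_emb = LlamaRotaryEmbedding(head_dim)
value_states = torch.randn(1, 4, kv_seq_len, head_dim)
position_ids = torch.arange(kv_seq_len).unsqueeze(0)

# Old call (raises the TypeError from #1423 on transformers >= 4.39):
# cos, sin = rotary_emb(value_states, seq_len=kv_seq_len)

# Call without the removed kwarg; post-4.39 forward() takes position_ids:
cos, sin = rotary_emb(value_states, position_ids)
```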
## Motivation and Context

This change solves issue #1423, allowing flash-attn to be enabled when training with Llama.
## How has this been tested?
The changes are small. I re-ran my training script after removing the `seq_len`-related arguments and observed the same behavior as when downgrading transformers to version 4.38.