Feat: Add rope scaling #343
Conversation
lgtm
Has everyone else been doing it incorrectly? https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/config.json#L14
The docs say not to set it ourselves: https://github.com/gante/transformers/blob/30409af6e1b2b5efb6d9932b3e3b4ce20cfdb30e/src/transformers/models/llama/configuration_llama.py#L80-L87. At the same time, we also change this when seq_len > the model's context, so I'm not sure whether we should add a validation check in the config to make sure that doesn't happen. As for Together's model, they have their own custom modelling_llama code (it requires trust_remote_code), so it's not exactly the same and I can't say.
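For reference, a minimal sketch of the transformers side described in that docstring, assuming linear scaling: `rope_scaling` is a dict with a type and a factor, and `max_position_embeddings` stays at the base value. The model name and factor below are illustrative, not from this PR.

```python
from transformers import AutoConfig

# Illustrative only: request linear RoPE scaling through the config while
# leaving max_position_embeddings at the model's base value.
config = AutoConfig.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
)
print(config.max_position_embeddings)  # still the base 4096
```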
This brought up another question in my mind: what about dataset packing? Do we just pass a 4k-packed dataset to a "16k" model? If we don't set a 16k seq_len, the dataset would be packed to at most 4k. Do we need to add a condition to NOT change max_position_embeddings if rope_scaling is set?
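A hedged sketch of the validation being discussed: derive the effective context from the base max_position_embeddings and the rope factor, and check sequence_len against that instead of bumping max_position_embeddings. The function and field names here are hypothetical, not axolotl's actual config keys.

```python
def effective_context(base_max_pos: int, rope_scaling: dict | None) -> int:
    # With linear scaling, the usable context is roughly base length * factor.
    if not rope_scaling:
        return base_max_pos
    return int(base_max_pos * rope_scaling.get("factor", 1.0))


def validate_sequence_len(sequence_len: int, base_max_pos: int, rope_scaling: dict | None) -> None:
    # Hypothetical check: refuse sequence lengths beyond the scaled context.
    ctx = effective_context(base_max_pos, rope_scaling)
    if sequence_len > ctx:
        raise ValueError(
            f"sequence_len={sequence_len} exceeds effective context {ctx}; "
            "raise rope_scaling.factor instead of max_position_embeddings"
        )
```

Packing would then follow whatever sequence_len passes this check, matching the point above that a 16k target still needs a 16k seq_len, not just a scaled model.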
Force-pushed from 517a59a to 443f932
Tested that it runs:
* Feat: Add rope scaling
* fix: move rope config
Closes #342
Note: This only applies it to the llama arch. It should also support neox; should I add a specific if condition for this?

The docs state NOT to increase max_position_embeddings ourselves for this. If we want 8k, for example, we just change the rope factor.
We leave validation to transformers.

Test