Feat: Add rope scaling #343
Conversation
lgtm
Has everyone else been doing it incorrectly? https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/config.json#L14
The docs say not to set it ourselves: https://github.com/gante/transformers/blob/30409af6e1b2b5efb6d9932b3e3b4ce20cfdb30e/src/transformers/models/llama/configuration_llama.py#L80-L87. At the same time, we also change this when seq_len > the model's context, so I'm not sure whether we should add a validation check in the config to make sure that doesn't happen. As for Together's model, they have their own custom modelling_llama code (it requires trust_remote_code), so it's not exactly the same and I can't say.
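For reference, a minimal sketch of the transformers side described in that docstring, assuming linear scaling: `rope_scaling` is a dict with a type and a factor, and `max_position_embeddings` stays at the base value. The model name and factor below are illustrative, not from this PR.

```python
from transformers import AutoConfig

# Illustrative only: request linear RoPE scaling through the config while
# leaving max_position_embeddings at the model's base value.
config = AutoConfig.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
)
print(config.max_position_embeddings)  # still the base 4096
```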
This brought up another question in my mind: what about dataset packing? Do we just pass a 4k-packed dataset to a "16k" model? If we don't set a 16k seq_len, the dataset would be packed to at most 4k. Do we need to add a condition to NOT change max_position_embeddings if rope_scaling is set?
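A hedged sketch of the validation being discussed: derive the effective context from the base max_position_embeddings and the rope factor, and check sequence_len against that instead of bumping max_position_embeddings. The function and field names here are hypothetical, not axolotl's actual config keys.

```python
def effective_context(base_max_pos: int, rope_scaling: dict | None) -> int:
    # With linear scaling, the usable context is roughly base length * factor.
    if not rope_scaling:
        return base_max_pos
    return int(base_max_pos * rope_scaling.get("factor", 1.0))


def validate_sequence_len(sequence_len: int, base_max_pos: int, rope_scaling: dict | None) -> None:
    # Hypothetical check: refuse sequence lengths beyond the scaled context.
    ctx = effective_context(base_max_pos, rope_scaling)
    if sequence_len > ctx:
        raise ValueError(
            f"sequence_len={sequence_len} exceeds effective context {ctx}; "
            "raise rope_scaling.factor instead of max_position_embeddings"
        )
```

Packing would then follow whatever sequence_len passes this check, matching the point above that a 16k target still needs a 16k seq_len, not just a scaled model.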
Force-pushed from 517a59a to 443f932
Tested that it runs:
* Feat: Add rope scaling
* fix: move rope config
Closes #342
Note: This only applies it to the llama arch. It should also support neox; should I add a specific if condition for this?

The docs state NOT to increase max_position_embeddings ourselves for this. If we want 8k, for example, we just change the rope factor.
We leave validation to transformers.

Test