
Vicuna Models checkpoints transfer script #1657

Merged 8 commits into keras-team:master on Jun 7, 2024

Conversation

sineeli
Collaborator

@sineeli sineeli commented May 28, 2024

Successfully converted the checkpoints from Vicuna (torch) to a Keras 3 compatible format; please let me know if any refactoring is needed.

[screenshot: conversion output]

Thanks

@sineeli
Collaborator Author

sineeli commented May 29, 2024

cc: @mattdangerw

@mattdangerw (Member) left a comment


Looks good! A couple comments...

Next up, you could try uploading these to your individual user on Kaggle, and making a PR that updates our presets file here -> https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/src/models/llama/llama_presets.py

That would give us all a way to test the vicuna models end to end, then we can copy them to the Keras org on Kaggle when they look good.

Thanks!

print("\n-> Saved the tokenizer")

# === Upload the preset ===
uri = f"kaggle://keras/vicuna/keras/{preset}"
Member


let's do this like the phi3 script

from keras_nlp import upload_preset # noqa: E402

That will still allow people who do not have access to the keras Kaggle org to run this script.
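A minimal sketch of what that phi3-style layout looks like (the flag names and structure here are assumptions for illustration, not the actual script's CLI): the keras_nlp import needed for uploading is deferred until an upload URI is actually passed, so the conversion itself runs for everyone.

```python
import argparse

# Illustrative sketch of the phi3-style layout; flag names and structure
# are assumptions, not the actual conversion script's CLI.
parser = argparse.ArgumentParser(
    description="Convert Vicuna checkpoints to Keras 3."
)
parser.add_argument(
    "--preset", default="vicuna_1.5_7b_en", help="Preset name to convert."
)
parser.add_argument(
    "--upload_uri",
    default=None,
    help='Optional, e.g. "kaggle://<username>/vicuna/keras/vicuna_1.5_7b_en".',
)


def main(args):
    preset_dir = f"./{args.preset}"  # written by the conversion steps (omitted)
    if args.upload_uri:
        # Deferred import: only users who actually upload need keras_nlp's
        # Kaggle integration, so anyone can still run the conversion itself.
        from keras_nlp import upload_preset  # noqa: E402

        upload_preset(args.upload_uri, preset_dir)
```

Run without --upload_uri, such a script would only convert; passing an upload URI pointing at the user's own Kaggle account would push the preset there for end-to-end testing.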

from keras_nlp.models import LlamaCausalLMPreprocessor
from keras_nlp.models import LlamaTokenizer

PRESET_MAP = {"vicuna_1.5_7b_en": "lmsys/vicuna-7b-v1.5"}
Member


Is the weight conversion all the same as Llama 2? If so, could we consider consolidating the conversion scripts?

Collaborator Author


Yes, the weights follow the same Llama 2 architecture, so we can merge this with the existing script. I will try that. Thanks!
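A sketch of what that consolidation might look like: extend the Llama conversion script's preset map with the vicuna entry. The vicuna mapping is from this PR; the llama2 entry is illustrative and would need to be checked against the existing script.

```python
# Hypothetical consolidated preset map: the vicuna entry is from this PR,
# the llama2 entry is illustrative and should match the existing script.
PRESET_MAP = {
    "llama2_7b_en": "meta-llama/Llama-2-7b-hf",
    "vicuna_1.5_7b_en": "lmsys/vicuna-7b-v1.5",
}

# The rest of the conversion logic could then be shared, since both
# checkpoints use the same Llama 2 architecture.
```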

@sineeli
Collaborator Author

sineeli commented Jun 5, 2024

@mattdangerw

When we run on CPU and call .numpy() on the weights, it raises an error: bfloat16 scalar not supported. The phi3 script has no such call: it sets the backend to torch at the global level and loads the weights directly.

The Llama 2 weights are all in torch as well, so we can use the same approach as the phi3 weight conversion script.

The phi3 approach seems more fault tolerant.

When tested on CPU converting bfloat16 to bfloat16:

[screenshot: bfloat16 conversion error]

But when converting float16 (the Hugging Face default) to float32 (the Keras model), we do not hit this mismatch.

Thanks

@mattdangerw
Member

When we run on CPU and call .numpy() on the weights, it raises an error: bfloat16 scalar not supported. The phi3 script has no such call: it sets the backend to torch at the global level and loads the weights directly.

@sineeli I think keras.ops.convert_to_numpy(x) would gracefully handle bfloat16, maybe try that?
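As a standalone illustration of why that helps (this to_numpy helper is hypothetical; in the script you would call keras.ops.convert_to_numpy directly): NumPy has no native bfloat16 dtype, so a torch-style tensor must be upcast to float32 before .numpy() is called.

```python
import numpy as np


def to_numpy(x):
    # Hypothetical helper mirroring what keras.ops.convert_to_numpy does
    # for torch tensors on CPU: NumPy has no bfloat16 dtype, so upcast
    # to float32 before calling .numpy().
    if "bfloat16" in str(getattr(x, "dtype", "")):
        x = x.float()  # torch: bfloat16 -> float32
    if hasattr(x, "numpy"):
        return x.numpy()  # torch tensors and similar
    return np.asarray(x)  # already a NumPy array or plain Python data
```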

With what precision are the original pytorch checkpoints stored on disk? If they are at float16, we could just do the same (and store at float16 on disk). The disk format does not mean we need to load at that format.

Anyway, as soon as you push with the comments addressed above we can merge this PR. And keep working on the actual checkpoints we ship.

@mattdangerw
Member

@sineeli can you make your kaggle model public? I'll pull in the script but leave the new preset off for now, we can do that on a follow up PR.

@mattdangerw merged commit 50e0414 into keras-team:master on Jun 7, 2024
6 checks passed
@sineeli
Collaborator Author

sineeli commented Jun 8, 2024

@sineeli can you make your kaggle model public? I'll pull in the script but leave the new preset off for now, we can do that on a follow up PR.

Sure, waiting for the page to update. Thanks!

https://www.kaggle.com/models/sineeli/vicuna/keras/vicuna_1.5_7b_en
