Added support for Jais models #3183
Conversation
Not ready to be merged yet. Bugs found; fixes to be done.
Nice PR! Left a few cosmetic changes plus some ideas for how to slightly improve the performance of the models via fusion in the MLP / a fused activation function.
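For what it's worth, here is a minimal sketch of the kind of fusion meant here, in plain PyTorch rather than vLLM's actual layers (module and parameter names are illustrative): the gate and up projections of a SwiGLU MLP are merged into a single GEMM, and the activation multiplies the two halves in one pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedSwiGLUMLP(nn.Module):
    """Sketch: fuse the gate and up projections into one matmul, then apply
    a SwiGLU-style activation (silu(gate) * up) before the down projection."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # One projection producing both halves instead of two separate GEMMs.
        self.gate_up_proj = nn.Linear(hidden_size, 2 * intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, up = self.gate_up_proj(x).chunk(2, dim=-1)
        return self.down_proj(F.silu(gate) * up)
```

In vLLM terms this roughly corresponds to a merged column-parallel projection plus a fused activation kernel, which saves one GEMM launch per MLP.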
@robertgshaw2-neuralmagic could you please check the status of the PR?
@grandiose-pizza Sorry for the delay over the past few days. Please merge the latest update; you can now just read the model's scale parameter from the config and pass it in directly.
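For illustration, a hedged sketch of what that could look like in the model's `__init__`. The import path and `scale` keyword follow my reading of #3233, and the muP attribute names on the Jais config are assumptions; verify against the actual sources.

```python
import torch.nn as nn
# Path and constructor signature assumed from #3233.
from vllm.model_executor.layers.logits_processor import LogitsProcessor

class JaisLMHeadModel(nn.Module):  # heavily simplified sketch
    def __init__(self, config):
        super().__init__()
        # Jais scales its output logits via muP multipliers stored in the
        # HF config; these attribute names are assumptions.
        logits_scale = config.mup_output_alpha * config.mup_width_scale
        self.logits_processor = LogitsProcessor(config.vocab_size,
                                                scale=logits_scale)
```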
Hi @esmeetu, I have updated the code to adapt to #3233. Tested with multi- and single-GPU settings; works as expected. However, there was a tiny miss in your PR, and I have included that fix in this push as well. I am running into an error for GPT-2: I think you may have missed updating the sampler call properly just for gpt2.py; the rest is okay. I have included this fix in this PR. Hope this is okay. Here is the older bug:

vllm/vllm/model_executor/models/gpt2.py, line 245 (commit 6ebd02b)
But the sampler's forward function itself accepts only two values, and I am getting an error because of this mismatch.
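For reference, a rough sketch of the mismatch; the method body and names are illustrative, not a verbatim quote of gpt2.py:

```python
# Inside GPT2LMHeadModel.sample (simplified; names assumed).
def sample(self, logits, sampling_metadata):
    # Stale pre-#3233 call site, which still passes three values:
    #   next_tokens = self.sampler(self.lm_head_weight, logits, sampling_metadata)
    # Sampler.forward now accepts only (logits, sampling_metadata), so:
    next_tokens = self.sampler(logits, sampling_metadata)
    return next_tokens
```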
Thanks for catching that! LGTM.
* upstream/main:
  [Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (vllm-project#3551)
  [Misc][Log] Add log for tokenizer length not equal to vocabulary size (vllm-project#3500)
  [🚀 Ready to be merged] Added support for Jais models (vllm-project#3183)
  Fix 1D query issue from `_prune_hidden_states` (vllm-project#3539)
  [PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (vllm-project#3431)
  [BugFix] Hot fix in setup.py for neuron build (vllm-project#3537)
  Migrate `logits` computation and gather to `model_runner` (vllm-project#3233)
  [1/n][Chunked Prefill] Refactor input query shapes (vllm-project#3236)
  [1/n] Triton sampling kernel (vllm-project#3186)
  [Bugfix] Fix ROCm support in CMakeLists.txt (vllm-project#3534)
Jais models are pretrained on curated Arabic and English text and fine-tuned on prompt-response pair datasets, respectively. They were trained from scratch by Core42, in partnership with MBZUAI and Cerebras Systems, on their Condor Galaxy supercomputer. The architecture is a transformer-based decoder-only (GPT-3-style) design that uses the SwiGLU non-linearity and ALiBi position embeddings, which enable the model to extrapolate to long sequence lengths, providing improved context handling and model precision.
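To make the ALiBi point concrete, here is a minimal standalone PyTorch sketch of the linear attention bias (independent of vLLM's actual kernels, and restricted to power-of-two head counts for simplicity):

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Per-head slopes from the ALiBi paper (power-of-two head counts)."""
    start = 2 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Bias added to raw attention scores before softmax.

    Each head linearly penalizes attention to distant keys instead of using
    positional embeddings, which lets the model extrapolate to sequence
    lengths longer than those seen in training.
    """
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]          # j - i; <= 0 on the causal side
    slopes = alibi_slopes(num_heads)
    return slopes[:, None, None] * rel[None]   # (num_heads, seq_len, seq_len)
```

The bias is simply added to `q @ k.T / sqrt(head_dim)` together with the causal mask; the positive upper-triangular (non-causal) entries are masked out anyway.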
The paper can be read here: https://arxiv.org/pdf/2308.16149.pdf
Metrics can be found here: https://huggingface.co/core42/jais-30b-chat-v3
These are SoTA Arabic-English bilingual models. The playground can be accessed at https://arabic-gpt.ai/