Will keras_nlp support qwen2 model in future? #1679

Open
pass-lin opened this issue Jun 28, 2024 · 1 comment
Labels: help wanted, stat:contributions welcome, type:feature

Comments

@pass-lin

The Qwen2 model is the state of the art on the Hugging Face leaderboard. Compared with the Llama model, the only difference is one extra bias term in the query, key, and value dense layers of multi-head attention. Supporting this high-quality model should therefore require only a few modifications.
Similarly, Yi is a powerful Chinese LLM. Its performance is comparable to that of Qwen2, and it fully adopts the Llama architecture.
In theory, then, making keras_nlp compatible with these two models should not take much time. I hope they can be supported in the future.
https://huggingface.co/Qwen
https://huggingface.co/01-ai
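
To make the difference described above concrete, here is a minimal, framework-free sketch of the query/key/value projections with and without a bias. It is only an illustration under stated assumptions: `Dense`, `build_qkv`, and the toy weights are hypothetical names and values made up for this example, not keras_nlp or Hugging Face code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]  # shape (in_dim, out_dim)


@dataclass
class Dense:
    """Hypothetical helper: computes y = x @ kernel, plus bias when given."""

    kernel: Matrix
    bias: Optional[Vector] = None

    def __call__(self, x: Vector) -> Vector:
        out_dim = len(self.kernel[0])
        y = [
            sum(x[i] * self.kernel[i][j] for i in range(len(x)))
            for j in range(out_dim)
        ]
        if self.bias is not None:
            y = [value + b for value, b in zip(y, self.bias)]
        return y


def build_qkv(
    kernels: Dict[str, Matrix],
    biases: Optional[Dict[str, Vector]] = None,
) -> Dict[str, Dense]:
    """Build the query/key/value projections of one attention layer.

    biases=None gives bias-free (Llama-style) projections; passing biases
    gives the Qwen2-style variant described in this issue.
    """
    return {
        name: Dense(kernels[name], None if biases is None else biases[name])
        for name in ("query", "key", "value")
    }


# Toy weights: identity kernels make the effect of the bias easy to see.
identity = [[1.0, 0.0], [0.0, 1.0]]
kernels = {"query": identity, "key": identity, "value": identity}
biases = {"query": [0.5, 0.25], "key": [0.0, 0.0], "value": [1.0, -1.0]}

llama_style = build_qkv(kernels)
qwen2_style = build_qkv(kernels, biases)

x = [1.0, 2.0]
llama_q = llama_style["query"](x)
qwen2_q = qwen2_style["query"](x)
print(llama_q)  # [1.0, 2.0]
print(qwen2_q)  # [1.5, 2.25]
```

If this description of the architecture holds, a port would mostly need an option to enable the bias on these three projections and a way to load the extra bias weights from the checkpoint. The rest of the attention layer would stay as it is for Llama.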

@mattdangerw added the help wanted and stat:contributions welcome labels on Jul 18, 2024
@mattdangerw
Member

This is open for contributions if anyone would like to take this up!
