-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Quick GELU #254
Implement Quick GELU #254
Conversation
you let some formatting happen. please only commit code changes. |
Oh sorry, missed that. I'll fix it. |
This is ready for review now. |
Hi @ggerganov Can ı have your attention here please? clip.cpp is almost ready, and I'm only crafting more examples and then I can announce it. I can also add a link in the examples section in readme if you thing it deserves this. p.s.: The next step will be using clip.cpp and llama.cpp to infer with LLaVA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to ggml_gelu_quick
, GELU_QUICK
, etc, in all instances
From what I read, this is just an approximation of GELU
that is supposed to be faster but inaccurate. Why not use the original ggml_gelu()
? It is already doing table lookup so there is no difference in performance + it is more accurate
Also, keep in mind that in the future you can use ggml_map_unary_f32()
to implement this kind of 1D mapping functions in your project without having to wait for ggml
to provide them
Unfortunately that theoretically small divergence between GELU and Quick GELU lead to large differences at the end, I suppose it accumulates through 12 layers. So I couldn't get good results until implementing Quick GELU. |
This causes long prompts to parse very slowly.
Closes #253
This is needed for CLIP-like models and I'm implementing clip.cpp here. It will also be a base for upcoming multimodal models that uses CLIP as an image embedder.