performance expectations #4
Comments
This is expected, since WebAssembly SIMD only supports the equivalent of AVX instructions, not AVX2. This should be the biggest impact on performance at the moment. Another issue is that we're using emscripten's non-native exception handler, which maintains support for older browsers but comes with a small performance cost. We may move to the native exception handler in the future. Edit: it seems most mainstream browser versions already support native wasm exceptions (see here), so it's safe to enable it. Support will be added in the next build of wllama.
v1.6.0 is now using native exception handler via |
Hey @chadkirby, out of curiosity, have you tried the latest version with the native exception handler?
I did. IIRC, I saw a modest performance improvement, but wasm speed was still roughly 3x slower than native. |
One important consideration is that certain browsers, such as Brave, may alter the value of As a result, it is possible that the browser was utilizing only 2 threads, leading to slow inference. Using 8 threads has resulted in satisfactory performance for the Phi-3 model: minisearch-phi-3-wllama.mp4 |
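To make the thread-count point concrete, here is a minimal sketch of choosing a thread count defensively. The name of the altered value is missing from the comment above; this sketch assumes it is the logical core count the browser reports (navigator.hardwareConcurrency in standard browsers). The helper pickThreadCount is hypothetical and not part of wllama, and how the result is passed to the model loader is left out because the thread does not show that option.

```typescript
// Hypothetical helper, not part of wllama: pick a thread count without
// trusting the browser-reported core count blindly.
const MIN_THREADS = 1;

function isUsableCount(value: number | undefined): value is number {
  return value !== undefined && Number.isInteger(value) && value >= MIN_THREADS;
}

function pickThreadCount(reportedCores: number | undefined, userOverride?: number): number {
  // An explicit override wins, since the reported value may have been altered
  // by the browser (assumption: this is what the comment above refers to).
  if (isUsableCount(userOverride)) {
    return userOverride;
  }
  // Otherwise use the reported value when it is a positive integer.
  if (isUsableCount(reportedCores)) {
    return reportedCores;
  }
  // Last resort: single-threaded.
  return MIN_THREADS;
}

// In a browser this could be called as:
//   pickThreadCount(navigator.hardwareConcurrency, 8)
```

An override is given priority here because a browser that under-reports cores would otherwise silently cap inference at the reported number.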
First, thanks for putting this project together!
I modified examples/basic/index.html to use a more capable model: https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF/resolve/main/gemma-2b-it-q4_k_m.gguf, which is 1.5 GB.

Using LM Studio on my laptop (with GPU acceleration disabled), I get roughly 25 tokens per second from gemma-2b-it-q4_k_m.gguf.

Running examples/basic/index.html in Chrome 124 on my laptop, I get roughly 6-7 tokens per second from gemma-2b-it-q4_k_m.gguf. (Similar performance in Edge 123.)

Generally, the wasm bindings seem roughly 3-4x slower than native. Is that more or less expected? Are there any wllama knobs I can twiddle to improve performance?
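For anyone comparing numbers in the same way, a small sketch of the arithmetic behind the "3-4x" estimate follows. The function names tokensPerSecond and slowdown are made up for illustration; only the 25 and 6-7 tokens per second figures come from the post above.

```typescript
// Tokens per second from a generated-token count and elapsed wall-clock time.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return tokenCount / (elapsedMs / 1000);
}

// How many times slower the wasm run is compared with the native run.
function slowdown(nativeTps: number, wasmTps: number): number {
  if (wasmTps <= 0) {
    throw new RangeError("wasmTps must be positive");
  }
  return nativeTps / wasmTps;
}

const nativeTps = 25; // LM Studio, GPU acceleration disabled (from the post)
const wasmTpsLow = 6; // Chrome 124, lower bound (from the post)
const wasmTpsHigh = 7; // Chrome 124, upper bound (from the post)

console.log(slowdown(nativeTps, wasmTpsHigh).toFixed(1)); // prints "3.6"
console.log(slowdown(nativeTps, wasmTpsLow).toFixed(1)); // prints "4.2"
```

Measuring both runs with the same prompt and the same number of generated tokens keeps the ratio comparable across setups.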