Might need to wait for #2306, as it likely requires a new llama.cpp backend built specifically with grpc enabled, since llama.cpp treats it internally as an offloading backend (like Metal, CUDA, etc.). I haven't tried yet whether grpc builds fall back to local builds.
Now that ggerganov/llama.cpp#6829 is in (great job, llama.cpp!), it should be possible to extend our grpc server to distribute the workload to workers.
From a quick look the upstream implementation seems quite lean, as we only need to pass the params through to llama.cpp directly.
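As a rough sketch of what that could look like on the C++ side, assuming #6829 exposes the worker list as a comma-separated `rpc_servers` string on `llama_model_params` (the exact field name and address format are assumptions, check the upstream `llama.h`):

```cpp
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    // Comma-separated list of RPC workers to offload to
    // (assumed format: "host:port,host:port").
    mparams.rpc_servers  = "192.168.1.10:50052,192.168.1.11:50052";
    mparams.n_gpu_layers = 99; // offload as many layers as the workers can take

    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        return 1;
    }

    // ... create a context and run inference as usual ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```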
The only open point is that we want to propagate this setting from the CLI/env rather than having a config section in the model file.
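For illustration, the backend's grpc server could pick the worker list up from the environment set by the CLI instead of from the model YAML; `LLAMACPP_GRPC_SERVERS` below is just a placeholder name, not a decided variable:

```cpp
#include <cstdlib>

#include "llama.h"

// Read the worker list from the environment (populated from the CLI/env of
// the main process) and fall back to a plain local run when it is unset.
static void apply_rpc_servers(llama_model_params & mparams) {
    const char * servers = std::getenv("LLAMACPP_GRPC_SERVERS"); // assumed variable name
    if (servers != NULL && servers[0] != '\0') {
        // e.g. "192.168.1.10:50052,192.168.1.11:50052"
        mparams.rpc_servers = servers;
    }
}
```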