feat: gRPC-based backends #743
Merged
Conversation
This finally makes everything more consistent

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Previously the libs were added by other deps that made the linker add those as well (by chance).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Been playing with this here. Going to merge this and run a few rounds of tests, and fix things on master with follow-ups if necessary.
@mudler This is a great change, but I don't see a good example of how to use the new grpc-based models. Can you point me in the right direction to run falcon7b via grpc?
Description
This PR is a multi-fold PR:

- falcon backend: it now uses https://github.com/cmp-nct/ggllm.cpp

Coverage is now quite good - we just miss testing the backends 1:1. However, we already test openllama, rwkv and gpt4all.
Notes for Reviewers
Moving to gRPC increases code complexity but overall minimizes maintenance. The hacks needed to compile everything into a single fat binary are now gone, and if a backend crashes it doesn't take down the main process (which will attempt to recover the gRPC service automatically). Downsides are that the resulting binary is bigger and starting the internal services is a bit convoluted.
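To make the crash-isolation point concrete, here is a minimal Go sketch of the pattern (not the actual LocalAI wiring): the main process launches the backend as a child process, talks to it only over gRPC, and restarts it when it exits. The backend binary name, its `--addr` flag and the port are made up for illustration, and the generated proto client stub is omitted.

```go
package main

import (
	"context"
	"log"
	"os/exec"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// runBackend starts the backend binary and blocks until the process exits.
// A crash is contained here: only this goroutine observes it.
func runBackend(binary, addr string) error {
	cmd := exec.Command(binary, "--addr", addr) // hypothetical flag
	cmd.Stdout, cmd.Stderr = log.Writer(), log.Writer()
	if err := cmd.Start(); err != nil {
		return err
	}
	return cmd.Wait()
}

// supervise keeps the backend alive, restarting it whenever it dies.
func supervise(ctx context.Context, binary, addr string) {
	for ctx.Err() == nil {
		log.Printf("starting backend %s on %s", binary, addr)
		if err := runBackend(binary, addr); err != nil {
			log.Printf("backend exited: %v; recovering", err)
		}
		time.Sleep(time.Second) // crude backoff before the restart
	}
}

func main() {
	addr := "127.0.0.1:50051"
	go supervise(context.Background(), "./backend-falcon", addr) // hypothetical binary

	// The API process only ever talks to the backend over this connection.
	conn, err := grpc.Dial(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	_ = conn // the generated backend client stub would wrap conn here
	select {} // keep the main process running
}
```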
The gain is notable despite the cons, as we are now free to have different versions of the same backend with quite some ease. We can also now support multiple requests in parallel by allocating more services per model, but this can be done in a following batch.
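A rough sketch of that follow-up idea, again with illustrative names and addresses: allocate several instances of the same model's backend and round-robin requests across their gRPC connections. How the instances are started is left out; only the pooling is shown.

```go
package main

import (
	"log"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// pool spreads requests for one model over several backend connections.
type pool struct {
	conns []*grpc.ClientConn
	next  atomic.Uint64
}

func newPool(addrs []string) (*pool, error) {
	p := &pool{}
	for _, a := range addrs {
		c, err := grpc.Dial(a, grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			return nil, err
		}
		p.conns = append(p.conns, c)
	}
	return p, nil
}

// pick returns the next connection round-robin; callers wrap it in the backend stub.
func (p *pool) pick() *grpc.ClientConn {
	return p.conns[p.next.Add(1)%uint64(len(p.conns))]
}

func main() {
	// Two instances of the same model's backend; addresses are illustrative.
	p, err := newPool([]string{"127.0.0.1:50051", "127.0.0.1:50052"})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("next request goes to %s", p.pick().Target())
}
```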
Signed commits