Add vllm support for embedding endpoint #3435
It's /embeddings, not /v1/embeddings
@Nyralei - I've tried both, and both seem to have the same behavior. Sending requests to both /embeddings and /v1/embeddings, I get the following response:
For reference, here is the model template:
One thing to note - both /embeddings and /v1/embeddings work exactly as expected when I change only the backend parameter from vllm to transformers. It also loads the model into memory even in LocalAI's current state (i.e. with the vllm backend), but it then fails to return a response.
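For illustration only (this is not the reporter's actual template), a minimal LocalAI model definition for this kind of setup might look roughly like the sketch below; the model name is a placeholder, and the backend line is the single parameter being switched between vllm and transformers:

```yaml
# Hypothetical LocalAI model config (placeholder values, not the template from this issue).
# Per the comment above, swapping `backend: vllm` for `backend: transformers`
# is the only change that makes /embeddings and /v1/embeddings respond.
name: e5-mistral-embeddings                # placeholder name the model is served under
backend: vllm                              # the backend under discussion
embeddings: true                           # expose the model on the embeddings endpoint
parameters:
  model: intfloat/e5-mistral-7b-instruct   # Hugging Face model ID from the feature request
```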
That should be quite straightforward to add - I can confirm that this is currently not supported, as it is not implemented.
Is your feature request related to a problem? Please describe.
vLLM has added support for running embedding models like intfloat/e5-mistral-7b-instruct, which works with their native OpenAI server. When I send a request to /v1/embeddings with LocalAI started, I get the following error:

Describe the solution you'd like
I'd like to be able to run embedding models backed by vLLM through LocalAI as well. Sending the same request to the same endpoint with the vLLM docker container running already works, but I would like to be able to manage this through LocalAI.
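As a rough sketch of the kind of request involved (the base URL, port, and model name below are placeholders, not values taken from this issue), the OpenAI Python client can be pointed at LocalAI's OpenAI-compatible API the same way it is pointed at vLLM's server:

```python
# Sketch of an embeddings request against an OpenAI-compatible server
# (LocalAI here; the same call works against vLLM's own server).
# Base URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # LocalAI's default port is assumed here
    api_key="not-needed-for-local-use",
)

response = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input="The quick brown fox jumps over the lazy dog",
)

print(len(response.data[0].embedding))     # dimensionality of the returned vector
```

Against a standalone vLLM container only the base_url would differ; the request body and response shape stay the same.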
Describe alternatives you've considered
While in theory I can run a vLLM instance with this model on a different port, the main purpose of LocalAI, to me, is being able to manage the different models and start and stop backend instances based on what is requested. Since there is already support for this in vLLM, my hope is that it isn't too much of a lift to enable it via LocalAI as well.