Allow custom mappings for different models #513
Merged
Model URI Adapter
This PR adds support for using different models based on the path and the model in the request. With this, you can quickly spin up a new endpoint and redirect all the traffic for a particular path (chat completions, embeddings, ...) to a specific endpoint instead of OpenAI.
Example 1
Let's say we want to use a custom model for embeddings.
1. Prepare the embedding endpoint
One of the easiest ways to do this is to use huggingface/text-embeddings-inference (TEI).
One of the tools that TEI provides is a multi-purpose Docker image to spin up a server. Since TEI supports BERT models, we can use `BAAI/llm-embedder`. Starting this model locally is as easy as running the following command:
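The exact command is not captured above; here is a minimal sketch, assuming TEI's standard Docker invocation (the image tag, model, revision, and volume are the ones described in the notes that follow):

```shell
model=BAAI/llm-embedder
revision=main
volume=/tmp/gf-data  # shared with the container so weights are not downloaded on every run

docker run -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-0.3.0 \
  --model-id $model --revision $revision
```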
In this command:

- `/tmp/gf-data` is a volume shared with the Docker container to avoid downloading the weights on every run.
- `text-embeddings-inference:cpu-0.3.0` is the Docker image for running locally on CPU; other images are available.
- `BAAI/llm-embedder` is the model.
- `main` is the repository revision; it can also be a tag or a commit.

The Docker image will bring up a server with a series of endpoints. One of them is fully compatible with the OpenAI API.
2. Install the Ktor client plugin
The second step is to install the Ktor client plugin that redirects requests to the local embedding endpoint. We want to redirect all requests to `/v1/embeddings` with the model `BAAI/llm-embedder` to the recently created endpoint. To install and configure the Ktor client plugin, we need to add the following code in the client's initialization:
xef/server/src/main/kotlin/com/xebia/functional/xef/server/Server.kt (lines 56 to 63 in 21f0f4a)
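The referenced snippet is not inlined here. As a rough illustration of the idea (not the actual xef API: the plugin name, configuration DSL, and port are hypothetical), a Ktor client plugin that rewrites the target URL for a matching path could look like the sketch below. The real plugin also matches on the `model` field of the request, which this sketch omits for brevity:

```kotlin
import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.plugins.api.*
import io.ktor.http.*

// Hypothetical configuration: maps an OpenAI path to a replacement endpoint.
class ModelUriAdapterConfig {
    val pathMap: MutableMap<String, Url> = mutableMapOf()
    fun redirect(path: String, to: String) { pathMap[path] = Url(to) }
}

// Hypothetical plugin: when the request path matches an entry in the map,
// rewrite the URL to point at the replacement endpoint, leaving the body
// untouched, so it behaves like a reverse proxy.
val ModelUriAdapter = createClientPlugin("ModelUriAdapter", ::ModelUriAdapterConfig) {
    val pathMap = pluginConfig.pathMap
    onRequest { request, _ ->
        pathMap[request.url.encodedPath]?.let { target ->
            request.url.protocol = target.protocol
            request.url.host = target.host
            request.url.port = target.port
            request.url.encodedPath = target.encodedPath
        }
    }
}

// Client initialization: embedding requests go to the local TEI server.
val client = HttpClient(CIO) {
    install(ModelUriAdapter) {
        redirect("/v1/embeddings", "http://localhost:8080/embeddings")
    }
}
```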
Example 2
Redirecting chat completion requests is quite similar. Let's imagine we have a local proxy serving `gpt-3.5-turbo` at a local address. The necessary code for redirecting the requests will be:
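Reusing the hypothetical `ModelUriAdapter` plugin from the sketch above, and a placeholder address for the local proxy (not taken from the PR), the configuration might look like:

```kotlin
// Placeholder address for the local gpt-3.5-turbo proxy; adjust to your setup.
val client = HttpClient(CIO) {
    install(ModelUriAdapter) {
        redirect("/v1/chat/completions", "http://localhost:8081/v1/chat/completions")
    }
}
```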
Notes
Future improvements could use headers as input for the URI/model map. Also, the map is currently set in code, but we could use a DB to fill in that information.
This works as a reverse proxy without modifying the request/response. The next step will be to have a new component that could adapt requests/responses to APIs that are not OpenAI compatible (like MLflow). The full interaction is represented in the following diagram.