
Allow custom mappings for different models #513

Merged

fedefernandez merged 1 commit into main from feature/request-interceptor on Oct 31, 2023

Conversation

@fedefernandez (Contributor) commented on Oct 30, 2023

Model URI Adapter

This PR adds support for routing requests to different endpoints based on the path and the model in the request. With this, you can quickly spin up a new endpoint and redirect all the traffic for a particular path (chat completions, embeddings, ...) and model to that endpoint instead of OpenAI.
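To illustrate the mechanism, here is a rough sketch of how such an adapter can be written as a Ktor client plugin that rewrites the request URL when the request path and the "model" field of the JSON body match a configured entry. This is not the actual xef implementation: the plugin name, the config class, and the assumption that the body is a raw JSON string are all hypothetical.

import io.ktor.client.plugins.api.createClientPlugin
import io.ktor.http.Url
import io.ktor.http.takeFrom
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.jsonObject
import kotlinx.serialization.json.jsonPrimitive

// Configuration holding (path, model) -> replacement endpoint entries.
class UriAdapterConfig {
  val mappings = mutableMapOf<Pair<String, String>, Url>()
  fun addToPath(path: String, mapping: Pair<String, String>) {
    mappings[path to mapping.first] = Url(mapping.second)
  }
}

val UriAdapterSketch = createClientPlugin("UriAdapterSketch", ::UriAdapterConfig) {
  onRequest { request, content ->
    // Assumes the body was set as a raw JSON string; a real plugin would
    // hook into content negotiation instead of casting here.
    val model = (content as? String)
      ?.let { Json.parseToJsonElement(it).jsonObject["model"]?.jsonPrimitive?.content }
      ?: return@onRequest
    val target = pluginConfig.mappings[request.url.encodedPath to model]
      ?: return@onRequest // no mapping: leave the request untouched
    request.url.takeFrom(target) // rewrite scheme, host, port, and path
  }
}

A client would then install it with, e.g., install(UriAdapterSketch) { addToPath("/v1/embeddings", "some-model" to "http://localhost:9090/openai") }.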

Example 1

Let's say we want to use a custom model for the embeddings.

1. Prepare the embedding endpoint

One of the easiest ways to do it is to use huggingface/text-embeddings-inference.

Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.

One of the tools that TEI provides is a multi-purpose Docker image to spin up a server. Since TEI supports BERT models, we can use BAAI/llm-embedder. Starting this model locally is as easy as running the following command:

docker run -p 9090:80 -v /tmp/gf-data:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-0.3.0 --model-id BAAI/llm-embedder --revision main
  • /tmp/gf-data is a volume shared with the Docker container to avoid downloading the weights on every run
  • text-embeddings-inference:cpu-0.3.0 is the Docker image for running locally on CPU; other images are available
  • BAAI/llm-embedder is the model
  • main is the repo revision; it can also be a tag or a commit hash

The Docker image will spin up a server with a series of endpoints, one of which is compatible with the OpenAI API.
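As a quick sanity check, a call like the following should return embeddings from the local server. This is a sketch: it assumes the OpenAI-compatible route lives under /openai (the base URL the plugin is configured with below) and accepts an OpenAI-style embeddings payload, which may vary between TEI versions.

import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.request.post
import io.ktor.client.request.setBody
import io.ktor.client.statement.bodyAsText
import io.ktor.http.ContentType
import io.ktor.http.contentType

suspend fun main() {
  val client = HttpClient(CIO)
  // OpenAI-style embeddings payload sent to the local TEI server.
  val response = client.post("http://localhost:9090/openai") {
    contentType(ContentType.Application.Json)
    setBody("""{"input": ["hello world"], "model": "BAAI/llm-embedder"}""")
  }
  println(response.bodyAsText())
  client.close()
}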

2. Install the Ktor client plugin

The second step is to install the Ktor client plugin that redirects requests to the local embedding endpoint. We want to redirect all the requests to "/v1/embeddings" that use the model BAAI/llm-embedder to the recently created endpoint.

To install and configure the plugin, add the following code to the client's initialization (the ModelUriAdapter install goes inside the HttpClient configuration block, alongside the other plugins):

HttpClient(CIO) {
  engine {
    requestTimeout = 0 // disabled
  }
  install(Auth)
  install(Logging) { level = LogLevel.INFO }
  install(ClientContentNegotiation)
  install(com.xebia.functional.xef.server.http.client.ModelUriAdapter) {
    addToPath(
      com.xebia.functional.xef.server.http.client.OpenAIPathType.EMBEDDINGS,
      "BAAI/llm-embedder" to "http://localhost:9090/openai"
    )
  }
}
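Once the adapter is installed, requests keep targeting the standard OpenAI path, and the plugin rewrites the destination only when both the path type and the model match. A hypothetical call through the client configured above:

// Fragment (run inside a coroutine; `client` is the HttpClient configured
// above; imports: io.ktor.client.request.*, io.ktor.client.statement.*,
// io.ktor.http.*). The request is addressed to OpenAI, but since the path
// is /v1/embeddings and the model is BAAI/llm-embedder, ModelUriAdapter
// reroutes it to http://localhost:9090/openai.
val response = client.post("https://api.openai.com/v1/embeddings") {
  contentType(ContentType.Application.Json)
  setBody("""{"input": ["hello world"], "model": "BAAI/llm-embedder"}""")
}
println(response.bodyAsText())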

Example 2

Redirecting chat completion requests is quite similar. Let's imagine we have a local proxy serving a gpt-3.5-turbo model at http://localhost:8080 (the address used in the snippet below).

The code needed to redirect the requests is:

install(com.xebia.functional.xef.server.http.client.ModelUriAdapter) {
  addToPath(
    com.xebia.functional.xef.server.http.client.OpenAIPathType.CHAT,
    "gpt-3.5-turbo" to "http://localhost:8080/chat/completions"
  )
}
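As in Example 1, the install block goes inside the HttpClient configuration. Because mappings are keyed by both the path type and the model name, requests on the same path that use any other model should still be forwarded to OpenAI unchanged.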

Notes

Future improvements could use headers as input for the URI/model map. Also, the map is currently set in code, but we could use a database to store that information.

This works as a reverse proxy without modifying the request/response. The next step will be a new component that can adapt requests/responses for APIs that are not OpenAI-compatible (like MLflow). The full interaction is represented in the following diagram.

sequenceDiagram
    Client->>ModelUriAdapter: Path + Model
    ModelUriAdapter->>ModelReqResAdapter: Request
    ModelReqResAdapter->>Model: Request
    Model->>ModelReqResAdapter: Response
    ModelReqResAdapter->>Client: Response

@fedefernandez force-pushed the feature/request-interceptor branch from d6a9f89 to 39cd65b on October 31, 2023 10:37
@fedefernandez marked this pull request as ready for review on October 31, 2023 11:05
@raulraja (Contributor) left a comment:
🙌 great work @fedefernandez !

@fedefernandez merged commit 009b8d9 into main on Oct 31, 2023
5 checks passed
@fedefernandez deleted the feature/request-interceptor branch on October 31, 2023 16:53