Allow custom mappings for different models #513
Merged
Model URI Adapter
This PR adds support for using different models based on the path and the model in the request. With this, you can quickly spin up a new endpoint and redirect all the traffic for a particular path (chat completions, embeddings, ...) to a specific endpoint instead of OpenAI.
Example 1
Let's say we want to use a custom model for embeddings.
1. Prepare the embedding endpoint
One of the easiest ways to do this is to use huggingface/text-embeddings-inference (TEI).
One of the tools that TEI provides is a multi-purpose Docker image to spin up a server. Since TEI supports BERT models, we can use `BAAI/llm-embedder`. Starting this model locally is as easy as running the following command:
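The exact command is not captured above; here is a minimal sketch, assuming TEI's standard Docker invocation (the image tag, model, revision, and volume are the ones described in the notes that follow):

```shell
model=BAAI/llm-embedder
revision=main
volume=/tmp/gf-data  # shared with the container so weights are not downloaded on every run

docker run -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-0.3.0 \
  --model-id $model --revision $revision
```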
In this command:

- `/tmp/gf-data` is a volume shared with the Docker container to avoid downloading the weights on every run.
- `text-embeddings-inference:cpu-0.3.0` is the Docker image for running locally on CPU; other images are available.
- `BAAI/llm-embedder` is the model.
- `main` is the repository revision; it can also be a tag or a commit.

The Docker image will bring up a server with a series of endpoints. One of them is fully compatible with the OpenAI API.
2. Install the Ktor client plugin
The second step is to install the Ktor client plugin that redirects requests to the local embedding endpoint. We want to redirect all requests to `/v1/embeddings` with the model `BAAI/llm-embedder` to the recently created endpoint. To install and configure the Ktor client plugin, we need to add the following code in the client's initialization:
xef/server/src/main/kotlin/com/xebia/functional/xef/server/Server.kt (lines 56 to 63 in 21f0f4a)
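The referenced snippet is not inlined here. As a rough illustration of the idea (not the actual xef API: the plugin name, configuration DSL, and port are hypothetical), a Ktor client plugin that rewrites the target URL for a matching path could look like the sketch below. The real plugin also matches on the `model` field of the request, which this sketch omits for brevity:

```kotlin
import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.plugins.api.*
import io.ktor.http.*

// Hypothetical configuration: maps an OpenAI path to a replacement endpoint.
class ModelUriAdapterConfig {
    val pathMap: MutableMap<String, Url> = mutableMapOf()
    fun redirect(path: String, to: String) { pathMap[path] = Url(to) }
}

// Hypothetical plugin: when the request path matches an entry in the map,
// rewrite the URL to point at the replacement endpoint, leaving the body
// untouched, so it behaves like a reverse proxy.
val ModelUriAdapter = createClientPlugin("ModelUriAdapter", ::ModelUriAdapterConfig) {
    val pathMap = pluginConfig.pathMap
    onRequest { request, _ ->
        pathMap[request.url.encodedPath]?.let { target ->
            request.url.protocol = target.protocol
            request.url.host = target.host
            request.url.port = target.port
            request.url.encodedPath = target.encodedPath
        }
    }
}

// Client initialization: embedding requests go to the local TEI server.
val client = HttpClient(CIO) {
    install(ModelUriAdapter) {
        redirect("/v1/embeddings", "http://localhost:8080/embeddings")
    }
}
```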
Example 2
Redirecting chat completion requests is quite similar. Let's imagine we have a local proxy serving `gpt-3.5-turbo` at a local address. The necessary code for redirecting the requests will be:
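Reusing the hypothetical `ModelUriAdapter` plugin from the sketch above, and a placeholder address for the local proxy (not taken from the PR), the configuration might look like:

```kotlin
// Placeholder address for the local gpt-3.5-turbo proxy; adjust to your setup.
val client = HttpClient(CIO) {
    install(ModelUriAdapter) {
        redirect("/v1/chat/completions", "http://localhost:8081/v1/chat/completions")
    }
}
```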
Notes
Future improvements could use headers as input for the URI/model map. Also, the map is currently set in code, but we could use a DB to fill in that information.
This works as a reverse proxy without modifying the request/response. The next step will be to have a new component that could adapt requests/responses to APIs that are not OpenAI compatible (like MLflow). The full interaction is represented in the following diagram.