Update all examples yaml files of GMC in GenAIExample (#436)
* Update all examples yaml files of GMC in GenAIExample.

Signed-off-by: zhlsunshine <huailong.zhang@intel.com>
1 parent 8ad7f36, commit 290a74f
Showing 16 changed files with 313 additions and 9 deletions.
File renamed without changes.
@@ -0,0 +1,124 @@

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: gaudi
  name: switch
  namespace: switch
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Embedding
        nodeName: node1
      - name: Reranking
        data: $response
        internalService:
          serviceName: reranking-svc
          config:
            endpoint: /v1/reranking
            TEI_RERANKING_ENDPOINT: tei-reranking-svc
      - name: TeiReranking
        internalService:
          serviceName: tei-reranking-svc
          config:
            endpoint: /rerank
          isDownstreamService: true
      - name: Llm
        data: $response
        nodeName: node2
    node1:
      routerType: Switch
      steps:
      - name: Embedding
        condition: embedding-model-id==large
        internalService:
          serviceName: embedding-svc-large
          config:
            endpoint: /v1/embeddings
            TEI_EMBEDDING_ENDPOINT: tei-embedding-gaudi-svc-bge15
      - name: Embedding
        condition: embedding-model-id==small
        internalService:
          serviceName: embedding-svc-small
          config:
            endpoint: /v1/embeddings
            TEI_EMBEDDING_ENDPOINT: tei-embedding-gaudi-svc-bge-small
      - name: TeiEmbeddingGaudi
        internalService:
          serviceName: tei-embedding-gaudi-svc-bge15
          config:
            MODEL_ID: BAAI/bge-base-en-v1.5
          isDownstreamService: true
      - name: TeiEmbeddingGaudi
        internalService:
          serviceName: tei-embedding-gaudi-svc-bge-small
          config:
            MODEL_ID: BAAI/bge-base-en-v1.5
          isDownstreamService: true
      - name: Retriever
        condition: embedding-model-id==large
        data: $response
        internalService:
          serviceName: retriever-svc-large
          config:
            endpoint: /v1/retrieval
            REDIS_URL: redis-vector-db-large
            TEI_EMBEDDING_ENDPOINT: tei-embedding-gaudi-svc-bge15
      - name: Retriever
        condition: embedding-model-id==small
        data: $response
        internalService:
          serviceName: retriever-svc-small
          config:
            endpoint: /v1/retrieval
            REDIS_URL: redis-vector-db-small
            TEI_EMBEDDING_ENDPOINT: tei-embedding-gaudi-svc-bge-small
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db-large
          isDownstreamService: true
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db-small
          isDownstreamService: true
    node2:
      routerType: Switch
      steps:
      - name: Llm
        condition: model-id==intel
        internalService:
          serviceName: llm-svc-intel
          config:
            endpoint: /v1/chat/completions
            TGI_LLM_ENDPOINT: tgi-gaudi-service-intel
      - name: Llm
        condition: model-id==llama
        internalService:
          serviceName: llm-svc-llama
          config:
            endpoint: /v1/chat/completions
            TGI_LLM_ENDPOINT: tgi-gaudi-service-llama
      - name: TgiGaudi
        internalService:
          serviceName: tgi-gaudi-service-intel
          config:
            endpoint: /generate
            MODEL_ID: Intel/neural-chat-7b-v3-3
          isDownstreamService: true
      - name: TgiGaudi
        internalService:
          serviceName: tgi-gaudi-service-llama
          config:
            endpoint: /generate
            MODEL_ID: openlm-research/open_llama_3b
          isDownstreamService: true
```
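The `Switch` nodes in this manifest choose a step by matching `condition` expressions such as `embedding-model-id==small` and `model-id==intel`. The sketch below builds a request body that would select the `small` embedding branch and the `intel` LLM branch. It assumes the router reads those keys from top-level fields of the JSON body, which the manifest itself does not confirm; the `text` field, its value, and the `build_payload` helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a JSON body whose top-level keys mirror the
# `condition` expressions in the manifest (assumed lookup behaviour).
build_payload() {
  embedding_model_id=$1   # "large" or "small", per the node1 conditions
  model_id=$2             # "intel" or "llama", per the node2 conditions
  printf '{"text": "What is deep learning?", "embedding-model-id": "%s", "model-id": "%s"}' \
    "$embedding_model_id" "$model_id"
}

# Select the small embedding model and the Intel LLM.
payload=$(build_payload small intel)
echo "$payload"
```

The resulting string could then be passed to `curl -d "$payload"` against the connector's access URL, in the same way the CodeGen README in this commit posts its query.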
@@ -0,0 +1,124 @@

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: xeon
  name: switch
  namespace: switch
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Embedding
        nodeName: node1
      - name: Reranking
        data: $response
        internalService:
          serviceName: reranking-svc
          config:
            endpoint: /v1/reranking
            TEI_RERANKING_ENDPOINT: tei-reranking-svc
      - name: TeiReranking
        internalService:
          serviceName: tei-reranking-svc
          config:
            endpoint: /rerank
          isDownstreamService: true
      - name: Llm
        data: $response
        nodeName: node2
    node1:
      routerType: Switch
      steps:
      - name: Embedding
        condition: embedding-model-id==large
        internalService:
          serviceName: embedding-svc-large
          config:
            endpoint: /v1/embeddings
            TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge15
      - name: Embedding
        condition: embedding-model-id==small
        internalService:
          serviceName: embedding-svc-small
          config:
            endpoint: /v1/embeddings
            TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge-small
      - name: TeiEmbedding
        internalService:
          serviceName: tei-embedding-svc-bge15
          config:
            MODEL_ID: BAAI/bge-base-en-v1.5
          isDownstreamService: true
      - name: TeiEmbedding
        internalService:
          serviceName: tei-embedding-svc-bge-small
          config:
            MODEL_ID: BAAI/bge-base-en-v1.5
          isDownstreamService: true
      - name: Retriever
        condition: embedding-model-id==large
        data: $response
        internalService:
          serviceName: retriever-svc-large
          config:
            endpoint: /v1/retrieval
            REDIS_URL: redis-vector-db-large
            TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge15
      - name: Retriever
        condition: embedding-model-id==small
        data: $response
        internalService:
          serviceName: retriever-svc-small
          config:
            endpoint: /v1/retrieval
            REDIS_URL: redis-vector-db-small
            TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge-small
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db-large
          isDownstreamService: true
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db-small
          isDownstreamService: true
    node2:
      routerType: Switch
      steps:
      - name: Llm
        condition: model-id==intel
        internalService:
          serviceName: llm-svc-intel
          config:
            endpoint: /v1/chat/completions
            TGI_LLM_ENDPOINT: tgi-service-intel
      - name: Llm
        condition: model-id==llama
        internalService:
          serviceName: llm-svc-llama
          config:
            endpoint: /v1/chat/completions
            TGI_LLM_ENDPOINT: tgi-service-llama
      - name: Tgi
        internalService:
          serviceName: tgi-service-intel
          config:
            endpoint: /generate
            MODEL_ID: Intel/neural-chat-7b-v3-3
          isDownstreamService: true
      - name: Tgi
        internalService:
          serviceName: tgi-service-llama
          config:
            endpoint: /generate
            MODEL_ID: bigscience/bloom-560m
          isDownstreamService: true
```
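In this manifest, each `TEI_*_ENDPOINT`, `TGI_LLM_ENDPOINT`, and `REDIS_URL` value appears to name a `serviceName` declared by another step in the same file. A small lint along the following lines could catch a mistyped reference before the manifest is applied. This is a sketch under that assumption, not part of GMC; `check_refs` is a hypothetical helper and the inline fragment is a shortened copy of the `node2` steps.

```shell
#!/bin/sh
# Print every endpoint-style config value that no `serviceName:` line declares.
# Assumption: such values are meant to name services defined in the same manifest.
check_refs() {
  manifest=$1
  # Collect the values of TEI_*_ENDPOINT, TGI_LLM_ENDPOINT and REDIS_URL keys.
  refs=$(grep -E '^[[:space:]]*(TEI_[A-Z]+_ENDPOINT|TGI_LLM_ENDPOINT|REDIS_URL):' "$manifest" |
    awk '{print $2}' | sort -u)
  for ref in $refs; do
    # A reference resolves when some step declares it as its serviceName.
    if ! grep -Eq "^[[:space:]]*serviceName:[[:space:]]*${ref}[[:space:]]*\$" "$manifest"; then
      echo "unresolved: $ref"
    fi
  done
}

# Demo on a fragment in which the only reference resolves.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
- name: Llm
  internalService:
    serviceName: llm-svc-intel
    config:
      endpoint: /v1/chat/completions
      TGI_LLM_ENDPOINT: tgi-service-intel
- name: Tgi
  internalService:
    serviceName: tgi-service-intel
    isDownstreamService: true
EOF
check_refs "$tmp"
rm -f "$tmp"
```

The check is purely textual, so it only works on a properly indented manifest with one key per line; a YAML-aware tool would be more robust.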
@@ -0,0 +1,40 @@

<h1 align="center" id="title">Deploy CodeGen in a Kubernetes Cluster</h1>

This document outlines the deployment process for a Code Generation (CodeGen) application that uses the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice components on Intel Xeon servers and Gaudi machines.

If GMC is not yet installed in your Kubernetes cluster, install it by following the "Getting Started" section of [GMC Install](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector#readme). Images will soon be published to Docker Hub, at which point no builds will be required, which further simplifies installation.

If you have only Intel Xeon machines, use the `codegen_xeon.yaml` file; if you have a Gaudi cluster, use `codegen_gaudi.yaml`.
The example below uses Xeon.

## Deploy the CodeGen application

1. Create the desired namespace if it does not already exist, then deploy the application
```bash
# Kubernetes namespace names must be lowercase, so use "ct" rather than "CT"
export APP_NAMESPACE=ct
kubectl create ns $APP_NAMESPACE
sed -i "s|namespace: codegen|namespace: $APP_NAMESPACE|g" ./codegen_xeon.yaml
kubectl apply -f ./codegen_xeon.yaml
```

2. Check that the application is up and ready
```bash
kubectl get pods -n $APP_NAMESPACE
```

3. Deploy a client pod for testing
```bash
kubectl create deployment client-test -n $APP_NAMESPACE --image=python:3.8.13 -- sleep infinity
```

4. Check that the client pod is ready
```bash
kubectl get pods -n $APP_NAMESPACE
```

5. Send a request to the application
```bash
export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath='{.items..metadata.name}')
export accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='codegen')].status.accessUrl}")
# LOG_PATH is not set anywhere in this README; fall back to the current directory if it is unset
kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"query": "def print_hello_world():"}' -H 'Content-Type: application/json' > "${LOG_PATH:-.}/gmc_codegen.log"
```
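
Step 1 passes `APP_NAMESPACE` straight to `kubectl create ns`, and Kubernetes only accepts namespace names that are RFC 1123 DNS labels: at most 63 characters, lowercase alphanumerics or `-`, starting and ending with an alphanumeric. A pre-flight check along these lines could catch a bad value before anything is created. It is a minimal sketch; `is_valid_namespace` is a hypothetical helper, not part of GMC or `kubectl`.

```shell
#!/bin/sh
# Return 0 when $1 is a valid RFC 1123 DNS label, the format Kubernetes
# requires for namespace names.
is_valid_namespace() {
  name=$1
  # Labels are limited to 63 characters.
  [ "${#name}" -le 63 ] || return 1
  # LC_ALL=C keeps the a-z range from matching uppercase in some locales.
  printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
}

# Try a few candidate names.
for candidate in CT ct codegen; do
  if is_valid_namespace "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: rejected"
  fi
done
```

Running the loop reports `CT` as rejected and `ct` and `codegen` as ok, so an uppercase value would need to be lowercased before step 1.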