A helm chart for installing OpenVINO Model Server in a Kubernetes cluster is provided. By default, the deployment contains a single instance of the server, but the `replicas` configuration parameter can be set to create a cluster of any size, as described below. This guide assumes you already have a functional Kubernetes cluster and helm installed (see below for instructions on installing helm).

The steps below describe how to set up a model repository, use helm to launch the inference server, and then send inference requests to the running server.
Please refer to the Helm installation guide.
If you already have a model repository, you may use it with this helm chart. If you don't, you can use any model from the OpenVINO Model Zoo.
Model Server requires a repository of models to execute inference requests. For example, you can use a Google Cloud Storage (GCS) bucket:
```
gsutil mb gs://model-repository
```
You can download the model from the link provided above and upload it to GCS:
```
gsutil cp -r 1 gs://model-repository/1
```
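OVMS expects each model version to be stored in a numbered subdirectory of the model's base path. After the upload above, the bucket layout would look roughly like this (the IR file names are illustrative):

```
gs://model-repository/
└── 1/
    ├── model.xml
    └── model.bin
```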
The model repository can also be placed on a local path on the cluster nodes or stored on a Kubernetes persistent volume. The supported storage options are described below:
Bucket permissions can be set with the GOOGLE_APPLICATION_CREDENTIALS environment variable. Please follow the steps below:

- Generate a Google service account JSON file with the permissions: Storage Legacy Bucket Reader, Storage Legacy Object Reader, Storage Object Viewer. Name the file, for example, `gcp-creds.json` (you can follow these instructions to create a Service Account and download the JSON file).
- Create a Kubernetes secret from this JSON file:

```
$ kubectl create secret generic gcpcreds --from-file gcp-creds.json
```

- When deploying Model Server, provide the model path to the GCS bucket and the name of the secret created above. Make sure to set `gcp_creds_secret_name` when deploying:

```
helm install ovms-app ovms --set model_name=resnet50-binary-0001,model_path=gs://models-repository,gcp_creds_secret_name=gcpcreds
```
For S3, you must provide an AWS Access Key ID, the content of that key (AWS Secret Access Key), and the AWS region when deploying: `aws_access_key_id`, `aws_secret_access_key`, and `aws_region` (see below):

```
helm install ovms-app ovms --set model_name=icnet-camvid-ava-0001,model_path=s3://models-repository,aws_access_key_id=<...>,aws_secret_access_key=<...>,aws_region=eu-central-1
```
If you would like to use a custom S3 service with a compatible API (e.g. MinIO), you also need to provide the endpoint of that service by supplying `s3_compat_api_endpoint`:

```
helm install ovms-app ovms --set model_name=icnet-camvid-ava-0001,model_path=s3://models-repository,aws_access_key_id=<...>,aws_secret_access_key=<...>,s3_compat_api_endpoint=<...>
```
Use OVMS with models stored on Azure Blob Storage by providing the `azure_storage_connection_string` parameter. The model path should follow the `az` scheme, like below:

```
helm install ovms-app ovms --set model_name=resnet,model_path=az://bucket/model_path,azure_storage_connection_string="DefaultEndpointsProtocol=https;AccountName=azure_account_name;AccountKey=smp/hashkey==;EndpointSuffix=core.windows.net"
```
Besides cloud storage, models can be stored locally on the Kubernetes nodes' filesystem. Use the parameter `models_host_path` with the local path on the nodes. It will be mounted in the OVMS container as the `/models` folder.

While the models folder is mounted in the OVMS container, the parameter `model_path` should refer to a path starting with `/models/...` and point to the folder with the model versions, as shown in the sketch below.

Note that the OVMS container starts, by default, with the security context of the `ovms` account, which has uid 5000 and gid 5000. If the mounted models have restricted access permissions, change the security context of the OVMS service or adjust the permissions to the models. OVMS requires read permission on the model files and list permission on the model version folders.
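A minimal sketch of such a deployment, assuming the model version folders were copied to `/mnt/models/resnet` on every node (the path and model name are illustrative):

```
# /mnt/models on each node is mounted into the container as /models
helm install ovms-app ovms --set model_name=resnet,model_path=/models/resnet,models_host_path=/mnt/models
```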
It is possible to deploy OVMS using Kubernetes persistent volumes. This opens the possibility of storing the models for OVMS on any filesystem supported by Kubernetes. In the helm chart, set the parameter `models_volume_claim` to the name of the `PersistentVolumeClaim` record with the models. When set, it will be mounted as the `/models` folder inside the OVMS container.

Note that the parameter `models_volume_claim` is mutually exclusive with `models_host_path`. Only one of them should be set.
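For illustration, assuming a claim named `models-pvc` that was populated with a `resnet` model, the deployment could look like this:

```
# models-pvc is a placeholder PersistentVolumeClaim name; it is mounted as /models
helm install ovms-app ovms --set model_name=resnet,model_path=/models/resnet,models_volume_claim=models-pvc
```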
You can restrict the cluster resources assigned to the OVMS container by setting the parameter `resources`. By default, there are no restrictions, but this parameter can be used to reduce the CPU and memory allocation. Below is an example snippet from the values.yaml file:

```
resources:
  limits:
    cpu: 8.0
    memory: 512Mi
```
Besides setting the CPU and memory resources, the same parameter can be used to assign AI accelerators like iGPU or VPU. This assumes that the adequate Kubernetes device plugin from Intel Device Plugins for Kubernetes is deployed.

```
resources:
  limits:
    gpu.intel.com/i915: 1
```
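One way to apply such overrides is to save them in a values file and pass it to helm (a sketch; `overrides.yaml` is a hypothetical file containing one of the snippets above):

```
# -f merges overrides.yaml on top of the chart's default values.yaml
helm install ovms-app ovms -f overrides.yaml --set model_name=resnet50-binary-0001,model_path=gs://models-repository
```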
OVMS, by default, starts with the security context of the `ovms` account, which has uid 5000 and gid 5000. In some cases this can prevent importing models stored on a filesystem with restricted access. It might require adjusting the security context of the OVMS service, which can be changed using the parameter `security_context`.

An example of the values is presented below:

```
security_context:
  runAsUser: 5000
  runAsGroup: 5000
```

The security configuration can also be adjusted further with all the options specified in the Kubernetes documentation.
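The same values can also be overridden on the command line with helm's nested `--set` syntax; for example (5001:5001 is an arbitrary illustration):

```
# overrides runAsUser/runAsGroup defined in values.yaml
helm install ovms-app ovms --set model_name=resnet,model_path=gs://models-repository,security_context.runAsUser=5001,security_context.runAsGroup=5001
```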
The helm chart creates a Kubernetes service as part of the OVMS deployment. Depending on the cluster infrastructure, you can adjust the service type. In a cloud environment, you might set the `LoadBalancer` type to expose the service externally. `NodePort` could expose a static port of the node IP address. `ClusterIP` would keep the OVMS service internal to the cluster applications.
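For example, to keep the service internal to the cluster, override the `service_type` parameter (listed in the table at the end of this guide):

```
helm install ovms-app ovms --set model_name=resnet50-binary-0001,model_path=gs://models-repository,service_type=ClusterIP
```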
Deploy Model Server using helm. Please include the required model name and model path. You can also adjust other parameters defined in values.yaml.
```
helm install ovms-app ovms --set model_name=resnet50-binary-0001,model_path=gs://models-repository
```
Use kubectl to check the status and wait until the Model Server pod is running:

```
kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
ovms-app-5fd8d6b845-w87jl   1/1     Running   0          27s
```
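If the pod does not reach the Running state, the standard kubectl commands can help diagnose the deployment (the pod name comes from the sample output above):

```
# inspect scheduling events and container status
kubectl describe pod ovms-app-5fd8d6b845-w87jl
# check the server logs, e.g. for model loading errors
kubectl logs ovms-app-5fd8d6b845-w87jl
```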
By default, Model Server is deployed with 1 instance. If you would like to scale it up with additional replicas, override the value in the values.yaml file or pass the `--set` flag to `helm install`:

```
helm install ovms-app ovms --set model_name=resnet50-binary-0001,model_path=gs://models-repository,replicas=3
```
To serve multiple models, you can run Model Server with a configuration file, as described here: https://github.com/openvinotoolkit/model_server/blob/main/docs/docker_container.md#configfile

Follow the above documentation to create a configuration file named config.json and fill it with the proper information.
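As a rough sketch (the model names and base paths are illustrative; see the linked documentation for the full schema), a config.json serving two models could look like this:

```
{
  "model_config_list": [
    {"config": {"name": "resnet", "base_path": "/models/resnet"}},
    {"config": {"name": "icnet",  "base_path": "/models/icnet"}}
  ]
}
```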
To deploy with the config file stored in a Kubernetes ConfigMap:

- create a ConfigMap resource from this file with a chosen name (here `ovms-config`):

```
kubectl create configmap ovms-config --from-file config.json
```

- deploy Model Server with the parameter `config_configmap_name` (without `model_name` and `model_path`):

```
helm install ovms-app ovms --set config_configmap_name=ovms-config
```
To deploy with the config file stored on a Kubernetes persistent volume:

- Store the config file on the node local path set with `models_host_path` or on the persistent volume claim set with `models_volume_claim`. It will be mounted along with the models in the `/models` folder.
- Deploy Model Server with the parameter `config_path` pointing to the location of the config file visible in the OVMS container, i.e. starting from `/models/...`:

```
helm install ovms-app ovms --set config_path=/models/config.json
```
Now that the server is running, you can send HTTP or gRPC requests to perform inference. By default, the service is exposed with a LoadBalancer service type. Use the following command to find the external IP of the server:

```
kubectl get svc
NAME       TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
ovms-app   LoadBalancer   10.121.14.253   1.2.3.4       8080:30043/TCP,8081:32606/TCP   59m
```

The server exposes a gRPC endpoint on port 8080 and a REST endpoint on port 8081.
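As a quick connectivity check, you can query the model status over the REST interface; OVMS follows the TensorFlow Serving REST API convention, so the status endpoint should be available at `/v1/models/<model_name>` (the IP below comes from the sample output above):

```
# returns the model version status as JSON
curl http://1.2.3.4:8081/v1/models/resnet50-binary-0001
```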
Follow the instructions to create an image classification client that can be used to perform inference with models exposed by the server. For example:

```
$ python jpeg_classification.py --grpc_port 8080 --grpc_address 1.2.3.4 --input_name 0 --output_name 1463
Start processing:
Model name: resnet
Images list file: input_images.txt
images/airliner.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 25.56 ms; speed 39.13 fps
Detected: 404 Should be: 404
images/arctic-fox.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.95 ms; speed 47.72 fps
Detected: 279 Should be: 279
images/bee.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.90 ms; speed 45.67 fps
Detected: 309 Should be: 309
images/golden_retriever.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.84 ms; speed 45.78 fps
Detected: 207 Should be: 207
images/gorilla.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.26 ms; speed 49.36 fps
Detected: 366 Should be: 366
images/magnetic_compass.jpeg (1, 3, 224, 224) ; data range: 0.0 : 247.0
Processing time: 20.68 ms; speed 48.36 fps
Detected: 635 Should be: 635
images/peacock.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.57 ms; speed 46.37 fps
Detected: 84 Should be: 84
images/pelican.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.53 ms; speed 48.71 fps
Detected: 144 Should be: 144
images/snail.jpeg (1, 3, 224, 224) ; data range: 0.0 : 248.0
Processing time: 22.34 ms; speed 44.75 fps
Detected: 113 Should be: 113
images/zebra.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.27 ms; speed 47.00 fps
Detected: 340 Should be: 340
Overall accuracy= 100.0 %
Average latency= 21.1 ms
```
Once you've finished using the server, use helm to uninstall the chart:

```
$ helm ls
NAME       NAMESPACE   REVISION   UPDATED                                     STATUS     CHART        APP VERSION
ovms-app   default     1          2020-09-23 14:40:07.292360971 +0200 CEST    deployed   ovms-3.0.0
$ helm uninstall ovms-app
release "ovms-app" uninstalled
```
| Parameter | Description | Prerequisites | Default |
|---|---|---|---|
| replicas | number of k8s pod replicas to deploy | - | 1 |
| image_name | change to use a different docker image with OVMS | - | openvino/model_server:latest |
| config_configmap_name | starts OVMS using the config file stored in the ConfigMap | create the ConfigMap including the config.json file | - |
| config_path | starts OVMS using the config file mounted from the node local path or the k8s persistent volume | use it together with models_host_path or models_volume_claim and place the config file in the configured storage path | - |
| grpc_port | service port for the gRPC interface | - | 8080 |
| rest_port | service port for the REST API interface | - | 8081 |
| model_name | model name; starts OVMS with a single model; mutually exclusive with config_configmap_name and config_path | - | - |
| model_path | model path; starts OVMS with a single model; mutually exclusive with config_configmap_name and config_path | - | - |
| target_device | target device to run inference operations | non-CPU devices require the device plugin to be deployed | CPU |
| nireq | size of the inference request queue | - | set automatically by OpenVINO |
| plugin_config | device plugin configuration used for performance tuning | - | {"CPU_THROUGHPUT_STREAMS":"CPU_THROUGHPUT_AUTO"} |
| gcp_creds_secret_name | k8s secret resource including GCP credentials; use it with Google Cloud Storage for models | the secret should be created with the GCP credentials JSON file | - |
| aws_access_key_id | S3 storage access key id; use it with S3 storage for models | - | - |
| aws_secret_access_key | S3 storage secret access key; use it with S3 storage for models | - | - |
| aws_region | S3 storage region; use it with S3 storage for models | - | - |
| s3_compat_api_endpoint | S3-compatible API endpoint; use it with MinIO storage for models | - | - |
| azure_storage_connection_string | connection string to the Azure Storage authentication account; use it with Azure storage for models | - | - |
| log_level | OVMS log level, one of ERROR, INFO, DEBUG | - | INFO |
| service_type | k8s service type | - | LoadBalancer |
| resources | compute resource limits | - | all CPU and memory on the node |
| security_context | OVMS security context | - | 5000:5000 |
| models_host_path | mounts a node local path in the container as the /models folder | the path should be created on all nodes and populated with the data | - |
| models_volume_claim | mounts a k8s persistent volume claim in the container as /models | the PersistentVolumeClaim should be created in the same namespace and populated with the data | - |
| https_proxy | proxy name to be used to connect to remote models | - | - |