Kubernetes Deployment

A Helm chart for installing OpenVINO Model Server in a Kubernetes cluster is provided. By default, the deployment runs a single instance of the server, but the replicas parameter can be set to run any number of instances, as described below. This guide assumes you already have a functional Kubernetes cluster and Helm installed (see below for instructions on installing Helm).

The steps below describe how to set up a model repository, use Helm to launch the inference server, and then send inference requests to the running server.

Installing Helm

Please refer to the Helm installation guide.

Model Repository

If you already have a model repository, you can use it with this Helm chart. If you don't, you can use any model from the OpenVINO Model Zoo.

Model Server requires a repository of models to execute inference requests. For example, you can use a Google Cloud Storage (GCS) bucket:

gsutil mb gs://models-repository

You can download the model from the link provided above and upload its version folder (here 1) to GCS:

gsutil cp -r 1 gs://models-repository/1
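
Before uploading, you can check that the local folder follows the layout the commands above assume, with one numbered subfolder per model version. This is a minimal sketch. It assumes OpenVINO IR models, meaning an .xml topology file plus a .bin weights file in each version folder. check_repository is a hypothetical helper, not part of OVMS.

```python
from pathlib import Path


def check_repository(model_dir: str) -> list[str]:
    """Return a list of problems found in one model's folder (empty if it looks fine).

    Assumes the OpenVINO IR layout <model_dir>/<version>/<name>.xml + <name>.bin,
    where <version> is a positive integer.
    """
    problems = []
    root = Path(model_dir)
    versions = [p for p in root.iterdir() if p.is_dir()]
    if not versions:
        problems.append(f"{root}: no version folders found")
    for version in versions:
        # Version folders must be numeric so the server can order them.
        if not version.name.isdigit() or int(version.name) == 0:
            problems.append(f"{version}: version folder name should be a positive integer")
            continue
        has_xml = any(version.glob("*.xml"))
        has_bin = any(version.glob("*.bin"))
        if not (has_xml and has_bin):
            problems.append(f"{version}: expected both an .xml and a .bin file")
    return problems
```

Run it in the folder that contains 1, for example `print(check_repository("."))`. An empty list means no layout problems were detected.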

The model repository can also be distributed across the cluster nodes in a local path, or stored on a Kubernetes persistent volume.

The supported storage options are described below:

GCS

Bucket permissions can be set with the GOOGLE_APPLICATION_CREDENTIALS environment variable. Please follow the steps below:

  • Generate a Google service account JSON file with the following permissions: Storage Legacy Bucket Reader, Storage Legacy Object Reader, Storage Object Viewer. Name the file, for example, gcp-creds.json (you can follow these instructions to create a Service Account and download the JSON file).

  • Create a Kubernetes secret from this JSON file:

    $ kubectl create secret generic gcpcreds --from-file gcp-creds.json
    
  • When deploying Model Server, provide the model path to the GCS bucket and pass the name of the secret created above in the gcp_creds_secret_name parameter:

helm install ovms-app ovms --set model_name=resnet50-binary-0001,model_path=gs://models-repository,gcp_creds_secret_name=gcpcreds
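
When several parameters are combined in one --set flag, they are separated by commas. A value that itself contains a comma therefore has to be escaped with a backslash. The sketch below assembles the argument from a Python dictionary. build_set_arg and escape_value are hypothetical helpers, and the parameter names are the ones used in the command above.

```python
def escape_value(value: object) -> str:
    # Helm's --set parser treats "," as a separator and "\" as an escape character.
    return str(value).replace("\\", "\\\\").replace(",", "\\,")


def build_set_arg(params: dict) -> str:
    """Join chart parameters into a single --set argument, skipping unset values."""
    return ",".join(
        f"{key}={escape_value(value)}"
        for key, value in params.items()
        if value not in (None, "")
    )


args = build_set_arg({
    "model_name": "resnet50-binary-0001",
    "model_path": "gs://models-repository",
    "gcp_creds_secret_name": "gcpcreds",
})
print(f"helm install ovms-app ovms --set {args}")
```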

S3

For S3, you must provide an AWS Access Key ID, the matching AWS Secret Access Key, and the AWS region when deploying, using the aws_access_key_id, aws_secret_access_key and aws_region parameters:

helm install ovms-app ovms --set model_name=icnet-camvid-ava-0001,model_path=s3://models-repository,aws_access_key_id=<...>,aws_secret_access_key=<...>,aws_region=eu-central-1

To use a custom service with an S3-compatible API (e.g. MinIO), you also need to provide the endpoint of that service with the s3_compat_api_endpoint parameter:

helm install ovms-app ovms --set model_name=icnet-camvid-ava-0001,model_path=s3://models-repository,aws_access_key_id=<...>,aws_secret_access_key=<...>,s3_compat_api_endpoint=<...>
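
The two S3 commands differ only in how the service is located. AWS S3 gets aws_region, while an S3-compatible service such as MinIO is reached through s3_compat_api_endpoint. The sketch below is a minimal pre-flight check based on this reading of the examples. The rule that one of the two must be present is an assumption drawn from these commands, not a documented chart validation.

```python
def check_s3_params(params: dict) -> list[str]:
    """Return missing-parameter messages for an s3:// model_path (empty list if OK)."""
    errors = []
    if not str(params.get("model_path", "")).startswith("s3://"):
        return errors  # not an S3 deployment, nothing to check
    for key in ("aws_access_key_id", "aws_secret_access_key"):
        if not params.get(key):
            errors.append(f"{key} is required for s3:// model paths")
    # Assumption: AWS S3 needs a region, a custom S3-compatible service needs an endpoint.
    if not params.get("aws_region") and not params.get("s3_compat_api_endpoint"):
        errors.append("set aws_region (AWS S3) or s3_compat_api_endpoint (e.g. MinIO)")
    return errors
```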

Azure Storage

To use OVMS with models stored in Azure Blob Storage, provide the azure_storage_connection_string parameter. The model path should follow the az:// scheme, as shown below:

helm install ovms-app ovms --set model_name=resnet,model_path=az://bucket/model_path,azure_storage_connection_string="DefaultEndpointsProtocol=https;AccountName=azure_account_name;AccountKey=smp/hashkey==;EndpointSuffix=core.windows.net"
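
The connection string is a semicolon-separated list of Key=Value pairs. Values such as the base64 AccountKey can themselves end in "=", so each pair should be split on its first "=" only. The sketch below parses the example string from the command above and checks for the account name and key. Treating those two keys as required is an assumption for illustration, which fits account-key authentication. Because the string contains ";", it also has to stay quoted on the shell command line, as in the example.

```python
def parse_connection_string(conn: str) -> dict[str, str]:
    """Split an Azure Storage connection string into a dict of its settings."""
    settings = {}
    for part in conn.split(";"):
        if not part:
            continue  # tolerate a trailing ";"
        key, sep, value = part.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"malformed setting: {part!r}")
        settings[key] = value
    return settings


conn = ("DefaultEndpointsProtocol=https;AccountName=azure_account_name;"
        "AccountKey=smp/hashkey==;EndpointSuffix=core.windows.net")
settings = parse_connection_string(conn)
# Assumption: account-key authentication, so both of these should be present.
missing = [k for k in ("AccountName", "AccountKey") if k not in settings]
print(settings["AccountKey"], missing)
```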

Local Node Storage

Besides cloud storage, models can be stored locally on the filesystem of the Kubernetes nodes. Set the models_host_path parameter to the local path on the nodes; it will be mounted in the OVMS container as the /models folder.

Because this folder is mounted in the OVMS container as /models, the model_path parameter should start with /models/... and point to the folder containing the model versions.
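
The node folder given in models_host_path appears inside the container as /models. The model_path value is therefore the model's location relative to that folder, prefixed with /models. Below is a small sketch of that translation. /opt/ovms-models and resnet are hypothetical example paths.

```python
from pathlib import PurePosixPath


def container_model_path(models_host_path: str, host_model_dir: str) -> str:
    """Map a model folder on the node to the path OVMS sees inside the container."""
    # relative_to raises ValueError if the model folder is outside the mounted path.
    relative = PurePosixPath(host_model_dir).relative_to(models_host_path)
    return str(PurePosixPath("/models") / relative)


print(container_model_path("/opt/ovms-models", "/opt/ovms-models/resnet"))  # /models/resnet
```

With models_host_path=/opt/ovms-models, you would then pass model_path=/models/resnet to helm install.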

Note that, by default, the OVMS container runs with the security context of the ovms account, with UID 5000 and GID 5000. If the mounted models have restricted access permissions, change the security context of the OVMS service or adjust the permissions of the model files. OVMS requires read permission on the model files and list permission on the model version folders.
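
You can check the access rule in the note above on a node before deploying. The sketch below evaluates the classic POSIX owner/group/other permission bits for UID 5000 and GID 5000, the default security context. It ignores ACLs, supplementary groups and root privileges, so treat it as a rough pre-check. ovms_can_access and check_tree are hypothetical helpers.

```python
import os
import stat

OVMS_UID, OVMS_GID = 5000, 5000  # default security context described above


def ovms_can_access(st_mode: int, st_uid: int, st_gid: int, is_dir: bool) -> bool:
    """Check read (files) or read+execute (folders) for the OVMS user via mode bits."""
    # POSIX applies only the first matching class: owner, then group, then other.
    if st_uid == OVMS_UID:
        read, execute = stat.S_IRUSR, stat.S_IXUSR
    elif st_gid == OVMS_GID:
        read, execute = stat.S_IRGRP, stat.S_IXGRP
    else:
        read, execute = stat.S_IROTH, stat.S_IXOTH
    needed = (read | execute) if is_dir else read  # listing and entering a folder needs r and x
    return (st_mode & needed) == needed


def check_tree(root: str) -> list[str]:
    """Return paths under root that the OVMS user probably cannot read or list."""
    denied = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            st = os.stat(path)
            if not ovms_can_access(st.st_mode, st.st_uid, st.st_gid, stat.S_ISDIR(st.st_mode)):
                denied.append(path)
    return denied
```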

Persistent Volume

It is possible to deploy OVMS using Kubernetes persistent volumes.

This makes it possible to store the models for OVMS on any filesystem supported by Kubernetes.

In the Helm chart, set the models_volume_claim parameter to the name of the PersistentVolumeClaim that holds the models. When set, the volume is mounted as the /models folder inside the OVMS container.

Note that the models_volume_claim parameter is mutually exclusive with models_host_path; only one of them should be set.
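
A deployment script can enforce this rule before calling Helm. This is a minimal sketch, assuming the chart values are available as a Python dictionary. check_model_storage is a hypothetical name, and ovms-pvc in the test is a placeholder claim name.

```python
def check_model_storage(values: dict) -> None:
    """Raise ValueError if both local-path and persistent-volume model storage are set."""
    host_path = values.get("models_host_path")
    volume_claim = values.get("models_volume_claim")
    if host_path and volume_claim:
        raise ValueError(
            "models_host_path and models_volume_claim are mutually exclusive; set only one"
        )
```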

Assigning Resource Specs

You can restrict the cluster resources assigned to the OVMS container with the resources parameter. By default there are no restrictions, but this parameter can be used to limit the CPU and memory allocation. Below is an example snippet from the values.yaml file:

resources:
  limits:
    cpu: 8.0
    memory: 512Mi
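
The limits use Kubernetes quantity notation. CPU is given in cores (8.0) or in millicores (such as 500m). Memory uses binary suffixes such as Mi (2^20 bytes) or decimal ones such as M (10^6 bytes). The sketch below converts the values from the snippet above into plain numbers. It covers only these common suffixes, not the full Kubernetes quantity grammar.

```python
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "8.0" or "500m" into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "512Mi" or "1G" into bytes."""
    # Binary suffixes are checked first; none of them ends with a decimal suffix letter.
    for suffix, factor in {**BINARY, **DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


print(cpu_cores("8.0"), memory_bytes("512Mi"))  # 8.0 536870912
```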

Besides CPU and memory, the same parameter can be used to assign AI accelerators such as an iGPU or VPU. This requires the appropriate Kubernetes device plugin from the Intel Device Plugin for Kubernetes project:

resources:
  limits:
    gpu.intel.com/i915: 1
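
The same limits can also be passed on the command line instead of in values.yaml. This assumes the chart renders the resources value directly as the container's resource spec. In --set, nested keys are joined with dots. A resource name that itself contains dots, such as gpu.intel.com/i915, therefore needs those dots escaped with a backslash. The sketch below flattens a nested dictionary into such entries. to_set_entries is a hypothetical helper, and it does not escape commas inside values.

```python
def to_set_entries(values: dict, prefix: str = "") -> list[str]:
    """Flatten nested chart values into Helm --set entries, escaping dots in keys."""
    entries = []
    for key, value in values.items():
        path = prefix + str(key).replace(".", "\\.")
        if isinstance(value, dict):
            entries.extend(to_set_entries(value, path + "."))
        else:
            entries.append(f"{path}={value}")
    return entries


for entry in to_set_entries({"resources": {"limits": {"gpu.intel.com/i915": 1}}}):
    print(f"--set '{entry}'")  # single quotes keep the shell from consuming the backslashes
```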

Security Context

By default, OVMS starts with the security context of the ovms account, which has UID 5000 and GID 5000. In some cases this can prevent OVMS from loading models stored on a filesystem with restricted access, and the security context of the OVMS service has to be adjusted. It can be changed with the security_context parameter.

An example of the values is presented below:

security_context:
  runAsUser: 5000
  runAsGroup: 5000

The security context can be adjusted further with any of the options described in the Kubernetes documentation.
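
If the models are owned by a different account, one option is to run OVMS as that owner. The other option is to adjust the file permissions. The sketch below reads the owner UID and GID of a models folder on the node and formats a matching security_context fragment. The folder path in the usage line is hypothetical. If the folder is owned by root, this yields UID 0, which you most likely do not want for the server.

```python
import os


def security_context_for(path: str) -> dict:
    """Build a security_context that runs OVMS as the owner of the given folder."""
    st = os.stat(path)
    return {"runAsUser": st.st_uid, "runAsGroup": st.st_gid}


def as_values_yaml(ctx: dict) -> str:
    """Render the context in the same shape as the values.yaml example above."""
    lines = ["security_context:"] + [f"  {key}: {value}" for key, value in ctx.items()]
    return "\n".join(lines)


# Usage on a node (hypothetical path):
# print(as_values_yaml(security_context_for("/opt/ovms-models")))
```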

Service Type

The Helm chart creates a Kubernetes service as part of the OVMS deployment. Depending on the cluster infrastructure, you can adjust the service type with the service_type parameter. In a cloud environment, you might set the LoadBalancer type to expose the service externally. NodePort exposes the service on a static port of each node's IP address. ClusterIP keeps the OVMS service internal, reachable only by applications in the cluster.

Deploy OpenVINO Model Server with a Single Model

Deploy Model Server using helm. Please include the required model name and model path. You can also adjust other parameters defined in values.yaml.

helm install ovms-app ovms --set model_name=resnet50-binary-0001,model_path=gs://models-repository

Use kubectl to see status and wait until the Model Server pod is running:

kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
ovms-app-5fd8d6b845-w87jl   1/1     Running   0          27s
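
In scripts, the same readiness check can be done on the command output. The sketch below parses the table printed by kubectl get pods. It reports whether every pod whose name starts with the release name is Running with all containers ready. Matching pods by name prefix is an assumption for illustration; selecting pods by label with kubectl get pods -l would be more robust.

```python
def release_ready(kubectl_output: str, release: str) -> bool:
    """Return True if all pods of the release are Running with all containers ready."""
    lines = [line.split() for line in kubectl_output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    name_i, ready_i, status_i = header.index("NAME"), header.index("READY"), header.index("STATUS")
    pods = [row for row in rows if row[name_i].startswith(release + "-")]
    if not pods:
        return False  # no pods for this release yet
    for row in pods:
        ready, total = row[ready_i].split("/")
        if row[status_i] != "Running" or ready != total:
            return False
    return True


sample = """NAME                        READY   STATUS    RESTARTS   AGE
ovms-app-5fd8d6b845-w87jl   1/1     Running   0          27s"""
print(release_ready(sample, "ovms-app"))  # True
```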

By default, Model Server is deployed with a single instance. To scale up to additional replicas, override the replicas value in the values.yaml file or pass it with the --set flag to helm install:

helm install ovms-app ovms --set model_name=resnet50-binary-0001,model_path=gs://models-repository,replicas=3

Deploy OpenVINO Model Server with Multiple Models Defined in a Configuration File

To serve multiple models you can run Model Server with a configuration file as described here: https://github.com/openvinotoolkit/model_server/blob/main/docs/docker_container.md#configfile

Follow the above documentation to create a configuration file named config.json and fill it in with the definitions of the models you want to serve.
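
For reference, here is a minimal sketch that writes such a file with Python's json module. It assumes the model_config_list layout described in the linked documentation, with a name and base_path for each model. The two model entries and their paths are placeholders to replace with your own.

```python
import json

# Placeholder model entries: replace names and base paths with your own.
models = [
    {"name": "resnet50-binary-0001", "base_path": "gs://models-repository/resnet50-binary-0001"},
    {"name": "icnet-camvid-ava-0001", "base_path": "gs://models-repository/icnet-camvid-ava-0001"},
]

# Each model is wrapped in its own {"config": {...}} entry.
config = {"model_config_list": [{"config": model} for model in models]}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```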

To deploy with the config file stored in a Kubernetes ConfigMap:

  • Create a ConfigMap resource from this file with a chosen name (here ovms-config):

    $ kubectl create configmap ovms-config --from-file config.json

  • Deploy Model Server with the config_configmap_name parameter (without model_name and model_path):

    $ helm install ovms-app ovms --set config_configmap_name=ovms-config

To deploy with the config file stored on a node local path or a Kubernetes persistent volume:

  • Store the config file on the node local path set with models_host_path or on the persistent volume claim set with models_volume_claim. It will be mounted along with the models in the /models folder.
  • Deploy Model Server with the config_path parameter pointing to the location of the config file as seen inside the OVMS container, i.e. starting with /models/...:

    $ helm install ovms-app ovms --set config_path=/models/config.json
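
There are three alternative ways of telling OVMS what to serve: a single model_name/model_path pair, config_configmap_name, or config_path. The sketch below checks a values dictionary for exactly one of them. It also checks that config_path points into /models and is combined with a mounted storage. deployment_mode is a hypothetical helper. Its rules are read from this page, not taken from the chart's templates.

```python
def deployment_mode(values: dict) -> str:
    """Return "single_model", "configmap" or "config_path", or raise ValueError."""
    modes = []
    if values.get("model_name") or values.get("model_path"):
        if not (values.get("model_name") and values.get("model_path")):
            raise ValueError("model_name and model_path must be set together")
        modes.append("single_model")
    if values.get("config_configmap_name"):
        modes.append("configmap")
    if values.get("config_path"):
        if not values["config_path"].startswith("/models/"):
            raise ValueError("config_path must point inside the /models mount")
        if not (values.get("models_host_path") or values.get("models_volume_claim")):
            raise ValueError("config_path needs models_host_path or models_volume_claim")
        modes.append("config_path")
    if len(modes) != 1:
        raise ValueError(f"expected exactly one deployment mode, got {modes or 'none'}")
    return modes[0]
```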

Using Model Server

Now that the server is running you can send HTTP or gRPC requests to perform inference. By default, the service is exposed with a LoadBalancer service type. Use the following command to find the external IP for the server:

kubectl get svc
NAME       TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
ovms-app   LoadBalancer   10.121.14.253   1.2.3.4       8080:30043/TCP,8081:32606/TCP   59m

The server exposes a gRPC endpoint on port 8080 and a REST endpoint on port 8081.
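
How a client reaches the server depends on the service type. The sketch below takes the Service object printed by kubectl get svc ovms-app -o json, after loading it with json.loads, and returns the address for a given service port:

- the load balancer IP or hostname for LoadBalancer;
- a node IP plus the allocated nodePort for NodePort;
- the cluster IP for ClusterIP.

The node IP has to be supplied separately. The sample object is an abridged, illustrative version of the service listed above.

```python
def service_endpoint(svc: dict, port: int, node_ip: str = "") -> str:
    """Return "host:port" for reaching the given service port from a client."""
    spec = svc["spec"]
    mapping = next((p for p in spec["ports"] if p["port"] == port), None)
    if mapping is None:
        raise ValueError(f"service does not expose port {port}")
    svc_type = spec.get("type", "ClusterIP")  # ClusterIP is the Kubernetes default
    if svc_type == "LoadBalancer":
        ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress", [])
        if not ingress:
            raise RuntimeError("external address not assigned yet (EXTERNAL-IP pending)")
        host = ingress[0].get("ip") or ingress[0].get("hostname")
        return f"{host}:{port}"
    if svc_type == "NodePort":
        if not node_ip:
            raise ValueError("NodePort services need the IP address of a cluster node")
        return f"{node_ip}:{mapping['nodePort']}"
    # ClusterIP: only reachable from inside the cluster.
    return f"{spec['clusterIP']}:{port}"


svc = {  # abridged from the kubectl get svc output above
    "spec": {"type": "LoadBalancer", "clusterIP": "10.121.14.253",
             "ports": [{"port": 8080, "nodePort": 30043}, {"port": 8081, "nodePort": 32606}]},
    "status": {"loadBalancer": {"ingress": [{"ip": "1.2.3.4"}]}},
}
print(service_endpoint(svc, 8080))  # 1.2.3.4:8080
```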

Follow the instructions to create an image classification client that can be used to run inference with the models exposed by the server. For example:

$ python jpeg_classification.py --grpc_port 8080 --grpc_address 1.2.3.4 --input_name 0 --output_name 1463
Start processing:
	Model name: resnet
	Images list file: input_images.txt
images/airliner.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 25.56 ms; speed 39.13 fps
Detected: 404  Should be: 404
images/arctic-fox.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.95 ms; speed 47.72 fps
Detected: 279  Should be: 279
images/bee.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.90 ms; speed 45.67 fps
Detected: 309  Should be: 309
images/golden_retriever.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.84 ms; speed 45.78 fps
Detected: 207  Should be: 207
images/gorilla.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.26 ms; speed 49.36 fps
Detected: 366  Should be: 366
images/magnetic_compass.jpeg (1, 3, 224, 224) ; data range: 0.0 : 247.0
Processing time: 20.68 ms; speed 48.36 fps
Detected: 635  Should be: 635
images/peacock.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.57 ms; speed 46.37 fps
Detected: 84  Should be: 84
images/pelican.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.53 ms; speed 48.71 fps
Detected: 144  Should be: 144
images/snail.jpeg (1, 3, 224, 224) ; data range: 0.0 : 248.0
Processing time: 22.34 ms; speed 44.75 fps
Detected: 113  Should be: 113
images/zebra.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.27 ms; speed 47.00 fps
Detected: 340  Should be: 340
Overall accuracy= 100.0 %
Average latency= 21.1 ms
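
You can also use the REST endpoint on port 8081 without the gRPC client. OVMS implements a TensorFlow Serving-compatible REST API, so a prediction request is a POST to /v1/models/<name>:predict with a JSON body. The sketch below only builds such a request with the standard library. Several details are assumptions to adapt to your model's actual inputs: the model name, the shape of the instances payload, and the address 1.2.3.4 (taken from the service listing above).

```python
import json
import urllib.request


def predict_request(host: str, port: int, model: str, instances: list) -> urllib.request.Request:
    """Build a TensorFlow Serving-style REST predict request for OVMS."""
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


# A dummy all-zero 3x224x224 image (batch of one), only to show the request shape.
image = [[[0.0] * 224 for _ in range(224)] for _ in range(3)]
request = predict_request("1.2.3.4", 8081, "resnet50-binary-0001", [image])

# Sending it requires a reachable server:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```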

Cleanup

Once you've finished using the server you should use helm to uninstall the chart:

$ helm ls
NAME  	  NAMESPACE	REVISION	UPDATED                                 	STATUS  	CHART     	APP VERSION
ovms-app  default  	1       	2020-09-23 14:40:07.292360971 +0200 CEST	deployed	ovms-3.0.0

$ helm uninstall ovms-app
release "ovms-app" uninstalled

Helm Options Reference

| Parameter | Description | Prerequisites | Default |
|---|---|---|---|
| replicas | Number of k8s pod replicas to deploy | | 1 |
| image_name | Change to use a different Docker image with OVMS | | openvino/model_server:latest |
| config_configmap_name | Starts OVMS using the config file stored in the ConfigMap | Create the ConfigMap including the config.json file | - |
| config_path | Starts OVMS using the config file mounted from the node local path or the k8s persistent volume | Use it together with models_host_path or models_volume_claim and place the config file in the configured storage path | - |
| grpc_port | Service port for the gRPC interface | | 8080 |
| rest_port | Service port for the REST API interface | | 8081 |
| model_name | Model name; starts OVMS with a single model; mutually exclusive with the config_configmap_name and config_path parameters | | - |
| model_path | Model path; starts OVMS with a single model; mutually exclusive with the config_configmap_name and config_path parameters | | - |
| target_device | Target device to run inference operations | Non-CPU devices require the device plugin to be deployed | CPU |
| nireq | Size of the inference queue | | Set automatically by OpenVINO |
| plugin_config | Device plugin configuration used for performance tuning | | {"CPU_THROUGHPUT_STREAMS":"CPU_THROUGHPUT_AUTO"} |
| gcp_creds_secret_name | k8s secret resource including GCP credentials; use it with Google Cloud Storage for models | The secret should be created from the GCP credentials JSON file | - |
| aws_access_key_id | S3 storage access key ID; use it with S3 storage for models | | - |
| aws_secret_access_key | S3 storage secret key; use it with S3 storage for models | | - |
| aws_region | S3 storage region; use it with S3 storage for models | | - |
| s3_compat_api_endpoint | S3 compatibility API endpoint; use it with MinIO storage for models | | - |
| azure_storage_connection_string | Connection string to the Azure Storage authentication account; use it with Azure storage for models | | - |
| log_level | OVMS log level, one of ERROR, INFO, DEBUG | | INFO |
| service_type | k8s service type | | LoadBalancer |
| resources | Compute resource limits | | All CPU and memory on the node |
| security_context | OVMS security context | | 5000:5000 |
| models_host_path | Mounts the node local path in the container as the /models folder | The path should be created on all nodes and populated with the data | - |
| models_volume_claim | Mounts the k8s persistent volume claim in the container as /models | The Persistent Volume Claim should be created in the same namespace and populated with the data | - |
| https_proxy | Proxy name to be used to connect to remote models | | - |