feat: kserve document improvement #1807

Merged 7 commits on Sep 22, 2022
kubernetes/kserve/README.md: 126 additions & 55 deletions
# End to End Documentation for Torchserve - KServe Model Serving

The documentation covers the steps to run Torchserve inside the KServe environment for the mnist model.

Currently, KServe supports the Inference API for all existing models except the text to speech synthesizer, and its Explain API works only for the eager-mode MNIST, BERT and text classification models.

## Docker Image Building

- To create a CPU based image

```bash
./build_image.sh
```

- To create a CPU based image with custom tag

```bash
./build_image.sh -t <repository>/<image>:<tag>
```

- To create a GPU based image

```bash
./build_image.sh -g
```

- To create a GPU based image with custom tag

```bash
./build_image.sh -g -t <repository>/<image>:<tag>
```
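
If the KServe cluster pulls images from a remote registry, push the image you built before referencing it in a deployment. This is a hedged example only; the repository and tag below are placeholders in the same style as above.

```bash
# Placeholders: substitute your own <repository>/<image>:<tag>
./build_image.sh -g -t <repository>/<image>:<tag>
docker push <repository>/<image>:<tag>
```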

### Docker Image Dev Build

```bash
DOCKER_BUILDKIT=1 docker build -f Dockerfile.dev -t pytorch/torchserve-kfs:latest-dev .
```

## Running Torchserve inference service in KServe cluster
### Create Kubernetes cluster with eksctl

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: "kserve-cluster"
  region: "us-west-2"

vpc:
  id: "vpc-xxxxxxxxxxxxxxxxx"
  subnets:
    private:
      us-west-2a:
        id: "subnet-xxxxxxxxxxxxxxxxx"
      us-west-2c:
        id: "subnet-xxxxxxxxxxxxxxxxx"
    public:
      us-west-2a:
        id: "subnet-xxxxxxxxxxxxxxxxx"
      us-west-2c:
        id: "subnet-xxxxxxxxxxxxxxxxx"

nodeGroups:
  - name: ng-1
    minSize: 1
    maxSize: 4
    desiredCapacity: 2
    instancesDistribution:
      instanceTypes: ["p3.8xlarge"] # At least one instance type should be specified
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 50
      spotInstancePools: 5
```

```bash
eksctl create cluster -f cluster.yaml
```
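
Cluster creation can take several minutes. Once eksctl finishes, an optional sanity check confirms that the worker nodes have joined the cluster.

```bash
# Optional: verify that the nodes from the node group are Ready
kubectl get nodes -o wide
```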

### Install KServe

Run the command below to install KServe in the cluster.

```bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.8/hack/quick_install.sh" | bash
```

This installs KServe 0.8 and its dependencies in the Kubernetes cluster.

- Create a test namespace kserve-test

```bash
kubectl create namespace kserve-test
```
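
Before moving on, you can optionally confirm that the control plane is healthy. The kserve namespace below is assumed to be the one used by the quick install script; verify this on your cluster.

```bash
# Optional: check that the KServe controller and its CRDs are installed
kubectl get pods -n kserve
kubectl get crd inferenceservices.serving.kserve.io
```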

### Steps for running Torchserve inference service in KServe

Please follow the steps below to deploy Torchserve in the KServe cluster.
Here we use the mnist example from the Torchserve repository.

- Step - 1 : Create the .mar file for mnist by invoking the below command

Navigate to the cloned serve repo and run:

```bash
torch-model-archiver --model-name mnist_kf --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py
```

To generate a .mar file for the BERT and Text Classifier models, refer to the "Generate mar file" section of the [BERT Readme file](kf_request_json/v2/bert/README.md).
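
As a quick check, the archive should now exist in the current directory; its file name must match the marName referenced in config.properties in the next step.

```bash
# mnist_kf.mar is produced by the torch-model-archiver command above
ls -lh mnist_kf.mar
```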

- Step - 2 : Create a config.properties file with contents like the example below:

```bash
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist_kf":{"1.0":{"defaultVersion":true,"marName":"mnist_kf.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
```

Please note that the port for the inference address should be set to 8085, since KServe by default uses port 8080 for its own inference service.
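
A complete config.properties also sets the inference and management addresses and the service envelope. The sketch below is illustrative only, assuming the default KServe ports (8085) and the kserve service envelope; cross-check it against the KServe TorchServe samples before use.

```bash
# A minimal, illustrative config.properties; values mirror the snippet above and
# the KServe TorchServe samples, and should be adapted to your model.
cat <<'EOF' > config.properties
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
enable_envvars_config=true
install_py_dep_per_model=true
service_envelope=kserve
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist_kf":{"1.0":{"defaultVersion":true,"marName":"mnist_kf.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
EOF
```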

Ensure that the KServe envelope is specified in the config file as shown above. The model store path should be set to /mnt/models/model-store because KServe mounts the model store from that path.

- Step - 3 : Create PV, PVC and PV pods in KServe

Follow the instructions below for creating a PV and copying the config files.

- Create PV

Edit the volume id in the pv.yaml file.

```bash
kubectl apply -f ../reference_yaml/pv-deployments/pv.yaml -n kserve-test
```
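
For orientation, the PV is presumably backed by an EBS volume given the EKS setup; a generic sketch of such a manifest is shown below. The name, size and volumeID are placeholders, and the authoritative file is ../reference_yaml/pv-deployments/pv.yaml, where only the volume id needs editing.

```bash
# Illustrative only: a generic EBS-backed PersistentVolume written to a scratch file
cat <<'EOF' > pv-example.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-pv-volume           # placeholder name
spec:
  capacity:
    storage: 1Gi                  # placeholder size
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-xxxxxxxxxxxxxxxxx   # the volume id to edit
    fsType: ext4
EOF
```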

- Create PVC

```bash
kubectl apply -f ../reference_yaml/pv-deployments/pvc.yaml -n kserve-test
```

- Create pod for copying model store files to PV

```bash
kubectl apply -f ../reference_yaml/pvpod.yaml -n kserve-test
```

- Step - 4 : Copy the config.properties file and mar file to the PVC using the model-store-pod

```bash
# Create directories in the PV
kubectl exec -it model-store-pod -c model-store -n kserve-test -- mkdir /pv/model-store/
kubectl exec -it model-store-pod -c model-store -n kserve-test -- mkdir /pv/config/
# Copy the mar file and config.properties into the PV
kubectl cp mnist_kf.mar model-store-pod:/pv/model-store/ -c model-store -n kserve-test
kubectl cp config.properties model-store-pod:/pv/config/ -c model-store -n kserve-test
```
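
To confirm the copy succeeded, you can list the contents of the PV from inside the model-store pod.

```bash
# Both the mar file and config.properties should be visible under /pv/
kubectl exec -it model-store-pod -c model-store -n kserve-test -- ls -R /pv/
```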

Refer to this link for other [storage options](https://github.com/kserve/kserve/tree/master/docs/samples/storage)

- Step - 5 : Create the Inference Service

```bash
# For v1 protocol
kubectl apply -f ../reference_yaml/torchserve-deployment/v1/ts_sample.yaml -n kserve-test

# For v2 protocol
kubectl apply -f ../reference_yaml/torchserve-deployment/v2/ts_sample.yaml -n kserve-test
```
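
Before sending requests, wait for the InferenceService to report READY=True; the name torch-pred is the one assumed by the curl steps further below.

```bash
# The service must be READY before prediction requests will succeed
kubectl get inferenceservice torch-pred -n kserve-test
```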

Refer to this link for more [examples](https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/torchserve)

- Step - 6 : Generating input files

KServe supports different types of inputs (e.g. tensor, bytes). Use the following instructions to generate input files based on the input type.

- [MNIST input generation](kf_request_json/v2/mnist/README.md#preparing-input)
- [Bert input generation](kf_request_json/v2/bert/README.md#preparing-input)


- Step - 7 : Use curl to make a prediction request as shown below:

```bash
DEPLOYMENT_NAME=torch-pred
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${DEPLOYMENT_NAME} -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```
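
If the ingress gateway has no external LoadBalancer hostname (for example on a local or restricted cluster), a common alternative from the KServe samples is to port-forward the gateway and target localhost instead.

```bash
# Fallback when no LoadBalancer address is available
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
INGRESS_HOST=localhost
INGRESS_PORT=8080
```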

For v1 protocol

```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist-kf:predict -d @./kf_request_json/v1/mnist/mnist.json
```

For v2 protocol

```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist-kf/infer -d @./kf_request_json/v2/mnist/mnist_v2_bytes.json
```

- Step - 8 : Use curl to make an explanation request as shown below:

For v1 protocol

```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist-kf:explain -d @./kf_request_json/v1/mnist/mnist.json
```

For v2 protocol

```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist-kf/explain -d @./kf_request_json/v2/mnist/mnist_v2_bytes.json
```

Refer to the individual Readmes for KServe:

- [BERT](https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/torchserve/bert#readme)
- [MNIST](https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/torchserve/README.md)

Sample input JSON file for v1 and v2 protocols

kubectl describe pod <pod-name> -n kserve-test
kubectl logs torch-pred -c kserve-container -n kserve-test
```
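
As an additional check, recent events in the namespace often point at scheduling, volume, or image-pull problems.

```bash
# List recent events in the test namespace, newest last
kubectl get events -n kserve-test --sort-by=.lastTimestamp
```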


## Autoscaling
One of the main serverless inference features is to automatically scale the replicas of an `InferenceService` matching the incoming workload.
KServe by default enables [Knative Pod Autoscaler](https://knative.dev/docs/serving/autoscaling/) which watches traffic flow and scales up and down