feat: kserve document improvement #1807

Merged 7 commits on Sep 22, 2022
kubernetes/kserve/README.md: 126 additions & 55 deletions
# End to End Documentation for Torchserve - KServe Model Serving

The documentation covers the steps to run Torchserve inside the KServe environment for the mnist model.

Currently, KServe supports the Inference API for all existing models except the text to speech synthesizer, and its Explain API works only for the eager-mode MNIST, BERT and text classification models.

## Docker Image Building

- To create a CPU based image

```bash
./build_image.sh
```

- To create a CPU based image with custom tag

```bash
./build_image.sh -t <repository>/<image>:<tag>
```

- To create a GPU based image

```bash
./build_image.sh -g
```

- To create a GPU based image with custom tag

```bash
./build_image.sh -g -t <repository>/<image>:<tag>
```
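
If the KServe cluster pulls images from a remote registry, push the image you built before referencing it in a deployment. This is a hedged example only; the repository and tag below are placeholders in the same style as above.

```bash
# Placeholders: substitute your own <repository>/<image>:<tag>
./build_image.sh -g -t <repository>/<image>:<tag>
docker push <repository>/<image>:<tag>
```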

### Docker Image Dev Build

```bash
DOCKER_BUILDKIT=1 docker build -f Dockerfile.dev -t pytorch/torchserve-kfs:latest-dev .
```

## Running Torchserve inference service in KServe cluster
### Create Kubernetes cluster with eksctl

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: "kserve-cluster"
  region: "us-west-2"

vpc:
  id: "vpc-xxxxxxxxxxxxxxxxx"
  subnets:
    private:
      us-west-2a:
        id: "subnet-xxxxxxxxxxxxxxxxx"
      us-west-2c:
        id: "subnet-xxxxxxxxxxxxxxxxx"
    public:
      us-west-2a:
        id: "subnet-xxxxxxxxxxxxxxxxx"
      us-west-2c:
        id: "subnet-xxxxxxxxxxxxxxxxx"

nodeGroups:
  - name: ng-1
    minSize: 1
    maxSize: 4
    desiredCapacity: 2
    instancesDistribution:
      instanceTypes: ["p3.8xlarge"] # At least one instance type should be specified
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 50
      spotInstancePools: 5
```

```bash
eksctl create cluster -f cluster.yaml
```
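
Cluster creation can take several minutes. Once eksctl finishes, an optional sanity check confirms that the worker nodes have joined the cluster.

```bash
# Optional: verify that the nodes from the node group are Ready
kubectl get nodes -o wide
```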

### Install KServe

Run the command below to install KServe in the cluster.

```bash
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.8/hack/quick_install.sh" | bash
```

This installs KServe 0.8 and its dependencies in the Kubernetes cluster.

- Create a test namespace kserve-test

```bash
kubectl create namespace kserve-test
```
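
Before moving on, you can optionally confirm that the control plane is healthy. The kserve namespace below is assumed to be the one used by the quick install script; verify this on your cluster.

```bash
# Optional: check that the KServe controller and its CRDs are installed
kubectl get pods -n kserve
kubectl get crd inferenceservices.serving.kserve.io
```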

### Steps for running Torchserve inference service in KServe

Please follow the steps below to deploy Torchserve in the KServe cluster.
Here we use the mnist example from the Torchserve repository.

- Step - 1 : Create the .mar file for mnist by invoking the below command

Navigate to the cloned serve repo and run:

```bash
torch-model-archiver --model-name mnist_kf --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py
```

To generate a .mar file for the BERT and Text Classifier models, refer to the "Generate mar file" section of the [BERT Readme file](kf_request_json/v2/bert/README.md).
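
As a quick check, the archive should now exist in the current directory; its file name must match the marName referenced in config.properties in the next step.

```bash
# mnist_kf.mar is produced by the torch-model-archiver command above
ls -lh mnist_kf.mar
```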

- Step - 2 : Create a config.properties file with contents like the example below:

```bash
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist_kf":{"1.0":{"defaultVersion":true,"marName":"mnist_kf.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
```

Please note that the port for the inference address should be set to 8085, since KServe by default uses port 8080 for its own inference service.
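
A complete config.properties also sets the inference and management addresses and the service envelope. The sketch below is illustrative only, assuming the default KServe ports (8085) and the kserve service envelope; cross-check it against the KServe TorchServe samples before use.

```bash
# A minimal, illustrative config.properties; values mirror the snippet above and
# the KServe TorchServe samples, and should be adapted to your model.
cat <<'EOF' > config.properties
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
enable_envvars_config=true
install_py_dep_per_model=true
service_envelope=kserve
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist_kf":{"1.0":{"defaultVersion":true,"marName":"mnist_kf.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
EOF
```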

Ensure that the KServe envelope is specified in the config file as shown above. The model store path should be set to /mnt/models/model-store because KServe mounts the model store from that path.

- Step - 3 : Create PV, PVC and PV pods in KServe

Follow the instructions below for creating a PV and copying the config files.

- Create PV

Edit the volume id in the pv.yaml file.

```bash
kubectl apply -f ../reference_yaml/pv-deployments/pv.yaml -n kserve-test
```
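
For orientation, the PV is presumably backed by an EBS volume given the EKS setup; a generic sketch of such a manifest is shown below. The name, size and volumeID are placeholders, and the authoritative file is ../reference_yaml/pv-deployments/pv.yaml, where only the volume id needs editing.

```bash
# Illustrative only: a generic EBS-backed PersistentVolume written to a scratch file
cat <<'EOF' > pv-example.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-pv-volume           # placeholder name
spec:
  capacity:
    storage: 1Gi                  # placeholder size
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-xxxxxxxxxxxxxxxxx   # the volume id to edit
    fsType: ext4
EOF
```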

- Create PVC

```bash
kubectl apply -f ../reference_yaml/pv-deployments/pvc.yaml -n kserve-test
```

- Create pod for copying model store files to PV

```bash
kubectl apply -f ../reference_yaml/pvpod.yaml -n kserve-test
```

- Step - 4 : Copy the config.properties file and mar file to the PVC using the model-store-pod

```bash
# Create directories in the PV
kubectl exec -it model-store-pod -c model-store -n kserve-test -- mkdir /pv/model-store/
kubectl exec -it model-store-pod -c model-store -n kserve-test -- mkdir /pv/config/
# Copy the mar file and config.properties into the PV
kubectl cp mnist_kf.mar model-store-pod:/pv/model-store/ -c model-store -n kserve-test
kubectl cp config.properties model-store-pod:/pv/config/ -c model-store -n kserve-test
```
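
To confirm the copy succeeded, you can list the contents of the PV from inside the model-store pod.

```bash
# Both the mar file and config.properties should be visible under /pv/
kubectl exec -it model-store-pod -c model-store -n kserve-test -- ls -R /pv/
```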

Refer to this link for other [storage options](https://github.com/kserve/kserve/tree/master/docs/samples/storage)

- Step - 5 : Create the Inference Service

```bash
# For v1 protocol
kubectl apply -f ../reference_yaml/torchserve-deployment/v1/ts_sample.yaml -n kserve-test

# For v2 protocol
kubectl apply -f ../reference_yaml/torchserve-deployment/v2/ts_sample.yaml -n kserve-test
```
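
Before sending requests, wait for the InferenceService to report READY=True; the name torch-pred is the one assumed by the curl steps further below.

```bash
# The service must be READY before prediction requests will succeed
kubectl get inferenceservice torch-pred -n kserve-test
```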

Refer to this link for more [examples](https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/torchserve)

- Step - 6 : Generating input files

KServe supports different types of inputs (e.g. tensor, bytes). Use the following instructions to generate input files based on the input type.

- [MNIST input generation](kf_request_json/v2/mnist/README.md#preparing-input)
- [Bert input generation](kf_request_json/v2/bert/README.md#preparing-input)


- Step - 7 : Use curl to make a prediction request as shown below:

```bash
DEPLOYMENT_NAME=torch-pred
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${DEPLOYMENT_NAME} -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```
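
If the ingress gateway has no external LoadBalancer hostname (for example on a local or restricted cluster), a common alternative from the KServe samples is to port-forward the gateway and target localhost instead.

```bash
# Fallback when no LoadBalancer address is available
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
INGRESS_HOST=localhost
INGRESS_PORT=8080
```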

For v1 protocol

```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist-kf:predict -d @./kf_request_json/v1/mnist/mnist.json
```

For v2 protocol

```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist-kf/infer -d @./kf_request_json/v2/mnist/mnist_v2_bytes.json
```

- Step - 8 : Use curl to make an explanation request as shown below:

For v1 protocol

```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist-kf:explain -d @./kf_request_json/v1/mnist/mnist.json
```

For v2 protocol

```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist-kf/explain -d @./kf_request_json/v2/mnist/mnist_v2_bytes.json
```

Refer to the individual Readmes for KServe:

- [BERT](https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/torchserve/bert#readme)
- [MNIST](https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/torchserve/README.md)

Sample input JSON file for v1 and v2 protocols

kubectl describe pod <pod-name> -n kserve-test
kubectl logs torch-pred -c kserve-container -n kserve-test
```
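
As an additional check, recent events in the namespace often point at scheduling, volume, or image-pull problems.

```bash
# List recent events in the test namespace, newest last
kubectl get events -n kserve-test --sort-by=.lastTimestamp
```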


## Autoscaling
One of the main serverless inference features is to automatically scale the replicas of an `InferenceService` matching the incoming workload.
KServe by default enables [Knative Pod Autoscaler](https://knative.dev/docs/serving/autoscaling/) which watches traffic flow and scales up and down