
Changes to support no-archive model archives with kserve #2839

Merged — 2 commits merged into master from issues/kserve_no_archive on Dec 8, 2023

Conversation

@agunapal (Collaborator) commented on Dec 8, 2023

Description

This PR enables KServe to work with a model-archiver output folder (no-archive format) in addition to a .mar file.

Fixes #(issue)
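
For quick reference, here is a minimal standalone sketch of the readiness check this PR introduces; it mirrors the TorchserveModel.load() diff shown in the test section below. ModelMissingError is stubbed in here, and the model-store path is only illustrative.

import pathlib


class ModelMissingError(Exception):
    """Stand-in for the ModelMissingError raised by the real kserve_wrapper code."""


def check_model_store(model_dir: str) -> bool:
    """Return True if model_dir contains either an unpacked model directory
    (no-archive format) or a .mar archive; raise ModelMissingError otherwise.
    The real TorchserveModel.load() first resolves model_dir via kserve's
    Storage.download() before running this check."""
    model_path = pathlib.Path(model_dir)
    candidates = [
        path
        for path in model_path.glob("*")
        if path.is_dir() or path.suffixes == [".mar"]
    ]
    if not candidates:
        raise ModelMissingError(model_path)
    return True


if __name__ == "__main__":
    # Hypothetical local model store; replace with your own path.
    print(check_model_store("/mnt/models/model-store"))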

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the unit or integration tests that you ran to verify your changes, along with a summary of the results. Provide instructions so the tests can be reproduced.
Please also list any relevant details of your test configuration.

  • Test with folder
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ git diff
diff --git a/kubernetes/kserve/kserve_wrapper/TorchserveModel.py b/kubernetes/kserve/kserve_wrapper/TorchserveModel.py
index 147d0a6a..528be615 100644
--- a/kubernetes/kserve/kserve_wrapper/TorchserveModel.py
+++ b/kubernetes/kserve/kserve_wrapper/TorchserveModel.py
@@ -1,5 +1,6 @@
 """ The torchserve side inference end-points request are handled to
     return a KServe side response """
+from asyncio.log import logger
 import logging
 import pathlib
 from enum import Enum
@@ -78,6 +79,7 @@ class TorchserveModel(Model):
         logging.info("Predict URL set to %s", self.predictor_host)
         logging.info("Explain URL set to %s", self.explainer_host)
         logging.info("Protocol version is %s", self.protocol)
+        logging.info("Model directory is %s", self.model_dir)
 
     def grpc_client(self):
         if self._grpc_client_stub is None:
@@ -144,8 +146,8 @@ class TorchserveModel(Model):
         and sets ready flag to true.
         """
         model_path = pathlib.Path(Storage.download(self.model_dir))
-        paths = list(pathlib.Path(model_path).glob("*.mar"))
-        existing_paths = [path for path in paths if path.exists()]
+        paths = list(pathlib.Path(model_path).glob("*"))
+        existing_paths = [path for path in paths if path.is_dir() or path.suffixes == ['.mar']]
         if len(existing_paths) == 0:
             raise ModelMissingError(model_path)
         self.ready = True
diff --git a/kubernetes/kserve/tests/configs/mnist_v1_cpu.yaml b/kubernetes/kserve/tests/configs/mnist_v1_cpu.yaml
index 8c6b0442..acace750 100644
--- a/kubernetes/kserve/tests/configs/mnist_v1_cpu.yaml
+++ b/kubernetes/kserve/tests/configs/mnist_v1_cpu.yaml
@@ -5,5 +5,6 @@ metadata:
 spec:
   predictor:
     pytorch:
-      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
-      image: pytorch/torchserve-kfs-nightly:latest-cpu
+      #storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
+      storageUri: gs://mnist-v1/v1
+      image: agunapal/torchserve-kfs:latest-cpu
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ 
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl apply -f tests/configs/mnist_v1_cpu.yaml 
inferenceservice.serving.kserve.io/torchserve created
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl get pods
NAME                                                     READY   STATUS     RESTARTS   AGE
torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd   0/2     Init:0/1   0          7s
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd   2/2     Running   0          96s
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl logs torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd
Defaulted container "kserve-container" out of: kserve-container, queue-proxy, storage-initializer (init)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-12-08T01:34:57,050 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2023-12-08T01:34:57,052 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-12-08T01:34:57,162 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-12-08T01:34:57,351 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.9.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 494 M
Python executable: /home/venv/bin/python
Config file: /mnt/models/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8085
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/models/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: /mnt/models/model-store
Model config: N/A
2023-12-08T01:34:57,357 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-12-08T01:34:57,371 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {"name":"startup.cfg","modelCount":1,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":10,"responseTimeout":120}}}}
2023-12-08T01:34:57,377 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2023-12-08T01:34:57,377 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
2023-12-08T01:34:57,383 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createTempDir /home/model-server/tmp/models/7ca3502cc83848f18b135f8ced120aa2
2023-12-08T01:34:57,383 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createSymbolicDir /home/model-server/tmp/models/7ca3502cc83848f18b135f8ced120aa2/mnist
2023-12-08T01:34:57,445 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mnist
2023-12-08T01:34:57,445 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2023-12-08T01:34:57,445 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2023-12-08T01:34:57,445 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model mnist loaded.
2023-12-08T01:34:57,445 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: mnist, count: 1
2023-12-08T01:34:57,453 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-12-08T01:34:57,453 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-12-08T01:34:57,653 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8085
2023-12-08T01:34:57,653 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-12-08T01:34:57,654 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
2023-12-08T01:34:58,446 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-12-08T01:34:58,565 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:33.3|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999298
2023-12-08T01:34:58,566 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:24.301513671875|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999298
2023-12-08T01:34:58,566 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:169.4889373779297|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999298
2023-12-08T01:34:58,566 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:87.5|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999298
2023-12-08T01:34:58,566 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:58539.15625|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999298
2023-12-08T01:34:58,567 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:4369.5859375|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999298
2023-12-08T01:34:58,567 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:8.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999298
INFO:root:Wrapper : Model names ['mnist'], inference address http://0.0.0.0:8085, management address http://0.0.0.0:8085, grpc_inference_address, 0.0.0.0:7070, model store /mnt/models/model-store
INFO:root:Predict URL set to 0.0.0.0:8085
INFO:root:Explain URL set to 0.0.0.0:8085
INFO:root:Protocol version is v1
INFO:root:Model directory is /mnt/models/model-store
INFO:root:Copying contents of /mnt/models/model-store to local
INFO:root:TSModelRepo is initialized
INFO:kserve:Registering model: mnist
INFO:kserve:Setting max asyncio worker threads as 5
INFO:kserve:Starting uvicorn with 1 workers
2023-12-08 01:34:59.081 uvicorn.error INFO:     Started server process [10]
2023-12-08 01:34:59.081 uvicorn.error INFO:     Waiting for application startup.
2023-12-08 01:34:59.084 10 kserve INFO [start():62] Starting gRPC server on [::]:8081
2023-12-08 01:34:59.084 uvicorn.error INFO:     Application startup complete.
2023-12-08 01:34:59.085 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
2023-12-08T01:35:00,581 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=57
2023-12-08T01:35:00,582 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2023-12-08T01:35:00,629 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-12-08T01:35:00,630 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - [PID]57
2023-12-08T01:35:00,630 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch worker started.
2023-12-08T01:35:00,630 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-12-08T01:35:00,630 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change null -> WORKER_STARTED
2023-12-08T01:35:00,634 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2023-12-08T01:35:00,647 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2023-12-08T01:35:00,648 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1701999300648
2023-12-08T01:35:00,663 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1701999300663
2023-12-08T01:35:00,694 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - model_name: mnist, batchSize: 1
2023-12-08T01:35:01,431 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - generated new fontManager
2023-12-08T01:35:01,650 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-12-08T01:35:01,651 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2023-12-08T01:35:01,682 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - '/home/model-server/tmp/models/7ca3502cc83848f18b135f8ced120aa2/mnist/index_to_name.json' is missing. Inference output will not include class name.
2023-12-08T01:35:01,745 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1082
2023-12-08T01:35:01,746 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-12-08T01:35:01,746 [INFO ] W-9000-mnist_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:4296.0|#WorkerName:W-9000-mnist_1.0,Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999301
2023-12-08T01:35:01,746 [INFO ] W-9000-mnist_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:16.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6d6dbc9c86-ncbjd,timestamp:1701999301
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ minikube delete
🔥  Deleting "minikube" in docker ...
🔥  Deleting container "minikube" ...
🔥  Removing /home/ubuntu/.minikube/machines/minikube ...
💀  Removed all traces of the "minikube" cluster.
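
After the pod reaches Running, one way to sanity-check the folder-based deployment is to call the KServe v1 predict endpoint. The sketch below is an assumption-laden illustration: the localhost:8080 address (e.g. via kubectl port-forward), the Host header value, and the mnist.json payload file do not appear in this PR and would need to be adapted to your cluster.

import json

import requests

# Assumed reachable via `kubectl port-forward` or an ingress gateway; adjust as needed.
BASE_URL = "http://localhost:8080"
MODEL_NAME = "mnist"

# Hypothetical request payload (e.g. a base64-encoded digit image in KServe v1 format).
with open("mnist.json") as f:
    payload = json.load(f)

resp = requests.post(
    f"{BASE_URL}/v1/models/{MODEL_NAME}:predict",
    json=payload,
    # When going through the ingress gateway, pass the InferenceService hostname
    # as the Host header; this value is a placeholder, not taken from the PR.
    headers={"Host": "torchserve.default.example.com"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())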

  • Test with .mar file
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ git diff
diff --git a/kubernetes/kserve/kserve_wrapper/TorchserveModel.py b/kubernetes/kserve/kserve_wrapper/TorchserveModel.py
index 147d0a6a..528be615 100644
--- a/kubernetes/kserve/kserve_wrapper/TorchserveModel.py
+++ b/kubernetes/kserve/kserve_wrapper/TorchserveModel.py
@@ -1,5 +1,6 @@
 """ The torchserve side inference end-points request are handled to
     return a KServe side response """
+from asyncio.log import logger
 import logging
 import pathlib
 from enum import Enum
@@ -78,6 +79,7 @@ class TorchserveModel(Model):
         logging.info("Predict URL set to %s", self.predictor_host)
         logging.info("Explain URL set to %s", self.explainer_host)
         logging.info("Protocol version is %s", self.protocol)
+        logging.info("Model directory is %s", self.model_dir)
 
     def grpc_client(self):
         if self._grpc_client_stub is None:
@@ -144,8 +146,8 @@ class TorchserveModel(Model):
         and sets ready flag to true.
         """
         model_path = pathlib.Path(Storage.download(self.model_dir))
-        paths = list(pathlib.Path(model_path).glob("*.mar"))
-        existing_paths = [path for path in paths if path.exists()]
+        paths = list(pathlib.Path(model_path).glob("*"))
+        existing_paths = [path for path in paths if path.is_dir() or path.suffixes == ['.mar']]
         if len(existing_paths) == 0:
             raise ModelMissingError(model_path)
         self.ready = True
diff --git a/kubernetes/kserve/tests/configs/mnist_v1_cpu.yaml b/kubernetes/kserve/tests/configs/mnist_v1_cpu.yaml
index 8c6b0442..0899339f 100644
--- a/kubernetes/kserve/tests/configs/mnist_v1_cpu.yaml
+++ b/kubernetes/kserve/tests/configs/mnist_v1_cpu.yaml
@@ -6,4 +6,5 @@ spec:
   predictor:
     pytorch:
       storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
-      image: pytorch/torchserve-kfs-nightly:latest-cpu
+      #storageUri: gs://mnist-v1/v1
+      image: agunapal/torchserve-kfs:latest-cpu
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl apply -f tests/
configs/ scripts/ 
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl apply -f tests/configs/
mnist_v1_cpu.yaml  mnist_v2_cpu.yaml  s3_secret.yaml     
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl apply -f tests/configs/mnist_v1_cpu.yaml 
inferenceservice.serving.kserve.io/torchserve created
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
torchserve-predictor-00001-deployment-6fd8b46fff-btgtv   2/2     Running   0          90s
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ kubectl logs torchserve-predictor-00001-deployment-6fd8b46fff-btgtv 
Defaulted container "kserve-container" out of: kserve-container, queue-proxy, storage-initializer (init)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-12-08T01:29:24,666 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2023-12-08T01:29:24,669 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-12-08T01:29:24,844 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-12-08T01:29:24,971 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.9.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 494 M
Python executable: /home/venv/bin/python
Config file: /mnt/models/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8085
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/models/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: /mnt/models/model-store
Model config: N/A
2023-12-08T01:29:24,977 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-12-08T01:29:25,049 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {"name":"startup.cfg","modelCount":1,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":10,"responseTimeout":120}}}}
2023-12-08T01:29:25,055 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2023-12-08T01:29:25,055 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
2023-12-08T01:29:25,359 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mnist
2023-12-08T01:29:25,359 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2023-12-08T01:29:25,360 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2023-12-08T01:29:25,360 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model mnist loaded.
2023-12-08T01:29:25,360 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: mnist, count: 1
2023-12-08T01:29:25,368 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-12-08T01:29:25,368 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-12-08T01:29:25,566 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8085
2023-12-08T01:29:25,566 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-12-08T01:29:25,567 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
2023-12-08T01:29:26,251 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-12-08T01:29:26,449 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:42.9|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998966
2023-12-08T01:29:26,450 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:24.296466827392578|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998966
2023-12-08T01:29:26,450 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:169.4939842224121|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998966
2023-12-08T01:29:26,450 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:87.5|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998966
2023-12-08T01:29:26,450 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:58749.8125|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998966
2023-12-08T01:29:26,451 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:4158.91796875|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998966
2023-12-08T01:29:26,451 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:7.7|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998966
INFO:root:Wrapper : Model names ['mnist'], inference address http://0.0.0.0:8085, management address http://0.0.0.0:8085, grpc_inference_address, 0.0.0.0:7070, model store /mnt/models/model-store
INFO:root:Predict URL set to 0.0.0.0:8085
INFO:root:Explain URL set to 0.0.0.0:8085
INFO:root:Protocol version is v1
INFO:root:Model directory is /mnt/models/model-store
INFO:root:Copying contents of /mnt/models/model-store to local
INFO:root:TSModelRepo is initialized
INFO:kserve:Registering model: mnist
INFO:kserve:Setting max asyncio worker threads as 5
INFO:kserve:Starting uvicorn with 1 workers
2023-12-08 01:29:26.760 uvicorn.error INFO:     Started server process [9]
2023-12-08 01:29:26.760 uvicorn.error INFO:     Waiting for application startup.
2023-12-08 01:29:26.763 9 kserve INFO [start():62] Starting gRPC server on [::]:8081
2023-12-08 01:29:26.763 uvicorn.error INFO:     Application startup complete.
2023-12-08 01:29:26.764 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
2023-12-08T01:29:28,042 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=56
2023-12-08T01:29:28,043 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2023-12-08T01:29:28,094 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-12-08T01:29:28,095 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - [PID]56
2023-12-08T01:29:28,095 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch worker started.
2023-12-08T01:29:28,095 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-12-08T01:29:28,096 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change null -> WORKER_STARTED
2023-12-08T01:29:28,112 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2023-12-08T01:29:28,118 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2023-12-08T01:29:28,120 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1701998968120
2023-12-08T01:29:28,122 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1701998968122
2023-12-08T01:29:28,170 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - model_name: mnist, batchSize: 1
2023-12-08T01:29:28,892 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - generated new fontManager
2023-12-08T01:29:29,116 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-12-08T01:29:29,116 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2023-12-08T01:29:29,155 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - '/home/model-server/tmp/models/4878f2a23e9741e999afdfed6441f3cb/index_to_name.json' is missing. Inference output will not include class name.
2023-12-08T01:29:29,246 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1124
2023-12-08T01:29:29,246 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-12-08T01:29:29,247 [INFO ] W-9000-mnist_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:3881.0|#WorkerName:W-9000-mnist_1.0,Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998969
2023-12-08T01:29:29,247 [INFO ] W-9000-mnist_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:3.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-6fd8b46fff-btgtv,timestamp:1701998969
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ 
(torchserve) ubuntu@ip-172-31-60-100:~/serve/kubernetes/kserve$ minikube delete
🔥  Deleting "minikube" in docker ...
🔥  Deleting container "minikube" ...
🔥  Removing /home/ubuntu/.minikube/machines/minikube ...
💀  Removed all traces of the "minikube" cluster.

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@agunapal changed the title from "Changes to support LLM with kserve" to "Changes to support no-archive model archives with kserve" on Dec 8, 2023
@agunapal requested a review from lxning on December 8, 2023 01:50
@agunapal added this pull request to the merge queue on Dec 8, 2023
Merged via the queue into master with commit b368468 on Dec 8, 2023
12 checks passed
@agunapal deleted the issues/kserve_no_archive branch on December 8, 2023 17:50
@chauhang added this to the v0.10.0 milestone on Feb 27, 2024