Set model status using torchserve api #1878
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1878      +/-   ##
==========================================
+ Coverage   41.67%   44.95%   +3.28%
==========================================
  Files          55       63       +8
  Lines        2282     2609     +327
  Branches        1       56      +55
==========================================
+ Hits          951     1173     +222
- Misses       1331     1436     +105
```
@jagadeeshi2i @byeongjokim @maaquib Do you know if this PR will be brought to a complete state? We have also hit a scenario that needs a readiness check from the model side, and this PR implements it.
@gavrishp @jagadeeshi2i @maaquib
Logs with llama model
The logic works as shown in the logs attached to the PR. Approving for now.
Head branch was pushed to by a user without write access
@agunapal Thank you for processing the workflows.
Description
Fixes #1773, kserve/kserve#1915
Issue:
#1773 is the PR that tries to set the model status after the TorchServe (TS) InferenceService (ISVC) is deployed. However, it causes problems.
In the initialization function of a TS handler, you can do many things to prepare the model, such as loading the model weights (e.g., downloading tokenizer weights from online storage).
The current code may cause an error if someone calls the KServe API before the TS workers are initialized, in particular when the pod is scaled out with a readinessProbe (/v1/model/{model_name} or /v2/model/{model_name}/status) set.
However, there is also a problem with the previous code: it calls model.load() on the first request. In that case, an infinite cycle occurs between the ready status and the requests, since the model only becomes ready after serving a request, yet the readiness probe holds back requests until the model is ready.
Therefore, to solve this problem, the model's ready status should be determined by communicating with the TorchServe Describe Model API.
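For illustration, here is a minimal Python sketch of such a readiness check (not the PR's actual code). The helper name and default URL are assumptions; the endpoint (GET /models/{model_name} on the management port, 8081 by default) and the response shape — a list of model descriptions, each carrying a "workers" list with per-worker "status" fields — follow TorchServe's documented Describe Model management API.

```python
import requests

def is_model_ready(model_name: str,
                   management_url: str = "http://localhost:8081") -> bool:
    """Decide readiness by querying TorchServe's Describe Model API
    instead of lazily calling model.load() on the first request."""
    try:
        resp = requests.get(f"{management_url}/models/{model_name}", timeout=5)
        resp.raise_for_status()
    except requests.RequestException:
        return False
    # The API returns a list of model (version) descriptions; each entry
    # has a "workers" list whose items carry a "status" field.
    for description in resp.json():
        workers = description.get("workers", [])
        if workers and all(w.get("status") == "READY" for w in workers):
            return True
    return False
```

A readiness endpoint backed by a check like this reports ready only after the workers finish initialization, so a scaled-out pod receives no traffic until its model is actually loaded.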
Two types of API are used in the `load` function.
One is for checking only the worker's status (when MODEL_LOAD_CUSTOMIZED is 'false'). It is used when the initialization of the handler does not take a long time.
The other is for checking the status of both the handler and the worker (when MODEL_LOAD_CUSTOMIZED is 'true'). It is used when the initialization of the handler takes a long time; in this case, the `describe_handle` function should be implemented in the custom handler (see the related issue). A sketch of such a handler follows.
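As a minimal sketch of the customized path (MODEL_LOAD_CUSTOMIZED set to 'true'): a handler with a slow initialize() implements describe_handle(), which TorchServe surfaces as customized metadata in the Describe Model API response (GET /models/{model_name}?customized=true). The class name, the slow-step helper, and the returned payload are illustrative assumptions, not the PR's exact contract.

```python
import json

from ts.torch_handler.base_handler import BaseHandler

class SlowInitHandler(BaseHandler):
    """Custom handler whose initialization takes a long time."""

    def initialize(self, context):
        super().initialize(context)        # loads the model weights
        self.initialized = False           # not ready until the slow step is done
        self.download_tokenizer_weights()  # hypothetical slow step
        self.initialized = True

    def download_tokenizer_weights(self):
        # Placeholder for slow preparation work, e.g. downloading
        # tokenizer weights from online storage.
        pass

    def describe_handle(self):
        # The returned string appears as customized metadata in the
        # Describe Model API response, so the KServe wrapper can check
        # handler readiness in addition to worker status.
        return json.dumps({"ready": getattr(self, "initialized", False)})
```

With this in place, the wrapper can treat the model as ready only when the worker status is READY and the customized metadata also reports ready.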
Type of change
Feature/Issue validation/testing
log.log
Tested with MODEL_LOAD_CUSTOMIZED set to True in the kserve wrapper.
Checklist: