
Set model status using torchserve api #1878

Merged: 12 commits merged into pytorch:master on Jan 31, 2024

Conversation

@byeongjokim (Contributor) commented Sep 28, 2022

Description

Fixes #1773, kserve/kserve#1915

Issue:

#1773 is the PR that tries to set the model status after the TorchServe InferenceService (ISVC) is deployed. However, it causes problems.
The initialization function of a TorchServe handler can do many things to prepare the model, such as loading model weights (e.g., downloading tokenizer weights from online storage).
The current code may raise an error if someone calls the KServe API before the TorchServe workers are initialized,
in particular when the pod is scaled out with a readinessProbe (/v1/models/{model_name} or /v2/models/{model_name}/status) configured.

However, there is also a problem with the previous code:
it calls model.load() on the first request, so an infinite cycle occurs between the ready status and the incoming requests.

Therefore, to solve this problem, the model's ready status should be determined by querying the TorchServe Describe Model API.

Two variants of the Describe Model API are used in the load function (see the sketch after this list):

  • /models/{model_name}?customized=false
    Checks only the workers' status (used when MODEL_LOAD_CUSTOMIZED is 'false').
    Use this when initialization of the handler does not take long.
  • /models/{model_name}?customized=true
    Checks the status of both the handler and the workers (used when MODEL_LOAD_CUSTOMIZED is 'true').
    The describe_handle function must be implemented in the custom handler (see the related issue).
    Use this when initialization of the handler takes a long time.
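
To make the flow concrete, here is a minimal sketch of the wrapper-side readiness poll. This is not the PR's exact implementation: the constants, function names, and the handling of the describe response are assumptions; only the endpoint, the MODEL_LOAD_CUSTOMIZED switch, and the retry/sleep pattern are taken from the description and the logs attached below.

```python
import json
import logging
import os
import time

import requests  # assumption: available in the KServe wrapper image

# Illustrative values; the logs in this PR show the management API on port 8085
# and a "1 of 10 tries" / "Sleep 30 seconds" retry pattern.
MANAGEMENT_ADDRESS = "http://0.0.0.0:8085"
MAX_TRIES = 10
SLEEP_SECONDS = 30


def is_model_ready(model_name: str, customized: bool) -> bool:
    """Ask the TorchServe Describe Model API whether the model is ready."""
    flag = "true" if customized else "false"
    url = f"{MANAGEMENT_ADDRESS}/models/{model_name}?customized={flag}"
    try:
        response = requests.get(url, timeout=5)
    except requests.RequestException:
        return False
    if response.status_code != 200:
        return False
    # The Describe Model API returns a JSON array of model descriptions; each
    # entry lists the model's workers and their status. With customized=true
    # the response additionally carries whatever describe_handle() returned,
    # which a wrapper could inspect here with handler-specific logic.
    description = json.loads(response.content)[0]
    workers = description.get("workers", [])
    return bool(workers) and all(w.get("status") == "READY" for w in workers)


def wait_until_ready(model_name: str) -> bool:
    """Poll the Describe Model API before setting model.ready in the wrapper."""
    customized = os.environ.get("MODEL_LOAD_CUSTOMIZED", "false").lower() == "true"
    for attempt in range(1, MAX_TRIES + 1):
        logging.info("Loading %s .. %s of %s tries..", model_name, attempt, MAX_TRIES)
        if is_model_ready(model_name, customized):
            logging.info("The model %s is ready", model_name)
            return True
        logging.info("The model %s is not ready", model_name)
        time.sleep(SLEEP_SECONDS)
    return False
```

For the customized=true variant, the custom handler needs to implement describe_handle; a minimal sketch (the handler name and the returned payload are made up here):

```python
import json

from ts.torch_handler.base_handler import BaseHandler


class SlowInitHandler(BaseHandler):
    """Hypothetical handler whose initialize() takes a long time."""

    def initialize(self, context):
        # e.g. download tokenizer weights from online storage here, then let
        # the base class load the model; it sets self.initialized = True.
        super().initialize(context)

    def describe_handle(self):
        # Surfaced through GET /models/{model_name}?customized=true, so the
        # KServe wrapper can tell when this slow initialization has finished.
        return json.dumps({"initialized": self.initialized})
```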

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Feature/Issue validation/testing

  • Test setting the model status to ready (a manual readiness check is sketched after the log notes below)
  • Test KServe

log.log

  • Line 42: start loading the model in the KServe wrapper
  • Line 104: finished initializing the TorchServe handler
  • Line 124: model.ready set to True in the KServe wrapper
  • Line 130: first successful readinessProbe
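
For reference, a quick manual way to reproduce the readiness check against a deployed predictor; the address, model name, and expected response shape here are assumptions, not taken from the PR:

```python
import requests

# Assumption: the KServe predictor is reachable on localhost:8080
# (e.g. via kubectl port-forward) and "my-model" is the deployed model name.
resp = requests.get("http://localhost:8080/v1/models/my-model")
# Should only report ready once the TorchServe handler has finished initializing.
print(resp.status_code, resp.text)
```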

Checklist:

  • Did you have fun?

codecov bot commented Oct 10, 2022

Codecov Report

Merging #1878 (7a5077b) into master (da6bb56) will increase coverage by 3.28%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1878      +/-   ##
==========================================
+ Coverage   41.67%   44.95%   +3.28%     
==========================================
  Files          55       63       +8     
  Lines        2282     2609     +327     
  Branches        1       56      +55     
==========================================
+ Hits          951     1173     +222     
- Misses       1331     1436     +105     
Impacted Files Coverage Δ
model-archiver/model_archiver/model_packaging.py 90.00% <0.00%> (ø)
...hiver/workflow_archiver/workflow_archiver_error.py 100.00% <0.00%> (ø)
...el-archiver/model_archiver/model_archiver_error.py 100.00% <0.00%> (ø)
...w-archiver/workflow_archiver/workflow_packaging.py 89.65% <0.00%> (ø)
...l-archiver/model_archiver/model_packaging_utils.py 54.96% <0.00%> (ø)
...iver/workflow_archiver/workflow_packaging_utils.py 68.68% <0.00%> (ø)
workflow-archiver/workflow_archiver/version.py 100.00% <0.00%> (ø)
model-archiver/model_archiver/version.py 100.00% <0.00%> (ø)


@gavrissh

@jagadeeshi2i @byeongjokim @maaquib Do you know if this PR will be taken to a complete state? We have also hit a scenario that needs a readiness check from the model side, and this PR implements it.

@byeongjokim (Contributor, Author)

@gavrishp @jagadeeshi2i @maaquib
If you need any support, please let me know.

@agunapal (Collaborator)

Logs with the llama model:

Defaulted container "kserve-container" out of: kserve-container, queue-proxy, storage-initializer (init)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-01-10T23:41:30,985 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2024-01-10T23:41:30,987 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-01-10T23:41:31,166 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-01-10T23:41:31,397 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.9.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 1
Max heap size: 4949 M
Python executable: /home/venv/bin/python
Config file: /mnt/models/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8085
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/models/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: /mnt/models/model-store
Model config: N/A
2024-01-10T23:41:31,458 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-01-10T23:41:31,482 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {"name":"startup.cfg","modelCount":1,"models":{"llama2-7b-chat":{"1.0":{"defaultVersion":true,"marName":"llama2-7b-chat","minWorkers":1,"maxWorkers":1,"batchSize":1,"maxBatchDelay":100,"responseTimeout":1200}}}}
2024-01-10T23:41:31,490 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2024-01-10T23:41:31,491 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
2024-01-10T23:41:31,558 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createTempDir /home/model-server/tmp/models/84f9e5bc17b74f74895d4550b4a696db
2024-01-10T23:41:31,558 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createSymbolicDir /home/model-server/tmp/models/84f9e5bc17b74f74895d4550b4a696db/llama2-7b-chat
2024-01-10T23:41:31,569 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model llama2-7b-chat
2024-01-10T23:41:31,570 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model llama2-7b-chat
INFO:root:Wrapper : Model names ['llama2-7b-chat'], inference address http://0.0.0.0:8085, management address http://0.0.0.0:8085, grpc_inference_address, 0.0.0.0:7070, model store /mnt/models/model-store
INFO:root:Predict URL set to 0.0.0.0:8085
INFO:root:Explain URL set to 0.0.0.0:8085
INFO:root:Protocol version is v1
INFO:root:Copying contents of /mnt/models/model-store to local
INFO:root:Loading llama2-7b-chat .. 1 of 10 tries..
INFO:root:The model llama2-7b-chat is not ready
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
INFO:root:Loading llama2-7b-chat .. 2 of 10 tries..
INFO:root:The model llama2-7b-chat is not ready
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
INFO:root:Loading llama2-7b-chat .. 3 of 10 tries..
INFO:root:The model llama2-7b-chat is not ready
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
INFO:root:Loading llama2-7b-chat .. 4 of 10 tries..
INFO:root:The model llama2-7b-chat is not ready
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
INFO:root:Loading llama2-7b-chat .. 5 of 10 tries..
INFO:root:The model llama2-7b-chat is not ready
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
INFO:root:Loading llama2-7b-chat .. 6 of 10 tries..
INFO:root:The model llama2-7b-chat is not ready
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
2024-01-10T23:44:31,939 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model llama2-7b-chat
2024-01-10T23:44:31,951 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model llama2-7b-chat loaded.
2024-01-10T23:44:31,953 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: llama2-7b-chat, count: 1
2024-01-10T23:44:32,001 [DEBUG] W-9000-llama2-7b-chat_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-01-10T23:44:32,007 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2024-01-10T23:44:32,110 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8085
2024-01-10T23:44:32,110 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2024-01-10T23:44:32,111 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
INFO:root:Loading llama2-7b-chat .. 7 of 10 tries..
2024-01-10T23:44:32,587 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2024-01-10T23:44:32,960 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /127.0.0.1:58974 "GET /models/llama2-7b-chat?customized=false HTTP/1.1" 200 104
2024-01-10T23:44:32,964 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930272
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
2024-01-10T23:44:34,259 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:100.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,260 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:16.709884643554688|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,260 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:273.84106063842773|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,260 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:94.2|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,260 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,261 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,261 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,261 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:28514.8828125|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,262 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:2738.08203125|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:34,262 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:10.1|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930274
2024-01-10T23:44:35,542 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=297
2024-01-10T23:44:35,543 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2024-01-10T23:44:35,556 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-01-10T23:44:35,556 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - [PID]297
2024-01-10T23:44:35,557 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Torch worker started.
2024-01-10T23:44:35,557 [DEBUG] W-9000-llama2-7b-chat_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-llama2-7b-chat_1.0 State change null -> WORKER_STARTED
2024-01-10T23:44:35,557 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-01-10T23:44:35,560 [INFO ] W-9000-llama2-7b-chat_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2024-01-10T23:44:35,566 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2024-01-10T23:44:35,569 [DEBUG] W-9000-llama2-7b-chat_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1704930275569
2024-01-10T23:44:35,571 [INFO ] W-9000-llama2-7b-chat_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1704930275571
2024-01-10T23:44:35,585 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - model_name: llama2-7b-chat, batchSize: 1
2024-01-10T23:44:37,184 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Enabled tensor cores
2024-01-10T23:44:37,184 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-01-10T23:44:37,185 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-01-10T23:44:37,185 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Transformers version 4.36.0
2024-01-10T23:44:37,188 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Model llama2-7b-chat loading tokenizer
2024-01-10T23:44:41,616 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2024-01-10T23:44:41,794 [WARN ] W-9000-llama2-7b-chat_1.0-stderr MODEL_LOG - 
INFO:root:Loading llama2-7b-chat .. 8 of 10 tries..
2024-01-10T23:45:03,148 [INFO ] epollEventLoopGroup-3-2 ACCESS_LOG - /127.0.0.1:45908 "GET /models/llama2-7b-chat?customized=false HTTP/1.1" 200 45
2024-01-10T23:45:03,149 [INFO ] epollEventLoopGroup-3-2 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930303
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
2024-01-10T23:45:28,340 [WARN ] W-9000-llama2-7b-chat_1.0-stderr MODEL_LOG - Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
INFO:root:Loading llama2-7b-chat .. 9 of 10 tries..
2024-01-10T23:45:33,220 [INFO ] epollEventLoopGroup-3-3 ACCESS_LOG - /127.0.0.1:34106 "GET /models/llama2-7b-chat?customized=false HTTP/1.1" 200 36
2024-01-10T23:45:33,221 [INFO ] epollEventLoopGroup-3-3 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
2024-01-10T23:45:33,977 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:50.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,977 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:16.709636688232422|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,977 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:273.84130859375|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,978 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:94.2|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,978 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:29.042904290429043|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,978 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:6688.0|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,979 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:1.0|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,979 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:26628.29296875|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,979 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:4614.61328125|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:33,979 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:16.1|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930333
2024-01-10T23:45:45,383 [WARN ] W-9000-llama2-7b-chat_1.0-stderr MODEL_LOG - Loading checkpoint shards:  50%|█████     | 1/2 [00:46<00:46, 46.54s/it]
2024-01-10T23:45:45,384 [WARN ] W-9000-llama2-7b-chat_1.0-stderr MODEL_LOG - Loading checkpoint shards: 100%|██████████| 2/2 [01:03<00:00, 29.19s/it]
2024-01-10T23:45:45,385 [WARN ] W-9000-llama2-7b-chat_1.0-stderr MODEL_LOG - Loading checkpoint shards: 100%|██████████| 2/2 [01:03<00:00, 31.79s/it]
2024-01-10T23:45:45,612 [INFO ] W-9000-llama2-7b-chat_1.0-stdout MODEL_LOG - Model llama2-7b-chat loaded successfully
2024-01-10T23:45:45,618 [INFO ] W-9000-llama2-7b-chat_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 70046
2024-01-10T23:45:45,619 [DEBUG] W-9000-llama2-7b-chat_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-llama2-7b-chat_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-01-10T23:45:45,619 [INFO ] W-9000-llama2-7b-chat_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:73625.0|#WorkerName:W-9000-llama2-7b-chat_1.0,Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930345
2024-01-10T23:45:45,620 [INFO ] W-9000-llama2-7b-chat_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:5.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930345
INFO:root:Loading llama2-7b-chat .. 10 of 10 tries..
2024-01-10T23:46:03,268 [INFO ] epollEventLoopGroup-3-4 ACCESS_LOG - /127.0.0.1:33818 "GET /models/llama2-7b-chat?customized=false HTTP/1.1" 200 27
2024-01-10T23:46:03,269 [INFO ] epollEventLoopGroup-3-4 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930363
INFO:root:Sleep 30 seconds for load llama2-7b-chat..
INFO:root:The model llama2-7b-chat is ready
INFO:root:TSModelRepo is initialized
INFO:kserve:Registering model: llama2-7b-chat
INFO:kserve:Setting max asyncio worker threads as 12
INFO:kserve:Starting uvicorn with 1 workers
2024-01-10 23:46:33.377 uvicorn.error INFO:     Started server process [9]
2024-01-10 23:46:33.378 uvicorn.error INFO:     Waiting for application startup.
2024-01-10 23:46:33.428 9 kserve INFO [start():62] Starting gRPC server on [::]:8081
2024-01-10 23:46:33.429 uvicorn.error INFO:     Application startup complete.
2024-01-10 23:46:33.430 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
2024-01-10T23:46:33,778 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,779 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:16.709423065185547|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,779 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:273.8415222167969|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,779 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:94.2|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,779 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:36.45127670661803|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,780 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:8394.0|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,780 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,780 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:26622.671875|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,780 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:4620.765625|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393
2024-01-10T23:46:33,781 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:16.1|#Level:Host|#hostname:torchserve-predictor-00001-deployment-74dbbc6579-xwt6t,timestamp:1704930393

@agunapal (Collaborator) left a comment

The logic works as shown in the logs attached to the PR. Approving for now

auto-merge was automatically disabled January 24, 2024 11:11

Head branch was pushed to by a user without write access

@byeongjokim (Contributor, Author)

@agunapal
I changed the code because of the lint workflows, and now the workflows are stuck in a required status.
Can you check it?

@byeongjokim (Contributor, Author)

@agunapal
Thank you for processing the workflows.
How can I merge this PR? I don't have write access.

@agunapal agunapal added this pull request to the merge queue Jan 31, 2024
Merged via the queue into pytorch:master with commit 3627ee6 Jan 31, 2024
13 checks passed