Fix dimension names and documentation for metrics with prior prometheus support (#2281)

* Deprecate metrics_format configuration option in favor of metric_mode

* Update dimension names and documentation for metrics with prior prometheus support

---------

Co-authored-by: Naman Nandan <namannan@amazon.com>
namannandan and Naman Nandan authored Apr 27, 2023
1 parent d6e072a commit 1707a74
Showing 16 changed files with 127 additions and 91 deletions.
2 changes: 0 additions & 2 deletions docs/configuration.md
@@ -212,8 +212,6 @@
Set nvidia environment variables. For example:

### Enable metrics api
* `enable_metrics_api` : Enable or disable metric APIs, i.e. it can be either `true` or `false`. Default: true (Enabled)
* `metrics_format` : Use this to specify the metric report format. At present, the only supported and default value for this is `prometheus`.
This is used in conjunction with the `enable_metrics_api` option above.

### Config model
* `models`: Use this to set configurations specific to a model. The value is presented in json format.
45 changes: 23 additions & 22 deletions docs/metrics.md
@@ -18,8 +18,9 @@
Frontend metrics include system level metrics. The host resource utilization frontend metrics are collected at regular intervals.
Torchserve provides an API to collect custom backend metrics. Metrics defined by a custom service or handler code can be collected per request or per batch of requests.
Two metrics modes are supported: `log` and `prometheus`. The default mode is `log`.
The metrics mode can be configured using the `metrics_mode` option in `config.properties` or the `TS_METRICS_MODE` environment variable.
For further details on `config.properties` and environment variable based configuration, refer to the [Torchserve config](configuration.md) docs.
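For example, a minimal sketch of selecting `prometheus` mode; the config file path and start command below are illustrative assumptions:

```console
# A sketch: set the metrics mode via config.properties ...
echo "metrics_mode=prometheus" >> config.properties
# ... or via the environment variable
export TS_METRICS_MODE=prometheus
torchserve --start --ts-config config.properties
```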

In `log` mode, metrics are logged and can be aggregated by metric agents.
Metrics are collected by default at the following locations in `log` mode:

* Frontend metrics - `log_directory/ts_metrics.log`
* Backend metrics - `log_directory/model_metrics.log`

@@ -31,27 +32,27 @@
In `prometheus` mode, all metrics are made available in prometheus format via the [metrics API endpoint](metrics_api.md).

## Frontend Metrics

| Metric Name | Type | Unit | Dimensions | Semantics |
|-----------------------------------|---------|--------------|-------------------------------------|-----------------------------------------------------------------------------|
| Requests2XX | counter | Count | Level, Hostname | Total number of requests with response in 200-300 status code range |
| Requests4XX | counter | Count | Level, Hostname | Total number of requests with response in 400-500 status code range |
| Requests5XX | counter | Count | Level, Hostname | Total number of requests with response status code above 500 |
| ts_inference_requests_total | counter | Count | model_name, model_version, hostname | Total number of inference requests received |
| ts_inference_latency_microseconds | counter | Microseconds | model_name, model_version, hostname | Total inference latency in Microseconds |
| ts_queue_latency_microseconds | counter | Microseconds | model_name, model_version, hostname | Total queue latency in Microseconds |
| QueueTime | gauge | Milliseconds | Level, Hostname | Time spent by a job in request queue in Milliseconds |
| WorkerThreadTime | gauge | Milliseconds | Level, Hostname | Time spent in worker thread excluding backend response time in Milliseconds |
| WorkerLoadTime | gauge | Milliseconds | WorkerName, Level, Hostname | Time taken by worker to load model in Milliseconds |
| CPUUtilization | gauge | Percent | Level, Hostname | CPU utilization on host |
| MemoryUsed | gauge | Megabytes | Level, Hostname | Memory used on host |
| MemoryAvailable | gauge | Megabytes | Level, Hostname | Memory available on host |
| MemoryUtilization | gauge | Percent | Level, Hostname | Memory utilization on host |
| DiskUsage | gauge | Gigabytes | Level, Hostname | Disk used on host |
| DiskUtilization | gauge | Percent | Level, Hostname | Disk used on host |
| DiskAvailable | gauge | Gigabytes | Level, Hostname | Disk available on host |
| GPUMemoryUtilization | gauge | Percent | Level, DeviceId, Hostname | GPU memory utilization on host, DeviceId |
| GPUMemoryUsed | gauge | Megabytes | Level, DeviceId, Hostname | GPU memory used on host, DeviceId |
| GPUUtilization | gauge | Percent | Level, DeviceId, Hostname | GPU utilization on host, DeviceId |
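
As a quick spot-check in `prometheus` mode, any of these metrics can be pulled from the metrics endpoint; the sketch below assumes a local TorchServe instance on the default metrics port:

```console
# A sketch: inspect a single frontend metric in prometheus mode
curl -s http://127.0.0.1:8082/metrics | grep ts_queue_latency_microseconds
```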

## Backend Metrics

91 changes: 68 additions & 23 deletions docs/metrics_api.md
@@ -1,38 +1,83 @@
# Metrics API

The Metrics API listens on port 8082 and is only accessible from localhost by default. To change the default setting, see [TorchServe Configuration](configuration.md). The default metrics endpoint returns Prometheus formatted metrics when the [metrics_mode](https://github.com/pytorch/serve/blob/master/docs/metrics.md) configuration is set to `prometheus`. You can query metrics using curl requests or point a [Prometheus Server](#prometheus-server) to the endpoint and use [Grafana](#grafana) for dashboards.

By default these APIs are enabled; however, they can be disabled by setting `enable_metrics_api=false` in the TorchServe `config.properties` file.
For further details, refer to the [Torchserve config](configuration.md) docs.
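For example, a minimal sketch of turning the metrics API off; the config file path is assumed for illustration:

```console
# A sketch: disable the metrics API, then restart TorchServe with this config
echo "enable_metrics_api=false" >> config.properties
torchserve --start --ts-config config.properties
```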

```console
curl http://127.0.0.1:8082/metrics

# HELP Requests5XX Torchserve prometheus counter metric with unit: Count
# TYPE Requests5XX counter
# HELP DiskUsage Torchserve prometheus gauge metric with unit: Gigabytes
# TYPE DiskUsage gauge
DiskUsage{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 20.054508209228516
# HELP GPUUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE GPUUtilization gauge
# HELP PredictionTime Torchserve prometheus gauge metric with unit: ms
# TYPE PredictionTime gauge
PredictionTime{ModelName="resnet18",Level="Model",Hostname="88665a372f4b.ant.amazon.com",} 83.13
# HELP WorkerLoadTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE WorkerLoadTime gauge
WorkerLoadTime{WorkerName="W-9000-resnet18_1.0",Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 4593.0
WorkerLoadTime{WorkerName="W-9001-resnet18_1.0",Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 4592.0
# HELP MemoryAvailable Torchserve prometheus gauge metric with unit: Megabytes
# TYPE MemoryAvailable gauge
MemoryAvailable{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 5829.7421875
# HELP GPUMemoryUsed Torchserve prometheus gauge metric with unit: Megabytes
# TYPE GPUMemoryUsed gauge
# HELP ts_inference_requests_total Torchserve prometheus counter metric with unit: Count
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 3.0
# HELP GPUMemoryUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE GPUMemoryUtilization gauge
# HELP HandlerTime Torchserve prometheus gauge metric with unit: ms
# TYPE HandlerTime gauge
HandlerTime{ModelName="resnet18",Level="Model",Hostname="88665a372f4b.ant.amazon.com",} 82.93
# HELP ts_inference_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 290371.129
# HELP CPUUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE CPUUtilization gauge
CPUUtilization{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 0.0
# HELP MemoryUsed Torchserve prometheus gauge metric with unit: Megabytes
# TYPE MemoryUsed gauge
MemoryUsed{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 8245.62109375
# HELP QueueTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE QueueTime gauge
QueueTime{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 0.0
# HELP ts_queue_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 365.21
# HELP DiskUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE DiskUtilization gauge
DiskUtilization{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 5.8
# HELP Requests2XX Torchserve prometheus counter metric with unit: Count
# TYPE Requests2XX counter
Requests2XX{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 8.0
# HELP Requests4XX Torchserve prometheus counter metric with unit: Count
# TYPE Requests4XX counter
# HELP WorkerThreadTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE WorkerThreadTime gauge
WorkerThreadTime{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 1.0
# HELP DiskAvailable Torchserve prometheus gauge metric with unit: Gigabytes
# TYPE DiskAvailable gauge
DiskAvailable{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 325.05113983154297
# HELP MemoryUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE MemoryUtilization gauge
MemoryUtilization{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 64.4
```
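
Specific metrics can also be queried by name, by passing one or more `name[]` query parameters: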

```console
curl "http://127.0.0.1:8082/metrics?name[]=ts_inference_latency_microseconds&name[]=ts_queue_latency_microseconds" --globoff

# HELP ts_queue_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 365.21
# HELP ts_inference_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 290371.129
```
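
With a Prometheus server scraping this endpoint (see below), a query such as `rate(ts_inference_requests_total[5m])` would plot per-model request throughput from these counters.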

#### Prometheus server
@@ -52,15 +97,15 @@
```yaml
scrape_configs:
  - job_name: 'torchserve'
    static_configs:
    - targets: ['localhost:8082'] #TorchServe metrics endpoint
```
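
A sketch of running a local Prometheus server against this config, assuming it is saved as `prometheus.yml` next to the `prometheus` binary:

```console
./prometheus --config.file=prometheus.yml
```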
Navigate to `http://localhost:9090/` on a browser to execute queries and create graphs

<img width="1231" alt="PrometheusServer" src="https://user-images.githubusercontent.com/880376/86984450-806fc680-c143-11ea-9ae2-f2ef42f24f4c.png">
<img width="1231" alt="Prometheus Server" src="https://user-images.githubusercontent.com/5276346/234722761-007e168a-ebc0-4644-be60-23b2f33fa4f2.png">

#### Grafana

Once you have the Torchserve and Prometheus servers running, you can further [setup](https://prometheus.io/docs/visualization/grafana/) Grafana, point it to the Prometheus server, and navigate to `http://localhost:3000/` to create dashboards and graphs.

You can use the command given below to start Grafana:
`sudo systemctl daemon-reload && sudo systemctl enable grafana-server && sudo systemctl start grafana-server`

<img width="1220" alt="Screen Shot 2020-07-08 at 5 51 57 PM" src="https://user-images.githubusercontent.com/880376/86984550-c4fb6200-c143-11ea-9434-09d4d43dd6d4.png">
<img width="1220" alt="Grafana Dashboard" src="https://user-images.githubusercontent.com/5276346/234725829-7f60e0d8-c76d-4019-ac8f-7d60069c4e58.png">