Fix dimension names and documentation for metrics with prior prometheus support (#2281)

* Deprecate metrics_format configuration option in favor of metric_mode

* Update dimension names and documentation for metrics with prior prometheus support

---------

Co-authored-by: Naman Nandan <namannan@amazon.com>
namannandan and Naman Nandan authored Apr 27, 2023
1 parent d6e072a commit 1707a74
Showing 16 changed files with 127 additions and 91 deletions.
2 changes: 0 additions & 2 deletions docs/configuration.md
@@ -212,8 +212,6 @@
Set nvidia environment variables. For example:

### Enable metrics api
* `enable_metrics_api` : Enable or disable metric APIs, i.e. it can be either `true` or `false`. Default: true (Enabled)
* `metrics_format` : Use this to specify the metric report format. At present, the only supported and default value for this is `prometheus`.
This is used in conjunction with the `enable_metrics_api` option above.

### Config model
* `models`: Use this to set configurations specific to a model. The value is presented in json format.
45 changes: 23 additions & 22 deletions docs/metrics.md
@@ -18,8 +18,9 @@
Frontend metrics include system level metrics. The host resource utilization frontend metrics are collected at regular intervals.
Torchserve provides an API to collect custom backend metrics. Metrics defined by a custom service or handler code can be collected per request or per batch of requests.
Two metrics modes are supported: `log` and `prometheus`. The default mode is `log`.
The metrics mode can be configured using the `metrics_mode` option in `config.properties` or the `TS_METRICS_MODE` environment variable.
For further details on `config.properties` and environment variable based configuration, refer to the [Torchserve config](configuration.md) docs.
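For example, a minimal sketch of selecting `prometheus` mode; the config file path and start command below are illustrative assumptions:

```console
# A sketch: set the metrics mode via config.properties ...
echo "metrics_mode=prometheus" >> config.properties
# ... or via the environment variable
export TS_METRICS_MODE=prometheus
torchserve --start --ts-config config.properties
```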

In `log` mode, metrics are logged and can be aggregated by metric agents.
Metrics are collected by default at the following locations in `log` mode:

* Frontend metrics - `log_directory/ts_metrics.log`
* Backend metrics - `log_directory/model_metrics.log`

@@ -31,27 +32,27 @@
In `prometheus` mode, all metrics are made available in prometheus format via the [metrics API endpoint](metrics_api.md).

## Frontend Metrics

| Metric Name | Type | Unit | Dimensions | Semantics |
|-----------------------------------|---------|--------------|-------------------------------------|-----------------------------------------------------------------------------|
| Requests2XX | counter | Count | Level, Hostname | Total number of requests with response in 200-300 status code range |
| Requests4XX | counter | Count | Level, Hostname | Total number of requests with response in 400-500 status code range |
| Requests5XX | counter | Count | Level, Hostname | Total number of requests with response status code above 500 |
| ts_inference_requests_total | counter | Count | model_name, model_version, hostname | Total number of inference requests received |
| ts_inference_latency_microseconds | counter | Microseconds | model_name, model_version, hostname | Total inference latency in Microseconds |
| ts_queue_latency_microseconds | counter | Microseconds | model_name, model_version, hostname | Total queue latency in Microseconds |
| QueueTime | gauge | Milliseconds | Level, Hostname | Time spent by a job in request queue in Milliseconds |
| WorkerThreadTime | gauge | Milliseconds | Level, Hostname | Time spent in worker thread excluding backend response time in Milliseconds |
| WorkerLoadTime | gauge | Milliseconds | WorkerName, Level, Hostname | Time taken by worker to load model in Milliseconds |
| CPUUtilization | gauge | Percent | Level, Hostname | CPU utilization on host |
| MemoryUsed | gauge | Megabytes | Level, Hostname | Memory used on host |
| MemoryAvailable | gauge | Megabytes | Level, Hostname | Memory available on host |
| MemoryUtilization | gauge | Percent | Level, Hostname | Memory utilization on host |
| DiskUsage | gauge | Gigabytes | Level, Hostname | Disk used on host |
| DiskUtilization | gauge | Percent | Level, Hostname | Disk used on host |
| DiskAvailable | gauge | Gigabytes | Level, Hostname | Disk available on host |
| GPUMemoryUtilization | gauge | Percent | Level, DeviceId, Hostname | GPU memory utilization on host, DeviceId |
| GPUMemoryUsed | gauge | Megabytes | Level, DeviceId, Hostname | GPU memory used on host, DeviceId |
| GPUUtilization | gauge | Percent | Level, DeviceId, Hostname | GPU utilization on host, DeviceId |
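
As a quick spot-check in `prometheus` mode, any of these metrics can be pulled from the metrics endpoint; the sketch below assumes a local TorchServe instance on the default metrics port:

```console
# A sketch: inspect a single frontend metric in prometheus mode
curl -s http://127.0.0.1:8082/metrics | grep ts_queue_latency_microseconds
```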

## Backend Metrics

91 changes: 68 additions & 23 deletions docs/metrics_api.md
@@ -1,38 +1,83 @@
# Metrics API

The Metrics API listens on port 8082 and is only accessible from localhost by default. To change the default setting, see [TorchServe Configuration](configuration.md). The default metrics endpoint returns Prometheus formatted metrics when the [metrics_mode](https://github.com/pytorch/serve/blob/master/docs/metrics.md) configuration is set to `prometheus`. You can query metrics using curl requests or point a [Prometheus Server](#prometheus-server) to the endpoint and use [Grafana](#grafana) for dashboards.

By default these APIs are enabled; however, they can be disabled by setting `enable_metrics_api=false` in the TorchServe `config.properties` file.
For further details, refer to the [Torchserve config](configuration.md) docs.
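For example, a minimal sketch of turning the metrics API off; the config file path is assumed for illustration:

```console
# A sketch: disable the metrics API, then restart TorchServe with this config
echo "enable_metrics_api=false" >> config.properties
torchserve --start --ts-config config.properties
```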

```console
curl http://127.0.0.1:8082/metrics

# HELP Requests5XX Torchserve prometheus counter metric with unit: Count
# TYPE Requests5XX counter
# HELP DiskUsage Torchserve prometheus gauge metric with unit: Gigabytes
# TYPE DiskUsage gauge
DiskUsage{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 20.054508209228516
# HELP GPUUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE GPUUtilization gauge
# HELP PredictionTime Torchserve prometheus gauge metric with unit: ms
# TYPE PredictionTime gauge
PredictionTime{ModelName="resnet18",Level="Model",Hostname="88665a372f4b.ant.amazon.com",} 83.13
# HELP WorkerLoadTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE WorkerLoadTime gauge
WorkerLoadTime{WorkerName="W-9000-resnet18_1.0",Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 4593.0
WorkerLoadTime{WorkerName="W-9001-resnet18_1.0",Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 4592.0
# HELP MemoryAvailable Torchserve prometheus gauge metric with unit: Megabytes
# TYPE MemoryAvailable gauge
MemoryAvailable{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 5829.7421875
# HELP GPUMemoryUsed Torchserve prometheus gauge metric with unit: Megabytes
# TYPE GPUMemoryUsed gauge
# HELP ts_inference_requests_total Torchserve prometheus counter metric with unit: Count
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 3.0
# HELP GPUMemoryUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE GPUMemoryUtilization gauge
# HELP HandlerTime Torchserve prometheus gauge metric with unit: ms
# TYPE HandlerTime gauge
HandlerTime{ModelName="resnet18",Level="Model",Hostname="88665a372f4b.ant.amazon.com",} 82.93
# HELP ts_inference_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 290371.129
# HELP CPUUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE CPUUtilization gauge
CPUUtilization{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 0.0
# HELP MemoryUsed Torchserve prometheus gauge metric with unit: Megabytes
# TYPE MemoryUsed gauge
MemoryUsed{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 8245.62109375
# HELP QueueTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE QueueTime gauge
QueueTime{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 0.0
# HELP ts_queue_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 365.21
# HELP DiskUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE DiskUtilization gauge
DiskUtilization{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 5.8
# HELP Requests2XX Torchserve prometheus counter metric with unit: Count
# TYPE Requests2XX counter
Requests2XX{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 8.0
# HELP Requests4XX Torchserve prometheus counter metric with unit: Count
# TYPE Requests4XX counter
# HELP WorkerThreadTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE WorkerThreadTime gauge
WorkerThreadTime{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 1.0
# HELP DiskAvailable Torchserve prometheus gauge metric with unit: Gigabytes
# TYPE DiskAvailable gauge
DiskAvailable{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 325.05113983154297
# HELP MemoryUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE MemoryUtilization gauge
MemoryUtilization{Level="Host",Hostname="88665a372f4b.ant.amazon.com",} 64.4
```
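
Specific metrics can also be queried by name, by passing one or more `name[]` query parameters: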

```console
curl "http://127.0.0.1:8082/metrics?name[]=ts_inference_latency_microseconds&name[]=ts_queue_latency_microseconds" --globoff

# HELP ts_queue_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 365.21
# HELP ts_inference_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{model_name="resnet18",model_version="default",hostname="88665a372f4b.ant.amazon.com",} 290371.129
```
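
With a Prometheus server scraping this endpoint (see below), a query such as `rate(ts_inference_requests_total[5m])` would plot per-model request throughput from these counters.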

#### Prometheus server
@@ -52,15 +97,15 @@
```yaml
scrape_configs:
  - job_name: 'torchserve'
    static_configs:
    - targets: ['localhost:8082'] #TorchServe metrics endpoint
```
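
A sketch of running a local Prometheus server against this config, assuming it is saved as `prometheus.yml` next to the `prometheus` binary:

```console
./prometheus --config.file=prometheus.yml
```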
Navigate to `http://localhost:9090/` on a browser to execute queries and create graphs

<img width="1231" alt="PrometheusServer" src="https://user-images.githubusercontent.com/880376/86984450-806fc680-c143-11ea-9ae2-f2ef42f24f4c.png">
<img width="1231" alt="Prometheus Server" src="https://user-images.githubusercontent.com/5276346/234722761-007e168a-ebc0-4644-be60-23b2f33fa4f2.png">

#### Grafana

Once you have the Torchserve and Prometheus servers running, you can further [setup](https://prometheus.io/docs/visualization/grafana/) Grafana, point it to the Prometheus server, and navigate to `http://localhost:3000/` to create dashboards and graphs.

You can use the command given below to start Grafana:
`sudo systemctl daemon-reload && sudo systemctl enable grafana-server && sudo systemctl start grafana-server`

<img width="1220" alt="Screen Shot 2020-07-08 at 5 51 57 PM" src="https://user-images.githubusercontent.com/880376/86984550-c4fb6200-c143-11ea-9434-09d4d43dd6d4.png">
<img width="1220" alt="Grafana Dashboard" src="https://user-images.githubusercontent.com/5276346/234725829-7f60e0d8-c76d-4019-ac8f-7d60069c4e58.png">