Metrics Docs Updates #2560
Conversation
Updates to metrics docs to make them easier to follow
Something weird is going on with this link. It's possible the site itself rejects the lint check.
@namannandan & @agunapal Could you guys take a look at this? I want to make sure I did not change the meaning of the docs. I made some changes to the structure of the metrics docs to try and make them clearer, per issue #2495.
docs/metrics_api.md (Outdated)

> For details refer to the [Torchserve config](configuration.md) docs.
>
> **Note** This is not to be confused with the [custom metrics API](metrics.md) which is the API used in the backend handler to emit metrics.
Could make this clearer for folks. It doesn't have to fit into a short note either
Codecov Report
@@           Coverage Diff           @@
##           master    #2560   +/-   ##
=======================================
  Coverage   70.87%   70.87%
=======================================
  Files          83       83
  Lines        3839     3839
  Branches       58       58
=======================================
  Hits         2721     2721
  Misses       1114     1114
  Partials        4        4
docs/metrics.md (Outdated)

> Dynamic updates between the frontend and backend are _not_ currently being handled.
Can we reword this a bit to express "For performance reasons, currently only metrics defined in the metrics configuration file can be published to Prometheus" (i.e. available via the metrics API endpoint)?
docs/metrics.md (Outdated)

> ## Frontend Metrics
>
> Note that **only** the metrics defined in the **metrics configuration file** can be emitted to logs or made available via the [metrics API endpoint](metrics_api.md). This is done to ensure that the metrics configuration file serves as a central inventory of all the metrics that Torchserve can emit.
"can be emitted to logs" => "can be emitted to model_metrics.log"; otherwise they are dumped to model_log.log.
Metrics docs updates based off comments.
@msaroufim @lxning My GitHub is acting up, but I made some updates based off your comments. Let me know if other changes are needed.
@namannandan could you please review and merge this?
docs/metrics.md (Outdated)

> TorchServe defines metrics in a [yaml](https://github.com/pytorch/serve/blob/master/ts/configs/metrics.yaml) file, including both frontend metrics (i.e. `ts_metrics`) and backend metrics (i.e. `model_metrics`).
> When TorchServe is started, the metrics definition is loaded in the frontend and backend cache separately.
> The backend flushes the metrics cache once a load model or inference request is completed.
This line can be reworded as follows:
The backend emits metrics logs as and when they are updated. The frontend parses these logs and makes the corresponding metrics available either as logs or via the metrics API endpoint based on the `metrics_mode` configuration.
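For orientation, the metrics configuration file discussed here has roughly the following shape. This is a sketch based on the default metrics.yaml linked in the quoted doc text; the specific metric names, units, and dimensions shown are illustrative, not a copy of the real file:

```yaml
# Illustrative sketch of a TorchServe metrics configuration file
dimensions:
  - &model_name "ModelName"
  - &level "Level"

ts_metrics:          # frontend metrics
  counter:
    - name: Requests2XX
      unit: Count
      dimensions: [*level]

model_metrics:       # backend metrics
  gauge:
    - name: HandlerTime
      unit: ms
      dimensions: [*model_name, *level]
```

The `ts_metrics` and `model_metrics` section names come from the quoted doc text; both the frontend and backend caches are populated from this one file at startup.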
docs/metrics.md (Outdated)

> When TorchServe is started, the metrics definition is loaded in the frontend and backend cache separately.
> The backend flushes the metrics cache once a load model or inference request is completed.
>
> Dynamic updates between the frontend and backend are _not_ currently being handled.
This line can be reworded as follows:
Dynamic updates to the metrics configuration file are currently not supported. To take updates made to the metrics configuration file into account, TorchServe will need to be restarted.
docs/metrics.md (Outdated)

> Note that **only** the metrics defined in the **metrics configuration file** can be emitted to model_metrics.log or made available via the [metrics API endpoint](metrics_api.md). This is done to ensure that the metrics configuration file serves as a central inventory of all the metrics that Torchserve can emit.
>
> Default metrics are provided in the [metrics.yaml](https://github.com/pytorch/serve/blob/master/ts/configs/metrics.yaml) file, but the user can either delete them to their liking / ignore them altogether, because these metrics will not be emitted unless they are edited.
Nit:
..... because these metrics will not be emitted unless they are updated.
instead of
..... because these metrics will not be emitted unless they are edited.
docs/metrics.md (Outdated)

> ### Starting TorchServe Metrics
>
> Whenever torchserve starts, the [backend worker](https://github.com/pytorch/serve/blob/master/ts/model_service_worker.py) initializes `service.context.metrics` with the [MetricsCache](https://github.com/pytorch/serve/blob/master/ts/metrics/metric_cache_yaml_impl.py) object. The `model_metrics` (backend metrics) section within the specified yaml file will be parsed, and Metric objects will be created based on the parsed section and added that are added to the cache.
Nit: ..... will be created based on the parsed section and added to the cache.
docs/metrics.md (Outdated)

> metric1 = metrics.add_metric("GenericMetric", unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)
> metric.add_or_update(value, dimension_values=["value1", "value2", ...])
Replace the above two lines with the following:
metrics.add_metric("GenericMetric", value, unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)
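To make the call shapes in this thread concrete, here is a self-contained sketch. The classes below are illustrative stand-ins for TorchServe's real `MetricsCache` (which lives in `ts/metrics/`), written only to show how `add_metric` and `add_or_update` calls compose; the method names mirror the quoted docs, but the implementation here is not TorchServe's:

```python
from enum import Enum

class MetricTypes(Enum):
    COUNTER = "counter"
    GAUGE = "gauge"
    HISTOGRAM = "histogram"

class StubMetric:
    """Stand-in for a cached metric; tracks one value per dimension tuple."""
    def __init__(self, name, unit, dimension_names, metric_type):
        self.name = name
        self.unit = unit
        self.dimension_names = dimension_names
        self.metric_type = metric_type
        self.values = {}

    def add_or_update(self, value, dimension_values=None):
        # One value per unique tuple of dimension values.
        self.values[tuple(dimension_values or [])] = value

class StubMetricsCache:
    """Stand-in for the MetricsCache exposed to handlers as context.metrics."""
    def __init__(self):
        self.cache = {}

    def add_metric(self, name, value=None, unit=None, dimension_names=None,
                   metric_type=MetricTypes.COUNTER):
        metric = self.cache.setdefault(
            name, StubMetric(name, unit, dimension_names or [], metric_type))
        if value is not None:
            # One-shot form, as in the suggestion above.
            metric.add_or_update(value)
        return metric

metrics = StubMetricsCache()
metric = metrics.add_metric("GenericMetric", unit="count",
                            dimension_names=["name1", "name2"],
                            metric_type=MetricTypes.GAUGE)
metric.add_or_update(3, dimension_values=["value1", "value2"])
print(metric.values[("value1", "value2")])  # 3
```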
docs/metrics.md (Outdated)

> @@ -317,35 +338,31 @@ dimN= Dimension(name_n, value_n)
>
> One can add metrics with generic units using the following function.
>
> #### Function API to add generic metrics without default dimensions
> Function API
Please retain this section about the `add_metric_to_cache` method since it is part of the API.
docs/metrics.md (Outdated)

> @@ -370,52 +387,10 @@ One can add metrics with generic units using the following function.
> # Add Distance as a metric
> # dimensions = [dim1, dim2, dim3, ..., dimN]
> # Assuming batch size is 1 for example
> metric = metrics.add_metric_to_cache('DistanceInKM', unit='km', dimension_names=[...])
> metric = metrics.add_metric('DistanceInKM', unit='km', dimension_names=[...])
> metric.add_or_update(distance, dimension_values=[...])
Please retain the following section on `add_metric`.
docs/metrics.md (Outdated)

> @@ -425,15 +400,15 @@ Add time-based by invoking the following method:
> Function API
>
> def add_time(self, name: str, value: int or float, idx=None, unit: str = 'ms', dimensions: list = None,
> def add_time(self, metric_name: str, value: int or float, idx=None, unit: str = 'ms', dimensions: list = None,
The argument name defined in the source code is `name` and not `metric_name`. The same applies to other functions of the API below.
docs/metrics.md (Outdated)

> metric_type: MetricTypes = MetricTypes.GAUGE):
>     """
>     Add a time based metric like latency, default unit is 'ms'
>     Default metric type is gauge
>
>     Parameters
>     ----------
>     name : str
>     metric_name : str
Same as above
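As a concrete illustration of the `add_time` signature under discussion, the sketch below times a handler step. `Recorder` is a hypothetical stand-in for the `context.metrics` object (it only records calls), and it uses the `name` argument spelling the comment above points to:

```python
import time

class Recorder:
    """Hypothetical stand-in for context.metrics; records add_time calls."""
    def __init__(self):
        self.calls = []

    def add_time(self, name, value, idx=None, unit="ms", dimensions=None):
        self.calls.append((name, value, unit))

def preprocess_with_timing(metrics, data):
    """Time a (placeholder) preprocessing step and report it as a metric."""
    start = time.time()
    result = list(data)  # placeholder for real preprocessing work
    metrics.add_time("PreprocessLatency", (time.time() - start) * 1000)
    return result

recorder = Recorder()
preprocess_with_timing(recorder, [1, 2, 3])
print(recorder.calls[0][0])  # PreprocessLatency
```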
It looks like for these changes, I might have been using a slightly older metrics.md before you made your changes. I have updated this.
docs/metrics_api.md (Outdated)

> @@ -1,10 +1,13 @@
> # Metrics API
>
> Metrics API is listening on port 8082 and only accessible from localhost by default. To change the default setting, see [TorchServe Configuration](configuration.md). The default metrics endpoint returns Prometheus formatted metrics when [metrics_mode](https://github.com/pytorch/serve/blob/master/docs/metrics.md) configuration is set to `prometheus`. You can query metrics using curl requests or point a [Prometheus Server](#prometheus-server) to the endpoint and use [Grafana](#grafana) for dashboards.
> Metrics API is a http API that is used to fetch metrics in the prometheus format. It is listening on port 8082 and only accessible from localhost by default. To change the default setting, see [TorchServe Configuration](configuration.md). The metrics endpoint is on by default and returns Prometheus formatted metrics when [metrics_mode](https://github.com/pytorch/serve/blob/master/docs/metrics.md) configuration is set to `prometheus`. You can query metrics using curl requests or point a [Prometheus Server](#prometheus-server) to the endpoint and use [Grafana](#grafana) for dashboards.
Nit: ..... The metrics endpoint is enabled by default and .....
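Since the endpoint returns Prometheus text format when `metrics_mode` is set to `prometheus`, a small sketch of consuming such a payload may help. The sample text below is illustrative, not captured from a live TorchServe instance, and the parser only handles simple sample lines (no spaces inside label values):

```python
def parse_prometheus(text):
    """Return {metric_name: value} for simple Prometheus sample lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name_part, value = line.rsplit(" ", 1)
        name = name_part.split("{", 1)[0]  # drop the {label="..."} block
        samples[name] = float(value)
    return samples

sample = """\
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{model_name="my_model",model_version="default"} 3.0
"""
print(parse_prometheus(sample))  # {'ts_inference_requests_total': 3.0}
```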
Updates per Naman's comments
@namannandan Thanks for the feedback. I have made the updates. Let me know if any other updates are needed.
Updates to metrics docs to make them easier to follow. Information re-organized and emphasized.
Fixes #2495
Type of change
Feature/Issue validation/testing
Pages built locally to test them.