Updated example for custom metrics and add backwards compatibility warnings and upgrade guide for metrics APIs #2516

Merged
merged 13 commits into from
Aug 24, 2023

Conversation

namannandan
Collaborator

Description

Provide an updated example that shows how to use the custom metrics APIs and the Prometheus metrics mode.
Also update the metrics documentation to include warnings about backwards compatibility with the custom metrics APIs in prior versions.

Type of change

  • This change requires a documentation update

Feature/Issue validation/testing

  • Results of manually running the example have been included in the example documentation

@namannandan namannandan marked this pull request as ready for review August 4, 2023 07:54
@namannandan namannandan changed the title from "Updated example for metrics and backwards compatibility warnings" to "Updated example for custom metrics and add backwards compatibility warnings for metrics APIs" Aug 4, 2023
@codecov

codecov bot commented Aug 4, 2023

Codecov Report

Merging #2516 (f92fa05) into master (cd7c47e) will not change coverage.
The diff coverage is n/a.

❗ Current head f92fa05 differs from pull request most recent head cecc10d. Consider uploading reports for the commit cecc10d to get more accurate results

@@           Coverage Diff           @@
##           master    #2516   +/-   ##
=======================================
  Coverage   72.77%   72.77%           
=======================================
  Files          78       78           
  Lines        3695     3695           
  Branches       58       58           
=======================================
  Hits         2689     2689           
  Misses       1002     1002           
  Partials        4        4           
Files Changed Coverage Δ
ts/metrics/metric_cache_yaml_impl.py 93.24% <ø> (ø)

Member

@msaroufim msaroufim left a comment

The doc looks fine but I'm not convinced this is the right BC story

Here are some other options I see

  1. Keep BC guarantee: for example if old metric API is always a counter then it shouldn't be too hard to make the old function call the new function
  2. If it's not possible to keep the BC guarantee (I'd like to understand why this is the case a bit more): then also make sure in code that a warning is printed once when the old metrics API is called, otherwise we'll be maintaining 2 metric APIs in the future

Regardless of if we go for 1 or 2 it's helpful to explain to users why they should migrate and what are the benefits to them of doing so, one example might be more metric types but worth clarifying this more
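
The "print a warning once" idea from option 2 can be sketched as a small decorator. This is a minimal illustration, not code from TorchServe: `warn_once`, `old_api` and `new_api` are hypothetical names, and the old entry point is assumed to forward directly to the new one.

```python
import functools
import warnings


def warn_once(message):
    """Decorator that emits `message` as a DeprecationWarning on the first call only."""

    def decorator(func):
        warned = False

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal warned
            if not warned:
                # Flip the flag before warning so later calls stay silent.
                warned = True
                warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def new_api(name, value):
    # Hypothetical stand-in for the new metrics call.
    return (name, value)


@warn_once("old_api is deprecated; call new_api instead")
def old_api(name, value):
    # The old entry point simply forwards to the new one.
    return new_api(name, value)
```

Keeping the flag in the decorator's closure means the warning fires once per decorated function, independent of the process-wide warning filters.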

docs/metrics.md Outdated

## Backwards compatibility warnings
1. Starting [v0.6.1](https://github.com/pytorch/serve/releases/tag/v0.6.1), the `add_metric` API signature changed\
from [add_metric(name, value, unit, idx=None, dimensions=None)](https://github.com/pytorch/serve/blob/61f1c4182e6e864c9ef1af99439854af3409d325/ts/metrics/metrics_store.py#L184)\
Member

so the old metrics were always counters? If that's the case then keeping BC automatically shouldn't be too hard

Collaborator Author

The prior metrics implementation did not have types associated with them. The new metrics implementation does add support for MetricTypes.

In addition, the prior metrics implementation had no way to declare metrics and their specifications (name, unit, dimension names and type) in a central configuration file. The new metrics implementation introduced this, as a result of which the semantics of the add_metric method changed
from: Create a metric object and store it in a list to emit
to: Add a metric object consisting only of its specifications (name, unit, dimension names and type) to a metrics cache. The dimension values are provided at the time of updating a metric using the add_or_update method.
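
The two-step semantics described above (register a specification first, supply dimension values on update) can be sketched with a toy cache. This is an illustration only, not the TorchServe implementation: `ToyMetric` and `ToyMetricsCache` are hypothetical classes, `MetricTypes` is redefined locally with just two members, and the rule that counters accumulate while gauges overwrite is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetricTypes(Enum):
    # Local stand-in for the real enum; only two members for brevity.
    COUNTER = "counter"
    GAUGE = "gauge"


@dataclass
class ToyMetric:
    """Holds only the specification; values arrive later via add_or_update."""

    name: str
    unit: str
    dimension_names: list
    metric_type: MetricTypes
    values: dict = field(default_factory=dict)

    def add_or_update(self, value, dimension_values):
        if len(dimension_values) != len(self.dimension_names):
            raise ValueError("dimension_values must match dimension_names")
        key = tuple(dimension_values)
        if self.metric_type is MetricTypes.COUNTER:
            # Assumption for this sketch: counters accumulate.
            self.values[key] = self.values.get(key, 0) + value
        else:
            # Assumption for this sketch: gauges keep the latest value.
            self.values[key] = value


class ToyMetricsCache:
    def __init__(self):
        self._metrics = {}

    def add_metric(self, metric_name, unit, dimension_names,
                   metric_type=MetricTypes.COUNTER):
        # Step 1: store the specification only, with no value or dimension values.
        metric = ToyMetric(metric_name, unit, list(dimension_names), metric_type)
        self._metrics[(metric_type, metric_name)] = metric
        return metric


cache = ToyMetricsCache()
requests = cache.add_metric("RequestCount", "count", ["ModelName"])
# Step 2: dimension values are supplied at update time.
requests.add_or_update(1, ["resnet"])
requests.add_or_update(2, ["resnet"])
```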

Collaborator Author

A couple of options to ensure backwards compatibility are as follows:

  1. Introduce a new method in metric_cache_abstract.py, say add_metric_bc which has the same signature as that of the old add_metric API. This method can internally call add_metric and then add_or_update on the metric object. The default metric type in this case would be counter.
  2. Change the name of the new add_metric method to add_metric_to_cache and reimplement the add_metric method to have the same signature as the old implementation. The add_metric API can then internally call the add_metric_to_cache method and then add_or_update on the metric object.

Please share your thoughts on these approaches.
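
Option 2 can be sketched roughly as follows. This is a toy model under stated assumptions, not the code from the draft PR: `ToyMetric` and `ToyMetricsCache` are hypothetical, dimensions are represented as `(name, value)` tuples rather than real dimension objects, and the `metric_type` keyword with a COUNTER default is included only to illustrate how a type could be passed through.

```python
from enum import Enum


class MetricTypes(Enum):
    # Local stand-in for the real enum.
    COUNTER = "counter"
    GAUGE = "gauge"


class ToyMetric:
    def __init__(self, name, unit, dimension_names, metric_type):
        self.name = name
        self.unit = unit
        self.dimension_names = dimension_names
        self.metric_type = metric_type
        self.values = {}

    def add_or_update(self, value, dimension_values):
        # Simplified: always store the latest value for this dimension set.
        self.values[tuple(dimension_values)] = value


class ToyMetricsCache:
    def __init__(self):
        self._cache = {}

    def add_metric_to_cache(self, metric_name, unit, dimension_names,
                            metric_type=MetricTypes.COUNTER):
        # New-style call: register the specification only (idempotent here).
        key = (metric_type, metric_name)
        if key not in self._cache:
            self._cache[key] = ToyMetric(metric_name, unit, dimension_names, metric_type)
        return self._cache[key]

    def add_metric(self, name, value, unit, idx=None, dimensions=None,
                   metric_type=MetricTypes.COUNTER):
        # Old-style signature kept for callers; `idx` is accepted but unused in this toy.
        dimensions = dimensions or []
        names = [dim_name for dim_name, _ in dimensions]
        values = [dim_value for _, dim_value in dimensions]
        metric = self.add_metric_to_cache(name, unit, names, metric_type)
        metric.add_or_update(value, values)
        return metric


cache = ToyMetricsCache()
generic = cache.add_metric("GenericMetric", 10, "count", dimensions=[("Level", "Model")])
```

Existing callers keep the old positional form, while the specification still ends up in the cache through `add_metric_to_cache`.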

Member

@msaroufim msaroufim Aug 10, 2023

I like 2, add_metric_to_cache() seems more like an internal detail whereas what a user wants to do is add_metric(). While the semantics do change, the code won't break and that seems like a win

Collaborator Author

Draft PR to implement backwards compatibility for add_metric API: #2525

```

- Step 4: Install [mtail](https://github.com/google/mtail/releases)
Member

@msaroufim msaroufim Aug 7, 2023

mention in deprecation note why mtail is no longer needed

@msaroufim
Member

cc @duk0011

@duk0011

duk0011 commented Aug 15, 2023

From user perspective, I like option 2 (change the name of the new add_metric method to add_metric_to_cache and reimplement the add_metric method to have the same signature as the old implementation). This will help with version upgrade with backward compatibility.

@namannandan namannandan changed the title from "Updated example for custom metrics and add backwards compatibility warnings for metrics APIs" to "Updated example for custom metrics and add backwards compatibility warnings and upgrade guide for metrics APIs" Aug 18, 2023
@namannandan
Collaborator Author

Thanks for the feedback @msaroufim and @duk0011
PR: #2525 to make add_metric backwards compatible has been merged.
This PR includes the following:

  • Custom metrics API documentation updates
  • Backwards compatibility warnings
  • Upgrade path documentation
  • Updated example

@duk0011 please note the following in addition to the backwards compatibility implemented in #2525:

  • add_metric is now backwards compatible but the default metric type is inferred to be COUNTER. If the metric is of a different type, it will need to be specified in the call to add_metric as follows:
    metrics.add_metric(name='GenericMetric', value=10, unit='count', dimensions=[...], metric_type=MetricTypes.GAUGE)
  • All custom metrics updated in the custom handler will need to be included in the metrics configuration file for them to be emitted by TorchServe. This is shown here.

@msaroufim
Member

SGTM let's explicitly mention your note when we do the patch release next week

@namannandan namannandan merged commit 2ff5020 into pytorch:master Aug 24, 2023
12 checks passed

4 participants