Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metric cache and migrate existing metrics to cache implementation #2199

Merged
merged 21 commits into from
Apr 25, 2023

Conversation

lxning
Copy link
Collaborator

@lxning lxning commented Mar 24, 2023

Description

Implementation of Metric Cache and migration of existing metrics to cache implementation
Design: #1492

Fixes #2140 #2141

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

  • Unit tests
    Included in PR and run as part of sanity test in CI

  • Manual end to end tests to verify metrics frontend implementation
    Verified both log and prometheus modes

  • Regression tests

    • CPU: cpu_regression_log.txt
      • Relevant passing tests:
        • test_metrics.py::test_metrics_log_mode
        • test_metrics.py::test_metrics_prometheus_mode
      • Unrelated failing tests:
        • test_handler.py::test_echo_stream_inference
    • GPU: gpu_regression_log.txt
      • Relevant passing tests:
        • test_metrics.py::test_metrics_log_mode
        • test_metrics.py::test_metrics_prometheus_mode
      • Unrelated failing tests:
        • test_distributed_inference_handler.py::test_large_model_inference
        • test_handler.py::test_echo_stream_inference
        • test_sm_mme_requirements.py::test_oom_on_model_load
        • test_sm_mme_requirements.py::test_oom_on_invoke
  • Benchmark

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@namannandan namannandan changed the title [WIP] add metric cache [WIP] Add metric cache and migrate existing metrics to cache implementation Mar 30, 2023
@namannandan namannandan force-pushed the issue_2140 branch 2 times, most recently from 876120c to 833c28c Compare April 2, 2023 23:38
Comment on lines +304 to +311
for (Dimension dimension : parsedMetric.getDimensions()) {
dimensionValues.add(dimension.getValue());
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not directly parse the backend log to generate IMetric?

@codecov
Copy link

codecov bot commented Apr 20, 2023

Codecov Report

Merging #2199 (57ee9d3) into master (92bd04b) will decrease coverage by 0.09%.
The diff coverage is n/a.

❗ Current head 57ee9d3 differs from pull request most recent head 63a61ef. Consider uploading reports for the commit 63a61ef to get more accurate results

@@            Coverage Diff             @@
##           master    #2199      +/-   ##
==========================================
- Coverage   69.82%   69.73%   -0.09%     
==========================================
  Files          77       77              
  Lines        3420     3420              
  Branches       57       57              
==========================================
- Hits         2388     2385       -3     
- Misses       1029     1032       +3     
  Partials        3        3              

see 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@namannandan namannandan changed the title [WIP] Add metric cache and migrate existing metrics to cache implementation Add metric cache and migrate existing metrics to cache implementation Apr 24, 2023
Copy link
Collaborator Author

@lxning lxning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testModelMetrics should be updated and reflected in integration test ModelServerTest.java

Copy link
Member

@msaroufim msaroufim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving after chatting about this PR offline, I would still like to see some more docs around how a user goes about creating a custom python metric and how everything pipes back into a prometheus dashboard, critical to mention this in release notes as well and a tutorial as well although for the tutorial I'm fine if we do it in a followup PR

@namannandan namannandan merged commit 4e08fd1 into master Apr 25, 2023
@namannandan namannandan deleted the issue_2140 branch November 9, 2023 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metrics Caching
4 participants