
[RFC]: Metrics Refactoring #1492 Draft PR #1727

Status: Closed (wanted to merge 55 commits)

Conversation

joshuaan7 (Contributor) commented Jul 7, 2022:

Fixes #1492

TorchServe defines metrics in a metrics.yaml file, including both frontend metrics (i.e. ts_metrics) and backend metrics (i.e. model_metrics). When TorchServe starts, the metrics definitions are loaded into the frontend and backend caches separately. The backend flushes its metrics cache once a load-model or inference request completes.
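To make the frontend/backend split concrete, here is an illustrative sketch of what such a file might look like. The key layout and metric names below are assumptions for illustration, not the exact schema introduced by this PR:

```yaml
# Illustrative sketch only: key names and metric names are assumptions
ts_metrics:          # frontend metrics
  counter:
    - name: Requests2XX
      unit: Count
model_metrics:       # backend metrics
  gauge:
    - name: HandlerTime
      unit: ms
```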

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Checklist:

  • [x] Did you have fun?
  • [x] Have you added tests that prove your fix is effective or that this feature works?
  • [x] Has code been commented, particularly in hard-to-understand areas?
  • [x] Have you made corresponding changes to the documentation?

@maaquib maaquib requested a review from lxning July 21, 2022 21:45
@maaquib maaquib added the enhancement (New feature or request) label Jul 21, 2022
@rohithkrn rohithkrn self-requested a review August 3, 2022 17:45
@joshuaan7 joshuaan7 marked this pull request as ready for review August 24, 2022 23:23
joshuaan7 (Contributor, Author):

@lxning @maaquib @msaroufim Hi all, I am opening this PR up for review. I'm not sure if I have permissions to add reviewers, but if there are more reviewers who should be added, please feel free to add them. Apologies in advance for the large PR.

Resolved review threads (outdated): ts/arg_parser.py, ts/metrics/metric.py, ts/metrics/metric_cache_yaml.py
msaroufim (Member) left a comment:

Just added a first round of feedback, may need a few more


metrics.add_size("GaugeModelMetricNameExample", 42.5) # adding gauge metric

emit_metrics(metrics.cache)
Member:
Would it make more sense to have emit_metrics take in a metrics object directly?
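A minimal sketch of the two API shapes under discussion, using hypothetical stand-in classes (not TorchServe's actual ones): passing the raw cache dict exposes the object's internals, while passing the metrics object keeps the cache an implementation detail.

```python
# Hypothetical sketch contrasting emit_metrics(metrics.cache) with
# emit_metrics(metrics). Class and function names are illustrative.

class Metric:
    def __init__(self, name, value):
        self.name = name
        self.value = value

class MetricsCache:
    def __init__(self):
        self.cache = {}

    def add_metric(self, name, value):
        self.cache[name] = Metric(name, value)

def emit_metrics_from_dict(cache):
    # current shape: the caller reaches into the object's internals
    return [f"{m.name}:{m.value}" for m in cache.values()]

def emit_metrics(metrics):
    # proposed shape: the cache stays an implementation detail
    return emit_metrics_from_dict(metrics.cache)

metrics = MetricsCache()
metrics.add_metric("InferenceLatency", 12.5)
assert emit_metrics_from_dict(metrics.cache) == emit_metrics(metrics)
```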

Resolved review thread (outdated): docs/metrics.md


class MetricCacheAbstract(metaclass=abc.ABCMeta):
def __init__(self, request_ids, model_name, file):
Member:
I thought we didn't yet support a different metric cache per model because we load a single YAML file?

Also wondering should the metric cache also have a concept of a worker?

Resolved review threads: ts/metrics/metric_cache_abstract.py
logging.debug(f"Successfully received metric {metric_key}")
return metric_obj
else:
raise merrors.MetricsCacheKeyError(
Member:
Do we need to have the custom wrappers on errors? If we throw an error then we should also see a callstack which would show this line of code

joshuaan7 (Contributor, Author):
I think it's more of a nice-to-have: it narrows down the area the error came from and gives a bit more context.
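As a sketch of that trade-off, a thin exception subclass costs little while letting both the exception type and the message carry context. The code below is hypothetical, loosely modeled on the `MetricsCacheKeyError` shown above:

```python
# Hypothetical sketch: a narrow exception type for cache lookups.
# Callers can catch metrics-cache errors specifically, and the message
# adds context that a bare KeyError callstack would not.

class MetricsCacheKeyError(KeyError):
    """Raised when a metric name is not present in the metrics cache."""

cache = {"HandlerTime": 42}

def get_metric(key):
    try:
        return cache[key]
    except KeyError as exc:
        raise MetricsCacheKeyError(
            f"Metric '{key}' not found; available metrics: {sorted(cache)}"
        ) from exc

print(get_metric("HandlerTime"))  # a valid lookup still works
```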

dimensions=dimensions,
)

def add_percent(
Member:
I wonder if instead we should have

```python
def add_counter(): ...
def add_gauge(): ...
def add_histogram(): ...
```

Then on top of those we can add convenience wrappers like

```python
def add_percentage():
    add_gauge()
```
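A runnable sketch of that layering, with a hypothetical `Metrics` class (method and field names are illustrative): three core methods keyed by metric type, plus thin convenience wrappers built on top of them.

```python
# Hypothetical sketch of the suggested layering: core add_* methods per
# metric type, with convenience wrappers delegating to them.

class Metrics:
    def __init__(self):
        self.cache = []

    def _add(self, metric_type, name, value, unit):
        self.cache.append((metric_type, name, value, unit))

    def add_counter(self, name, value, unit="count"):
        self._add("counter", name, value, unit)

    def add_gauge(self, name, value, unit=""):
        self._add("gauge", name, value, unit)

    def add_histogram(self, name, value, unit=""):
        self._add("histogram", name, value, unit)

    # Convenience wrapper built on the core three:
    def add_percentage(self, name, value):
        self.add_gauge(name, value, unit="percent")

m = Metrics()
m.add_percentage("DiskUtilization", 87.2)
assert m.cache == [("gauge", "DiskUtilization", 87.2, "percent")]
```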


@@ -50,6 +55,8 @@ def __init__(self, name, value,
self.value = value
self.dimensions = dimensions
self.request_id = request_id
self.metric_type = metric_type
self.is_updated = False if value == 0 else True
namannandan (Collaborator) commented Sep 6, 2022:
Nit: Would it make sense to use None instead of 0 to indicate that the value is not initialized, since 0 can potentially be a valid updated value for a metric?

joshuaan7 (Contributor, Author):
The main purpose of the is_updated attribute is to determine whether the metric gets emitted to the frontend (if is_updated is True, the Metric will be emitted; otherwise it will not). I understand that None may make more sense than 0, but my other thought is that a Metric with a value of 0 doesn't provide much information to the user anyhow, which is why I'm currently using 0. If users do want a Metric to have a value of 0, they can still set it via the update method.
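The trade-off can be made concrete with a small sketch (the `Metric` shape below is hypothetical): a `None` sentinel distinguishes "never set" from a genuine zero measurement, which a `value == 0` check cannot.

```python
# Sketch of the None-vs-0 sentinel discussion. Names are illustrative.

class Metric:
    def __init__(self, name, value=None):
        self.name = name
        self.value = value

    @property
    def is_updated(self):
        # None means "never set"; any real measurement, including 0, counts
        return self.value is not None

    def update(self, value):
        self.value = value

    def reset(self):
        self.value = None

m = Metric("QueueLength")
assert not m.is_updated        # freshly created: not emitted
m.update(0)
assert m.is_updated            # a genuine zero still counts as updated
m.reset()
assert not m.is_updated
```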

Collaborator:
The initial value of `is_updated` should be `False`.

"""
Reset Metric value to 0 and reset is_updated flag to False
"""
self.value = 0
Collaborator:
Same comment as above, would it make sense to use None instead of 0 when value is reset?

@@ -100,6 +110,12 @@ def load_model(load_model_request):
if "limitMaxImagePixels" in load_model_request:
limit_max_image_pixels = bool(load_model_request["limitMaxImagePixels"])

metrics = MetricsCacheYaml(
Collaborator:
Since we create an instance of MetricsCacheYaml here, which is stored in the context at model-load time, and we now also have a mechanism to reset metrics after emitting them, I believe we should not be creating MetricsStore instances when predict is called in service.py; the MetricsCacheYaml object should replace the MetricsStore object:
https://github.com/pytorch/serve/pull/1727/files#diff-46d0c43f520da01cd15f2b4784a1545eafb2b2c6a8237088f6d79f641b75e19cL95-R114

metrics = MetricsStore(req_id_map, self.context.model_name)
self.context.metrics = metrics

If this is the case then it should be safe to remove metrics_store.py since it is no longer used anywhere?
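The proposed lifecycle (create the metrics cache once at load time, store it on the context, reuse it for every inference, and reset after emitting) can be sketched with hypothetical stand-ins for TorchServe's classes:

```python
# Hypothetical sketch: one metrics object per loaded model, reused across
# predict calls and reset after each emit. Names are illustrative.

class MetricsCacheYaml:
    def __init__(self, model_name):
        self.model_name = model_name
        self.cache = {}

    def add_metric(self, name, value):
        self.cache[name] = value

    def reset(self):
        self.cache.clear()

class Context:
    def __init__(self, metrics):
        self.metrics = metrics

def load_model(model_name):
    # created once, at load time
    return Context(MetricsCacheYaml(model_name))

def emit(metrics):
    emitted = dict(metrics.cache)
    metrics.reset()  # ready for the next request
    return emitted

def predict(context, data):
    # reuse the cached object instead of building a new MetricsStore
    context.metrics.add_metric("PredictionTime", 1.0)
    emit(context.metrics)
    return data

ctx = load_model("densenet161")
first = ctx.metrics
predict(ctx, b"input")
assert ctx.metrics is first      # same object across requests
assert ctx.metrics.cache == {}   # reset after emit
```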

```properties
...
...
# enable_metrics_api=false
```
Member:
Let's clarify which configs are actually needed for the example

enable_metrics_api=false seems strange

Collaborator:
It is not necessary to have the `enable_metrics_api` parameter. The metrics migration should be seamless for users.

If a `metrics_config` argument is not specified, the default yaml file will be used.


3. Run torchserve and specify the path of the `config.properties` after the `ts-config` flag:
Member:
Instead of HuggingFace, let's use the densenet161 example since that's simpler to see working.

But even then when I run this command I get this error

2022-10-11T22:41:08,543 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - ts.metrics.metric_cache_errors.MetricsCacheTypeError: File /home/ubuntu/frontend/server/src/test/resources/metrics_default.yaml does not exist.
2022-10-11T22:41:08,544 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2022-10-11T22:41:08,544 [DEBUG] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2022-10-11T22:41:08,544 [DEBUG] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:189) ~[model-server.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
2022-10-11T22:41:08,545 [WARN ] W-9000-densenet161_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: densenet161, error: Worker died.
2022-10-11T22:41:08,545 [DEBUG] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-densenet161_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2022-10-11T22:41:08,545 [WARN ] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-densenet161_1.0-stderr
2022-10-11T22:41:08,545 [WARN ] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-densenet161_1.0-stdout
2022-10-11T22:41:08,546 [INFO ] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2022-10-11T22:41:08,562 [INFO ] W-9000-densenet161_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-densenet161_1.0-stderr
2022-10-11T22:41:08,562 [INFO ] W-9000-densenet161_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-densenet161_1.0-stdout
2022-10-11T22:41:09,548 [DEBUG] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/opt/conda/envs/serve/bin/python3.8, /opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000, --metrics-config, /home/ubuntu/frontend/server/src/test/resources/metrics_default.yaml]
2022-10-11T22:41:09,583 [INFO ] main org.pytorch.serve.ModelServer - Torchserve stopped.
java.nio.file.NoSuchFileException: src/test/resources/key.pem

The only thing that really worked for me was deleting those resources lines and then copying config.properties to my local directory. It might be worth having a config.properties in the root directory

Collaborator:
The error "File /home/ubuntu/frontend/server/src/test/resources/metrics_default.yaml does not exist." means the Gradle build integration is missing. The default file path of metrics.yaml should be set in Gradle.

(example using [Huggingface_Transformers](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers))

```
torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ncs --ts-config ../../frontend/server/src/test/resources/config.properties
```

Member:
We should mention that logs are visible in logs/model_metrics.log. However, when I downloaded the densenet.mar file from our getting started guide, I didn't see anything populated there; I was hoping to see some default metrics.

* [Custom Metrics API](#custom-metrics-api)
* [Logging the custom metrics](#logging-the-custom-metrics)
* [Log custom metrics](#log-custom-metrics)
* [Metrics YAML Parsing and Metrics API example](#Metrics-YAML-File-Parsing-and-Metrics-API-Custom-Handler-Example)

Member:
Should add a known issues section: for example the first inference is now significantly slower

`metric_type=MetricTypes.[counter/gauge/histogram]`.

```python
metrics.add_metric("GenericMetric", value=1, ..., metric_type=MetricTypes.gauge)
Member:
Is add_metric() the right name? It feels more like add_measurement() to me; otherwise, why do we need to set up a yaml file?

@@ -137,27 +297,33 @@ dimN= Dimension(name_n, value_n)

### Add generic metrics

**Generic metrics are defaulted to a `counter` metric type**

Member:
This whole section should just refer to the codebase instead

from ts.metrics.metric_type_enum import MetricTypes


class CustomHandlerExample:
Member:
I tried packaging and running this example and still couldn't see any logs printed

java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:189) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
2022-10-11T23:48:34,424 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_service_worker.py", line 120, in load_model
2022-10-11T23:48:34,424 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     service = model_loader.load(
2022-10-11T23:48:34,424 [WARN ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: metrics_model, error: Worker died.
2022-10-11T23:48:34,425 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_loader.py", line 135, in load
2022-10-11T23:48:34,425 [DEBUG] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-metrics_model_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2022-10-11T23:48:34,425 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     entry_point, initialize_fn = self._get_class_entry_point(module)
2022-10-11T23:48:34,425 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_loader.py", line 197, in _get_class_entry_point
2022-10-11T23:48:34,425 [WARN ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-metrics_model_1.0-stderr
2022-10-11T23:48:34,425 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     raise ValueError(
2022-10-11T23:48:34,425 [WARN ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-metrics_model_1.0-stdout
2022-10-11T23:48:34,425 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - ValueError: Expect handle method in class <class 'metrics_model.CustomHandlerExample'>
2022-10-11T23:48:34,426 [INFO ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 13 seconds.
2022-10-11T23:48:34,426 [INFO ] W-9000-metrics_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-metrics_model_1.0-stdout
2022-10-11T23:48:34,443 [INFO ] W-9000-metrics_model_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-metrics_model_1.0-stderr
2022-10-11T23:48:47,426 [DEBUG] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/opt/conda/envs/serve/bin/python3.8, /opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000, --metrics-config, /home/ubuntu/serve/frontend/server/src/test/resources/metrics_default.yaml]
2022-10-11T23:48:48,268 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2022-10-11T23:48:48,269 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - [PID]24436
2022-10-11T23:48:48,269 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - Torch worker started.
2022-10-11T23:48:48,269 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - Python runtime: 3.8.13
2022-10-11T23:48:48,269 [DEBUG] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-metrics_model_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2022-10-11T23:48:48,269 [INFO ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2022-10-11T23:48:48,271 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2022-10-11T23:48:48,271 [INFO ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1665532128271
2022-10-11T23:48:48,293 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - model_name: metrics_model, batchSize: 1
2022-10-11T23:48:48,299 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - Successfully loaded /home/ubuntu/serve/frontend/server/src/test/resources/metrics_default.yaml.
2022-10-11T23:48:48,299 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - Backend worker process died.
2022-10-11T23:48:48,299 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-10-11T23:48:48,300 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_service_worker.py", line 223, in <module>
2022-10-11T23:48:48,300 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     worker.run_server()
2022-10-11T23:48:48,300 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_service_worker.py", line 191, in run_server
2022-10-11T23:48:48,300 [INFO ] epollEventLoopGroup-5-8 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2022-10-11T23:48:48,300 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2022-10-11T23:48:48,301 [DEBUG] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2022-10-11T23:48:48,301 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_service_worker.py", line 156, in handle_connection
2022-10-11T23:48:48,301 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2022-10-11T23:48:48,301 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_service_worker.py", line 120, in load_model
2022-10-11T23:48:48,301 [DEBUG] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:189) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
2022-10-11T23:48:48,301 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     service = model_loader.load(
2022-10-11T23:48:48,301 [WARN ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: metrics_model, error: Worker died.
2022-10-11T23:48:48,302 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_loader.py", line 135, in load
2022-10-11T23:48:48,302 [DEBUG] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-metrics_model_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2022-10-11T23:48:48,302 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     entry_point, initialize_fn = self._get_class_entry_point(module)
2022-10-11T23:48:48,302 [WARN ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-metrics_model_1.0-stderr
2022-10-11T23:48:48,302 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/serve/lib/python3.8/site-packages/ts/model_loader.py", line 197, in _get_class_entry_point
2022-10-11T23:48:48,302 [WARN ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-metrics_model_1.0-stdout
2022-10-11T23:48:48,302 [INFO ] W-9000-metrics_model_1.0-stdout MODEL_LOG -     raise ValueError(
2022-10-11T23:48:48,302 [INFO ] W-9000-metrics_model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 21 seconds.
2022-10-11T23:48:48,303 [INFO ] W-9000-metrics_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-metrics_model_1.0-stdout
2022-10-11T23:48:48,320 [INFO ] W-9000-metrics_model_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-metrics_model_1.0-stderr

Collaborator:
According to the log line "Expect handle method in class <class 'metrics_model.CustomHandlerExample'>", the handle method seems to be missing from the example.
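For reference, a class-style handler needs a `handle` method as its entry point, since that is what the loader's `_get_class_entry_point` check in the log above is looking for. A minimal sketch (the method bodies are illustrative, not the example's actual logic):

```python
# Minimal sketch of a class-style handler with the `handle` entry point.
# Without `handle`, the worker dies with
# "Expect handle method in class ..." as shown in the log above.

class CustomHandlerExample:
    def __init__(self):
        self.initialized = False
        self.context = None

    def initialize(self, context):
        # one-time setup at model load
        self.initialized = True
        self.context = context

    def handle(self, data, context):
        # entry point the model loader expects
        if not self.initialized:
            self.initialize(context)
        return [str(d) for d in data]

handler = CustomHandlerExample()
assert handler.handle([1, 2], context=None) == ["1", "2"]
```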

# Emitting the metrics that have been updated to the frontend and
# then resetting the Metrics' values afterwards
emit_metrics(metrics.cache)
```
Member:
The doc is missing a section on Prometheus export.



### Updating Metrics parsed from the yaml file
Collaborator:
Why would a customer need to change a metric defined in metrics.yaml?

Resolved review threads: docs/metrics.md, ts/model_service_worker.py, ts/metrics/metric_cache_abstract.py, ts/metrics/metric.py
@maaquib maaquib mentioned this pull request Nov 8, 2022
maaquib (Collaborator) commented Nov 9, 2022:

Closing in favour of #1954

@maaquib maaquib closed this Nov 9, 2022