- Introduction
- System metrics
- Formatting
- Metric Types
- Central metrics yaml file definition
- Custom Metrics API
- Logging custom metrics
- Metrics YAML Parsing and Metrics API example
- Backwards compatibility warnings and upgrade guide
## Introduction

TorchServe metrics can be broadly classified into frontend and backend metrics.
Frontend metrics include system-level metrics. The host resource utilization frontend metrics are collected at regular intervals (default: once every minute).
TorchServe provides an API to collect custom backend metrics. Metrics defined by custom service or handler code can be collected per request or per batch of requests.
Two metric modes are supported, i.e. `log` and `prometheus`. The default mode is `log`.
The metrics mode can be configured using the `metrics_mode` configuration option in `config.properties` or the `TS_METRICS_MODE` environment variable.
For further details on `config.properties` and environment variable based configuration, refer to the TorchServe config docs.
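For example, `prometheus` mode could be enabled in either of the following ways (a minimal sketch; the remaining `config.properties` entries are omitted):

```properties
# config.properties
metrics_mode=prometheus
```

```bash
# or via the environment variable
export TS_METRICS_MODE=prometheus
```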
In `log` mode, metrics are logged and can be aggregated by metric agents.
Metrics are collected by default at the following locations in `log` mode:
- Frontend metrics - `log_directory/ts_metrics.log`
- Backend metrics - `log_directory/model_metrics.log`

The location of log files and metric files can be configured in the `log4j2.xml` file.
In `prometheus` mode, all metrics are made available in Prometheus format via the metrics API endpoint.
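For example, with TorchServe running in `prometheus` mode, the metrics can be scraped from the metrics API endpoint (assuming the default metrics API port of 8082):

```bash
curl http://127.0.0.1:8082/metrics
```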
## System metrics

Frontend metrics:

| Metric Name | Type | Unit | Dimensions | Semantics |
|---|---|---|---|---|
| Requests2XX | counter | Count | Level, Hostname | Total number of requests with response in 200-300 status code range |
| Requests4XX | counter | Count | Level, Hostname | Total number of requests with response in 400-500 status code range |
| Requests5XX | counter | Count | Level, Hostname | Total number of requests with response status code above 500 |
| ts_inference_requests_total | counter | Count | model_name, model_version, hostname | Total number of inference requests received |
| ts_inference_latency_microseconds | counter | Microseconds | model_name, model_version, hostname | Total inference latency in Microseconds |
| ts_queue_latency_microseconds | counter | Microseconds | model_name, model_version, hostname | Total queue latency in Microseconds |
| QueueTime | gauge | Milliseconds | Level, Hostname | Time spent by a job in request queue in Milliseconds |
| WorkerThreadTime | gauge | Milliseconds | Level, Hostname | Time spent in worker thread excluding backend response time in Milliseconds |
| WorkerLoadTime | gauge | Milliseconds | WorkerName, Level, Hostname | Time taken by worker to load model in Milliseconds |
| CPUUtilization | gauge | Percent | Level, Hostname | CPU utilization on host |
| MemoryUsed | gauge | Megabytes | Level, Hostname | Memory used on host |
| MemoryAvailable | gauge | Megabytes | Level, Hostname | Memory available on host |
| MemoryUtilization | gauge | Percent | Level, Hostname | Memory utilization on host |
| DiskUsage | gauge | Gigabytes | Level, Hostname | Disk used on host |
| DiskUtilization | gauge | Percent | Level, Hostname | Disk utilization on host |
| DiskAvailable | gauge | Gigabytes | Level, Hostname | Disk available on host |
| GPUMemoryUtilization | gauge | Percent | Level, DeviceId, Hostname | GPU memory utilization on host, DeviceId |
| GPUMemoryUsed | gauge | Megabytes | Level, DeviceId, Hostname | GPU memory used on host, DeviceId |
| GPUUtilization | gauge | Percent | Level, DeviceId, Hostname | GPU utilization on host, DeviceId |

Backend metrics:

| Metric Name | Type | Unit | Dimensions | Semantics |
|---|---|---|---|---|
| HandlerTime | gauge | ms | ModelName, Level, Hostname | Time spent in backend handler |
| PredictionTime | gauge | ms | ModelName, Level, Hostname | Backend prediction time |
## Formatting

TorchServe emits metrics to log files by default. The metrics are formatted in a StatsD-like format:

```
CPUUtilization.Percent:0.0|#Level:Host|#hostname:my_machine_name,timestamp:1682098185
DiskAvailable.Gigabytes:318.0416717529297|#Level:Host|#hostname:my_machine_name,timestamp:1682098185
```
To enable metric logging in JSON format, set "patternlayout" to "JSONPatternLayout" in log4j2.xml (see the sample log4j2-json.xml). For more information, see Logging in TorchServe.
After you enable JSON log formatting, logs will look as follows:

```json
{
  "MetricName": "DiskAvailable",
  "Value": "108.15547180175781",
  "Unit": "Gigabytes",
  "Dimensions": [
    {
      "Name": "Level",
      "Value": "Host"
    }
  ],
  "HostName": "my_machine_name"
}
{
  "MetricName": "DiskUsage",
  "Value": "124.13163757324219",
  "Unit": "Gigabytes",
  "Dimensions": [
    {
      "Name": "Level",
      "Value": "Host"
    }
  ],
  "HostName": "my_machine_name"
}
```
To enable metric logging in QLog format, set "patternlayout" to "QLogLayout" in log4j2.xml (see the sample log4j2-qlog.xml). For more information, see Logging in TorchServe.
After you enable QLog formatting, logs will look as follows:
```
HostName=abc.com
StartTime=1646686978
Program=MXNetModelServer
Metrics=MemoryUsed=5790.98046875 Megabytes Level|Host
EOE
HostName=147dda19895c.ant.amazon.com
StartTime=1646686978
Program=MXNetModelServer
Metrics=MemoryUtilization=46.2 Percent Level|Host
EOE
```
## Metric Types

TorchServe metrics use metric types that are in line with the Prometheus API metric types.
Metric types are an attribute of Metric objects. Users are restricted to the following metric types when adding metrics via the Metrics API:
```python
class MetricTypes(enum.Enum):
    COUNTER = "counter"
    GAUGE = "gauge"
    HISTOGRAM = "histogram"
```
## Central metrics yaml file definition

TorchServe defines metrics in a `yaml` file, including both frontend metrics (i.e. `ts_metrics`) and backend metrics (i.e. `model_metrics`).
When TorchServe is started, the metrics definition is loaded in the frontend and backend cache separately.
The backend flushes the metrics cache once a load model or inference request is completed.
Dynamic updates between the frontend and backend are not currently handled.

The `metrics.yaml` is formatted with Prometheus metric type terminology:
```yaml
dimensions: # dimension aliases
  - &model_name "ModelName"
  - &level "Level"

ts_metrics: # frontend metrics
  counter: # metric type
    - name: NameOfCounterMetric # name of metric
      unit: ms # unit of metric
      dimensions: [*model_name, *level] # dimension names of metric (referenced from the above dimensions dict)
  gauge:
    - name: NameOfGaugeMetric
      unit: ms
      dimensions: [*model_name, *level]
  histogram:
    - name: NameOfHistogramMetric
      unit: ms
      dimensions: [*model_name, *level]

model_metrics: # backend metrics
  counter: # metric type
    - name: InferenceTimeInMS # name of metric
      unit: ms # unit of metric
      dimensions: [*model_name, *level] # dimension names of metric (referenced from the above dimensions dict)
    - name: NumberOfMetrics
      unit: count
      dimensions: [*model_name]
  gauge:
    - name: GaugeModelMetricNameExample
      unit: ms
      dimensions: [*model_name, *level]
  histogram:
    - name: HistogramModelMetricNameExample
      unit: ms
      dimensions: [*model_name, *level]
```
Note that only the metrics defined in the metrics configuration file can be emitted to logs or made available via the metrics API endpoint. This ensures that the metrics configuration file serves as a central inventory of all the metrics that TorchServe can emit.

Default metrics are provided in the `metrics.yaml` file, but users can delete or ignore them, since these metrics are not emitted unless they are updated.

When adding custom `model_metrics` in the metrics configuration file, ensure that the `ModelName` and `Level` dimension names are included towards the end of the list of dimensions, since they are included by default by the following custom metrics APIs: `add_metric`, `add_counter`, `add_time`, `add_size` and `add_percent`.
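For example, a hypothetical custom backend metric with an additional custom dimension could be declared as follows (the metric and dimension names are illustrative):

```yaml
dimensions:
  - &model_name "ModelName"
  - &level "Level"
  - &request_type "RequestType" # hypothetical custom dimension

model_metrics:
  counter:
    - name: MyCustomMetric # hypothetical custom metric
      unit: count
      dimensions: [*request_type, *model_name, *level] # ModelName and Level towards the end
```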
Whenever TorchServe starts, the backend worker initializes `service.context.metrics` with the `MetricsCache` object. The `model_metrics` (backend metrics) section within the specified yaml file is parsed, and Metric objects are created based on the parsed section and added to the cache.
This is all done internally, so the user does not have to do anything other than specify the desired yaml file.
Users have the ability to parse other sections of the yaml file manually, but the primary purpose of this functionality is to parse the backend metrics from the yaml file.
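As a minimal sketch of parsing another section manually (assuming PyYAML is available and a `metrics.yaml` in the working directory):

```python
import yaml

# TorchServe itself only parses the model_metrics section into the backend
# cache; other sections can be read manually if needed.
with open("metrics.yaml") as f:
    config = yaml.safe_load(f)

frontend_metrics = config.get("ts_metrics", {})
```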
To use a metrics configuration file:

1. Create a `metrics.yaml` file to parse metrics from OR utilize the default `metrics.yaml`.

2. Set the `metrics_config` argument equal to the yaml file path in the `config.properties` being used:

   ```properties
   ...
   workflow_store=../archive/src/test/resources/workflows
   metrics_config=/<path>/<to>/<metrics>/<file>/metrics.yaml
   ...
   ```

   If a `metrics_config` argument is not specified, the default yaml file will be used.

3. Run torchserve and specify the path of the `config.properties` after the `--ts-config` flag (example using Huggingface_Transformers):

   ```bash
   torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ncs --ts-config /<path>/<to>/<config>/<file>/config.properties
   ```
## Custom Metrics API

TorchServe enables the custom service code to emit metrics that are then made available based on the configured `metrics_mode`.
The custom service code is provided with a context of the current request, including a metrics object:

```python
# Access context metrics as follows
metrics = context.metrics
```
All metrics are collected within the context.
When adding any metric via the Metrics API, users have the ability to override the metric type by specifying the `metric_type=MetricTypes.[COUNTER/GAUGE/HISTOGRAM]` keyword argument:
```python
metric = metrics.add_metric_to_cache("GenericMetric", unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)
metric.add_or_update(value, dimension_values=["value1", "value2", ...])

# Backwards compatible, combines the above two method calls
metrics.add_counter("CounterMetric", value=1, dimensions=[Dimension("name", "value"), ...])
```
Given the Metrics API, users are also able to update metrics that have been parsed from the yaml file, provided the following criteria are met
(we will use this metric as an example):

```yaml
counter: # metric type
  - name: InferenceTimeInMS # name of metric
    unit: ms # unit of metric
    dimensions: [ModelName, Level]
```

1. The metric type has to be the same
    - The user will have to use a counter-based `add_...` method, or explicitly set `metric_type=MetricTypes.COUNTER` within the `add_...` method
2. The metric name has to be the same
    - If the name of the metric in the YAML file you want to update is `InferenceTimeInMS`, then call `add_metric(name="InferenceTimeInMS", ...)`
3. The dimensions have to be the same (in the same order!)
    - All dimensions have to match; Metric objects that have been parsed from the yaml file have dimension names that were also parsed from the yaml file
    - Users can create their own `Dimension` objects to match those in the yaml file
    - If the Metric object has only the `ModelName` and `Level` dimensions, it is optional to specify additional dimensions since these are considered default dimensions, so: `add_counter('InferenceTimeInMS', value=2)` or `add_counter('InferenceTimeInMS', value=2, dimensions=["ModelName", "Level"])`
Metrics will have a couple of default dimensions if not already specified.
If the metric is of type `Gauge`, `Histogram`, or `Counter`, by default it will have:

- `ModelName,{name_of_model}`
- `Level,Model`
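For example, assuming `log` mode, a counter updated in a handler without explicit dimensions might be logged with the default dimensions along the lines of the StatsD-like format shown earlier (the value, hostname, and timestamp here are illustrative):

```
HandlerCounter.count:21.3|#ModelName:my_model,Level:Model|#hostname:my_machine_name,timestamp:1682098185
```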
Dimensions for metrics can be defined as objects:

```python
from ts.metrics.dimension import Dimension

# Dimensions are name-value pairs
dim1 = Dimension(name, value)
dim2 = Dimension(some_name, some_value)
...
dimN = Dimension(name_n, value_n)
```

NOTE: The metric functions below accept a list of dimensions.
Generic metrics default to a `COUNTER` metric type.
One can add metrics with generic units using the following functions.

Function API
```python
def add_metric_to_cache(
    self,
    metric_name: str,
    unit: str,
    dimension_names: list = [],
    metric_type: MetricTypes = MetricTypes.COUNTER,
) -> CachingMetric:
    """
    Create a new metric and add into cache. Override existing metric if already present.

    Parameters
    ----------
    metric_name str
        Name of metric
    unit str
        unit can be one of ms, percent, count, MB, GB or a generic string
    dimension_names list
        list of dimension name strings for the metric
    metric_type MetricTypes
        Type of metric Counter, Gauge, Histogram

    Returns
    -------
    newly created Metrics object
    """
```
```python
def add_or_update(
    self,
    value: int or float,
    dimension_values: list = [],
    request_id: str = "",
):
    """
    Update metric value, request id and dimensions

    Parameters
    ----------
    value : int, float
        metric to be updated
    dimension_values : list
        list of dimension values
    request_id : str
        request id to be associated with the metric
    """
```
```python
# Add Distance as a metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
# Assuming batch size is 1 for example
metric = metrics.add_metric_to_cache('DistanceInKM', unit='km', dimension_names=[...])
metric.add_or_update(distance, dimension_values=[...])
```

Note that calling `add_metric_to_cache` will not emit the metric; `add_or_update` will need to be called on the metric object as shown above.
Alternatively, the backwards compatible `add_metric` API combines creating and updating a metric in a single call:

Function API

```python
def add_metric(
    self,
    name: str,
    value: int or float,
    unit: str,
    idx: str = None,
    dimensions: list = [],
    metric_type: MetricTypes = MetricTypes.COUNTER,
):
    """
    Add a generic metric
    Default metric type is counter

    Parameters
    ----------
    name : str
        metric name
    value: int or float
        value of the metric
    unit: str
        unit of metric
    idx: str
        request id to be associated with the metric
    dimensions: list
        list of Dimension objects for the metric
    metric_type MetricTypes
        Type of metric Counter, Gauge, Histogram
    """
```

```python
# Add Distance as a metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
metric = metrics.add_metric('DistanceInKM', value=10, unit='km', dimensions=[...])
```
Time-based metrics default to a `GAUGE` metric type.
Add time-based metrics by invoking the following method:
Function API
```python
def add_time(self, name: str, value: int or float, idx=None, unit: str = 'ms', dimensions: list = None,
             metric_type: MetricTypes = MetricTypes.GAUGE):
    """
    Add a time based metric like latency, default unit is 'ms'
    Default metric type is gauge

    Parameters
    ----------
    name : str
        metric name
    value: int
        value of metric
    idx: int
        request_id index in batch
    unit: str
        unit of metric, default here is ms, s is also accepted
    dimensions: list
        list of dimensions for the metric
    metric_type: MetricTypes
        type for defining different operations, defaulted to gauge metric type for Time metrics
    """
```
Note that the default unit in this case is 'ms'.

Supported units: `['ms', 's']`

To add custom time-based metrics:

```python
# Add inference time
# dimensions = [dim1, dim2, dim3, ..., dimN]
# Assuming batch size is 1 for example
metrics.add_time('InferenceTime', end_time - start_time, None, 'ms', dimensions)
```
Size-based metrics default to a `GAUGE` metric type.
Add size-based metrics by invoking the following method:
Function API
```python
def add_size(self, name: str, value: int or float, idx=None, unit: str = 'MB', dimensions: list = None,
             metric_type: MetricTypes = MetricTypes.GAUGE):
    """
    Add a size based metric
    Default metric type is gauge

    Parameters
    ----------
    name : str
        metric name
    value: int, float
        value of metric
    idx: int
        request_id index in batch
    unit: str
        unit of metric, default here is 'MB', 'kB', 'GB' also supported
    dimensions: list
        list of dimensions for the metric
    metric_type: MetricTypes
        type for defining different operations, defaulted to gauge metric type for Size metrics
    """
```
Note that the default unit in this case is 'MB'.

Supported units: `['MB', 'kB', 'GB', 'B']`

To add custom size-based metrics:

```python
# Add image size as a metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
# Assuming batch size 1
metrics.add_size('SizeOfImage', img_size, None, 'MB', dimensions)
```
Percentage-based metrics default to a `GAUGE` metric type.
Percentage-based metrics can be added by invoking the following method:
Function API
```python
def add_percent(self, name: str, value: int or float, idx=None, dimensions: list = None,
                metric_type: MetricTypes = MetricTypes.GAUGE):
    """
    Add a percentage based metric
    Default metric type is gauge

    Parameters
    ----------
    name : str
        metric name
    value: int, float
        value of metric
    idx: int
        request_id index in batch
    dimensions: list
        list of dimensions for the metric
    metric_type: MetricTypes
        type for defining different operations, defaulted to gauge metric type for Percent metrics
    """
```
Inferred unit: `percent`

To add custom percentage-based metrics:

```python
# Add MemoryUtilization as a metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
# Assuming batch size 1
metrics.add_percent('MemoryUtilization', utilization_percent, None, dimensions)
```
Counter-based metrics default to a `COUNTER` metric type.
Counter-based metrics can be added by invoking the following method:
Function API
```python
def add_counter(self, name: str, value: int or float, idx=None, dimensions: list = None):
    """
    Add a counter metric or increment an existing counter metric
    Default metric type is counter

    Parameters
    ----------
    name : str
        metric name
    value: int or float
        value of metric
    idx: int
        request_id index in batch
    dimensions: list
        list of dimensions for the metric
    """
```
Inferred unit: `count`
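To add or increment a counter-based metric (a short sketch mirroring the examples above; the metric name is illustrative):

```python
# Increment the number of requests processed by the handler by 1
# dimensions = [dim1, dim2, dim3, ..., dimN]
metrics.add_counter('ProcessedRequests', value=1, dimensions=dimensions)
```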
Users can get a metric from the cache. The Metric object is returned, so the user can access its methods (i.e. `Metric.update(value)`, `Metric.__str__()`):
```python
def get_metric(self, metric_name: str, metric_type: MetricTypes) -> Metric:
    """
    Get a Metric from cache.
    Ask user for required requirements to form metric key to retrieve Metric.

    Parameters
    ----------
    metric_name: str
        Name of metric
    metric_type: MetricTypes
        Type of metric: use MetricTypes enum to specify
    """
```

For example:

```python
# Method 1: getting a metric by name with MetricType COUNTER
metrics.get_metric("MetricName", MetricTypes.COUNTER)

# Method 2: getting a metric by name with MetricType GAUGE
metrics.get_metric("GaugeMetricName", MetricTypes.GAUGE)
```
## Logging custom metrics

The following sample code can be used to log custom metrics created in the model's custom handler:

```python
# In Custom Handler
from abc import ABC

from ts.torch_handler.base_handler import BaseHandler


class ExampleCustomHandler(BaseHandler, ABC):
    def initialize(self, ctx):
        ctx.metrics.add_counter(...)
```

This custom metrics information is logged in the `model_metrics.log` file configured through the `log4j2.xml` file, or made available via the metrics API endpoint, based on the `metrics_mode` configuration.
## Metrics YAML Parsing and Metrics API example

This example utilizes the feature of parsing metrics from a YAML file, adding and updating metrics and their values via the Metrics API, updating metrics that have been parsed from the YAML file via the Metrics API, and finally emitting all metrics that have been updated.

```python
import time

from ts.metrics.metric_type_enum import MetricTypes


class CustomHandlerExample:
    def initialize(self, ctx):
        metrics = ctx.metrics  # initializing metrics to the context.metrics

        # Setting a sleep for examples' sake
        start_time = time.time()
        time.sleep(3)
        stop_time = time.time()

        # Adds a metric that has a metric type of gauge
        metrics.add_time(
            "HandlerTime", round((stop_time - start_time) * 1000, 2), None, "ms"
        )

        # Logs the values 2.5 and -1.3 to the frontend
        metrics.add_counter("HandlerSeparateCounter", 2.5)
        metrics.add_counter("HandlerSeparateCounter", -1.3)

        # Adding a standard counter metric
        metrics.add_counter("HandlerCounter", 21.3)

        # Assume that a metric that has a metric type of counter
        # and is named InferenceTimeInMS in the metrics.yaml file.
        # Instead of creating a new object with the same name and same parameters,
        # this line will update the metric that already exists from the YAML file.
        metrics.add_counter("InferenceTimeInMS", 2.78)

        # Another method of updating values -
        # using the get_metric + add_or_update methods.
        # In this example, we are getting an already existing
        # Metric that had been parsed from the yaml file
        histogram_example_metric = metrics.get_metric(
            "HistogramModelMetricNameExample",
            MetricTypes.HISTOGRAM,
        )
        histogram_example_metric.add_or_update(4.6)

        # Same idea as the 'metrics.add_counter("InferenceTimeInMS", 2.78)' line,
        # except this time with a gauge metric type object
        metrics.add_size("GaugeModelMetricNameExample", 42.5)
```
## Backwards compatibility warnings and upgrade guide

- Starting v0.6.1, the `add_metric` API signature changed
  from: `add_metric(name, value, unit, idx=None, dimensions=None)`
  to: `add_metric(metric_name, unit, dimension_names=None, metric_type=MetricTypes.COUNTER)`.
  In versions greater than v0.8.1, the `add_metric` API signature was updated to support backwards compatibility:
  from: `add_metric(metric_name, unit, dimension_names=None, metric_type=MetricTypes.COUNTER)`
  to: `add_metric(name, value, unit, idx=None, dimensions=[], metric_type=MetricTypes.COUNTER)`
  Usage of the new API is shown above.

  Upgrade paths:

  - [< v0.6.1] to [v0.6.1 - v0.8.1]

    There are two approaches available when migrating to the new custom metrics API:

    - Replace the call to `add_metric` with calls to the following methods:

      ```python
      metric1 = metrics.add_metric("GenericMetric", unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)
      metric1.add_or_update(value, dimension_values=["value1", "value2", ...])
      ```

    - Replace the call to `add_metric` in versions prior to v0.6.1 with one of the suitable custom metrics APIs where applicable: `add_counter`, `add_time`, `add_size` or `add_percent`.

  - [< v0.6.1] to [> v0.8.1]

    The call to `add_metric` is backwards compatible, but the metric type is inferred to be `COUNTER`. If the metric is of a different type, an additional `metric_type` argument will need to be provided to the `add_metric` call as shown below:

    ```python
    metrics.add_metric(name='GenericMetric', value=10, unit='count', dimensions=[...], metric_type=MetricTypes.GAUGE)
    ```

  - [v0.6.1 - v0.8.1] to [> v0.8.1]

    Replace the call to `add_metric` with `add_metric_to_cache`.
- Starting v0.8.0, only metrics that are defined in the metrics config file (default: `metrics.yaml`) are either all logged to `ts_metrics.log` and `model_metrics.log`, or made available via the metrics API endpoint, based on the `metrics_mode` configuration as described above.
  The default `metrics_mode` is `log` mode.
  This is unlike previous versions, where all metrics were only logged to `ts_metrics.log` and `model_metrics.log`, except for `ts_inference_requests_total`, `ts_inference_latency_microseconds` and `ts_queue_latency_microseconds`, which were only available via the metrics API endpoint.

  Upgrade paths:

  - [< v0.8.0] to [>= v0.8.0]

    Specify all the custom metrics added to the custom handler in the metrics configuration file as shown above.
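    For example, to keep emitting the `HandlerCounter` metric from the handler example above after upgrading, a corresponding entry would need to exist in the metrics configuration file (a sketch, using the default backend dimensions):

    ```yaml
    model_metrics:
      counter:
        - name: HandlerCounter
          unit: count
          dimensions: ["ModelName", "Level"]
    ```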