[RFC]: Metrics Refactoring #1492 Draft PR #1727

Closed · wants to merge 55 commits
- 89d8baf: Created metrics cache class (Jul 5, 2022)
- 6485b01: added unit tests, cleaned up naming (Jul 7, 2022)
- 2a82a6b: moved unit testing files (Jul 7, 2022)
- 1875ca5: fixing conditional logic in add_metric function, filled out name fiel… (Jul 7, 2022)
- 1b0d91c: Changed dimensions to be list of Dimension objects instead of list of… (Jul 8, 2022)
- d1a0755: moved to pytest for unit testing (Jul 11, 2022)
- 7d3ac3f: abstracted metric cache class and added pytest unit tests with cleare… (Jul 11, 2022)
- 2fdacef: Adding rough metrics flush method (Jul 15, 2022)
- ac89405: Converting prints to loggers and emitting metrics to logs (Jul 18, 2022)
- 460e22a: removed system metrics, creating custom handler (Jul 19, 2022)
- 5758f43: reassigning the metric name and passing in yaml file as an argument (Jul 20, 2022)
- 4d3c0d0: Adding log lines and setting up MetricsCache obj integration (Jul 25, 2022)
- 419ba96: Adding more unit tests, fixing dimensions parsing (Jul 27, 2022)
- d7ffd48: added custom error class to act as wrapper for MetricsCache objects (Jul 28, 2022)
- 2efd5e5: Added more unit tests for catching naming Metric strings (Jul 29, 2022)
- 2bd8c66: Added more comments to code (Aug 1, 2022)
- 01d3af9: working in torchserve start cmd (Aug 2, 2022)
- 4abd315: getting rid of abs path method that is not in use (Aug 2, 2022)
- 8340230: migrating additional add_metric methods from store to cache (Aug 5, 2022)
- b416bc7: Creating custom handler to test migrated methods and beginnings of te… (Aug 15, 2022)
- 45d3fac: editing custom handler (Aug 15, 2022)
- 6ced1ad: fixing custom handler (Aug 16, 2022)
- 994af49: adding flags to reset Metrics after being emitted, trying custom handler (Aug 18, 2022)
- 66e5162: revising custom handler and passing yaml file as arg (Aug 19, 2022)
- f299514: refactoring metrics log var name (Aug 19, 2022)
- 5b69469: getting rid of unneeded log lines (Aug 19, 2022)
- 1e585a2: editing default metric log path (Aug 19, 2022)
- a918be0: adding req ids to Metric objs parsed by yaml file (Aug 20, 2022)
- 41cb423: editing custom handler (joshuaan7, Aug 22, 2022)
- e3a6fd3: Merge branch 'master' of https://github.com/joshuaan7/serve (joshuaan7, Aug 22, 2022)
- d11fb60: clearing log lines, commenting out unneeded sections (Aug 22, 2022)
- dd8542f: Merge branch 'master' of https://github.com/joshuaan7/serve (Aug 22, 2022)
- 6331ce8: adding proper getter method for metrics log (joshuaan7, Aug 23, 2022)
- c95c422: Changes to get_metric functionality (Aug 24, 2022)
- da635c8: Merge branch 'master' of https://github.com/joshuaan7/serve (Aug 24, 2022)
- 85ad4c8: typo (Aug 24, 2022)
- 6314067: Merge branch 'master' into master (joshuaan7, Aug 24, 2022)
- a781ed7: linting (Aug 26, 2022)
- 0c58bd8: Merge branch 'master' of https://github.com/joshuaan7/serve (Aug 26, 2022)
- 407ab2e: Merge branch 'master' into master (joshuaan7, Aug 26, 2022)
- cc50528: Merge branch 'master' into master (maaquib, Aug 31, 2022)
- 6a27475: adding docs (Aug 31, 2022)
- bd2ae6e: Adding support for default yaml file config arg (Aug 31, 2022)
- 48d8891: renaming metrics log to metrics config (Aug 31, 2022)
- 2d55919: Merge branch 'master' into master (maaquib, Sep 1, 2022)
- 3aa4fb2: linting (Sep 1, 2022)
- 6e35ae3: commenting out metrics_config in config properties (Sep 1, 2022)
- cfd8aaa: reformatting (Sep 1, 2022)
- 58f0130: linting (Sep 1, 2022)
- 16e96f1: Reformatting configsmanager file (Sep 2, 2022)
- d675ebe: fixing initial round of comments (joshuaan7, Sep 6, 2022)
- dc83f90: passing lint (joshuaan7, Sep 6, 2022)
- 021d3f1: Merge branch 'master' into master (maaquib, Sep 15, 2022)
- 7b42779: Merge branch 'master' into master (maaquib, Oct 4, 2022)
- 13a6ce3: Merge branch 'master' into master (msaroufim, Oct 11, 2022)
338 changes: 316 additions & 22 deletions docs/metrics.md


@@ -1,21 +1,22 @@
-from abc import ABC
+import ast
 import json
 import logging
 import os
-import ast
+from abc import ABC
 
 import torch
 import transformers
+from captum.attr import LayerIntegratedGradients
 from transformers import (
-    AutoModelForSequenceClassification,
-    AutoTokenizer,
+    AutoModelForCausalLM,
     AutoModelForQuestionAnswering,
+    AutoModelForSequenceClassification,
     AutoModelForTokenClassification,
-    AutoModelForCausalLM,
+    AutoTokenizer,
+    GPT2TokenizerFast,
 )
-from transformers import GPT2TokenizerFast
 
 from ts.torch_handler.base_handler import BaseHandler
-from captum.attr import LayerIntegratedGradients
 
 logger = logging.getLogger(__name__)
 logger.info("Transformers version %s", transformers.__version__)
@@ -5,6 +5,7 @@
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.util.SelfSignedCertificate;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
@@ -39,6 +40,7 @@
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
@@ -65,6 +67,7 @@ public final class ConfigManager {
private static final String TS_NETTY_CLIENT_THREADS = "netty_client_threads";
private static final String TS_JOB_QUEUE_SIZE = "job_queue_size";
private static final String TS_NUMBER_OF_GPU = "number_of_gpu";
private static final String TS_METRICS_CONFIG = "metrics_config";

// IPEX config option that can be set at config.properties
private static final String TS_IPEX_ENABLE = "ipex_enable";
@@ -363,6 +366,40 @@ public int getNumberOfGpu() {
return getIntProperty(TS_NUMBER_OF_GPU, 0);
}

public String getMetricsConfig() {
// Getting absolute path of working directory, and finding the metrics_default yaml.
// Assumptions are that users are running torchserve within the serve directory, and that the name of the
// serve directory is still 'serve' (i.e. users did not rename serve)

String basePath = System.getProperty("user.dir");
String yamlModulePath = "/frontend/server/src/test/resources/metrics_default.yaml";

String[] basePathList = basePath.split("/");
List<String> serveBasePath = new ArrayList<String>();
int serveIndex = basePathList.length;

for (int i = 0; i < basePathList.length; i++) {
String directory = basePathList[i];
if ("serve".equals(directory)) {
serveIndex = i;
}
}

if (serveIndex == basePathList.length) {
serveIndex--;
}

for (int j = 0; j < serveIndex + 1; j++) {
String directory = basePathList[j];
serveBasePath.add(directory);
}

String serveBasePathString = String.join("/", serveBasePath);
String yamlAbsolutePath = serveBasePathString + yamlModulePath;

return getProperty(TS_METRICS_CONFIG, yamlAbsolutePath);
}
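The path-climbing logic in `getMetricsConfig()` above can be sketched in Python for clarity. This is an illustrative re-implementation of the same algorithm, not code from the PR (the function name is mine): it finds the last `serve` component of the working directory and appends the default YAML location beneath it.

```python
def default_metrics_config_path(
    cwd: str,
    yaml_module_path: str = "/frontend/server/src/test/resources/metrics_default.yaml",
) -> str:
    """Mirror of ConfigManager.getMetricsConfig(): locate the last 'serve'
    component of cwd and build the default metrics YAML path beneath it."""
    parts = cwd.split("/")
    serve_index = len(parts)
    for i, directory in enumerate(parts):
        if directory == "serve":
            serve_index = i  # the Java loop keeps the *last* match
    if serve_index == len(parts):
        serve_index -= 1  # 'serve' not found: fall back to the full cwd
    base = "/".join(parts[: serve_index + 1])
    return base + yaml_module_path

print(default_metrics_config_path("/home/user/serve"))
# /home/user/serve/frontend/server/src/test/resources/metrics_default.yaml
```

As the code comment warns, this only resolves correctly when TorchServe is started from inside a checkout whose root directory is literally named `serve`; the `metrics_config` property exists to override the default.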

public String getTsDefaultServiceHandler() {
return getProperty(TS_DEFAULT_SERVICE_HANDLER, null);
}
@@ -510,11 +547,11 @@ public SslContext getSslContext() throws IOException, GeneralSecurityException {
} else {
SelfSignedCertificate ssc = new SelfSignedCertificate();
privateKey = ssc.key();
chain = new X509Certificate[] {ssc.cert()};
chain = new X509Certificate[]{ssc.cert()};
}

return SslContextBuilder.forServer(privateKey, chain)
.protocols(new String[] {"TLSv1.2"})
.protocols(new String[]{"TLSv1.2"})
.ciphers(supportedCiphers)
.build();
}
@@ -793,7 +830,8 @@ public void setInitialWorkerPort(int initialPort) {

private void setModelConfig() {
String modelConfigStr = prop.getProperty(MODEL_CONFIG, null);
Type type = new TypeToken<Map<String, Map<String, JsonObject>>>() {}.getType();
Type type = new TypeToken<Map<String, Map<String, JsonObject>>>() {
}.getType();

if (modelConfigStr != null) {
this.modelConfig = JsonUtils.GSON.fromJson(modelConfigStr, type);
Expand Down Expand Up @@ -835,7 +873,8 @@ public static final class Arguments {
private boolean snapshotDisabled;
private String workflowStore;

public Arguments() {}
public Arguments() {
}

public Arguments(CommandLine cmd) {
tsConfigFile = cmd.getOptionValue("ts-config-file");
@@ -124,6 +124,9 @@ public void startWorker(int port) throws WorkerInitializationException, Interrup
argl.add(connector.isUds() ? "--sock-name" : "--port");
argl.add(connector.getSocketPath());

argl.add("--metrics-config");
argl.add(configManager.getMetricsConfig());

String[] envp =
EnvironmentUtils.getEnvString(
workingDir.getAbsolutePath(),
1 change: 1 addition & 0 deletions frontend/server/src/test/resources/config.properties
@@ -46,3 +46,4 @@ models={\
# install_py_dep_per_model=false
# enable_metrics_api=false
workflow_store=../archive/src/test/resources/workflows
# metrics_config=../../../ts/tests/unit_tests/metrics_yaml_testing/metrics.yaml
35 changes: 35 additions & 0 deletions frontend/server/src/test/resources/metrics_default.yaml
@@ -0,0 +1,35 @@
dimensions:
ModelName: "my_tc" # customer should edit based on their own models that are being used
Level: "Model"

# not yet implemented in frontend
ts_metrics:
counter:
- name:
unit: ms
dimensions: [ModelName, Level]
gauge:
- name:
unit: ms
dimensions: [ModelName, Level]
histogram:
- name:
unit: ms
dimensions: [ModelName, Level]

model_metrics:
counter:
- InferenceTimeInMS:
unit: ms
dimensions: [ModelName, Level]
- NumberOfMetrics:
unit: count
dimensions: [ModelName, Level]
gauge:
- GaugeModelMetricNameExample:
unit: ms
dimensions: [ModelName, Level]
histogram:
- HistogramModelMetricNameExample:
unit: ms
dimensions: [ModelName, Level]
152 changes: 95 additions & 57 deletions ts/arg_parser.py
@@ -1,5 +1,3 @@


"""
This module parses the arguments given through the torchserve command-line. This is used by model-server
at runtime.
@@ -14,48 +12,71 @@ class ArgParser(object):
Argument parser for torchserve and torchserve-export commands
TODO : Add readme url
"""

@staticmethod
def ts_parser():
"""
Argument parser for torchserve start service
"""
parser = argparse.ArgumentParser(prog='torchserve', description='Torchserve')
parser = argparse.ArgumentParser(prog="torchserve", description="Torchserve")

sub_parse = parser.add_mutually_exclusive_group(required=False)
sub_parse.add_argument('-v', '--version', action='store_true', help='Return TorchServe Version')
sub_parse.add_argument('--start', action='store_true', help='Start the model-server')
sub_parse.add_argument('--stop', action='store_true', help='Stop the model-server')
sub_parse.add_argument(
"-v", "--version", action="store_true", help="Return TorchServe Version"
)
sub_parse.add_argument(
"--start", action="store_true", help="Start the model-server"
)
sub_parse.add_argument(
"--stop", action="store_true", help="Stop the model-server"
)

parser.add_argument('--ts-config',
dest='ts_config',
help='Configuration file for model server')
parser.add_argument('--model-store',
required=False,
dest='model_store',
help='Model store location from where local or default models can be loaded')
parser.add_argument('--workflow-store',
required=False,
dest='workflow_store',
help='Workflow store location from where local or default workflows can be loaded')
parser.add_argument('--models',
metavar='MODEL_PATH1 MODEL_NAME=MODEL_PATH2...',
nargs='+',
help='Models to be loaded using [model_name=]model_location format. '
'Location can be a HTTP URL or a model archive file in MODEL_STORE.')
parser.add_argument('--log-config',
dest='log_config',
help='Log4j configuration file for model server')
parser.add_argument('--foreground',
help='Run the model server in foreground. If this option is disabled, the model server'
' will run in the background.',
action='store_true')
parser.add_argument('--no-config-snapshots', '--ncs',
dest='no_config_snapshots',
help='Prevents to server from storing config snapshot files.',
action='store_true')
parser.add_argument('--plugins-path', '--ppath',
dest='plugins_path',
help='plugin jars to be included in torchserve class path')
parser.add_argument(
"--ts-config", dest="ts_config", help="Configuration file for model server"
)
parser.add_argument(
"--model-store",
required=False,
dest="model_store",
help="Model store location from where local or default models can be loaded",
)
parser.add_argument(
"--workflow-store",
required=False,
dest="workflow_store",
help="Workflow store location from where local or default workflows can be loaded",
)
parser.add_argument(
"--models",
metavar="MODEL_PATH1 MODEL_NAME=MODEL_PATH2...",
nargs="+",
help="Models to be loaded using [model_name=]model_location format. "
"Location can be a HTTP URL or a model archive file in MODEL_STORE.",
)
parser.add_argument(
"--log-config",
dest="log_config",
help="Log4j configuration file for model server",
)
parser.add_argument(
"--foreground",
help="Run the model server in foreground. If this option is disabled, the model server"
" will run in the background.",
action="store_true",
)
parser.add_argument(
"--no-config-snapshots",
"--ncs",
dest="no_config_snapshots",
help="Prevents the server from storing config snapshot files.",
action="store_true",
)
parser.add_argument(
"--plugins-path",
"--ppath",
dest="plugins_path",
help="plugin jars to be included in torchserve class path",
)

return parser

@@ -65,30 +86,47 @@ def model_service_worker_args():
ArgParser for backend worker. Takes the socket name and socket type.
:return:
"""
parser = argparse.ArgumentParser(prog='model-server-worker', description='Model Server Worker')
parser.add_argument('--sock-type',
required=True,
dest="sock_type",
type=str,
choices=["unix", "tcp"],
help='Socket type the model service worker would use. The options are\n'
'unix: The model worker expects a unix domain-socket\n'
'tcp: The model worker expects a host-name and port-number')
parser = argparse.ArgumentParser(
prog="model-server-worker", description="Model Server Worker"
)
parser.add_argument(
"--sock-type",
required=True,
dest="sock_type",
type=str,
choices=["unix", "tcp"],
help="Socket type the model service worker would use. The options are\n"
"unix: The model worker expects to unix domain-socket\n"
"tcp: The model worker expects a host-name and port-number",
)

parser.add_argument(
"--sock-name",
required=False,
dest="sock_name",
type=str,
help="If 'sock-type' is 'unix', sock-name is expected to be a string. "
'Eg: --sock-name "test_sock"',
)

parser.add_argument('--sock-name',
required=False,
dest="sock_name",
type=str,
help='If \'sock-type\' is \'unix\', sock-name is expected to be a string. '
'Eg: --sock-name \"test_sock\"')
parser.add_argument(
"--host",
type=str,
help="If 'sock-type' is 'tcp' this is expected to have a host IP address",
)

parser.add_argument('--host',
type=str,
help='If \'sock-type\' is \'tcp\' this is expected to have a host IP address')
parser.add_argument(
"--port",
type=str,
help="If 'sock-type' is 'tcp' this is expected to have the host port to bind on",
)

parser.add_argument('--port',
type=str,
help='If \'sock-type\' is \'tcp\' this is expected to have the host port to bind on')
parser.add_argument(
"--metrics-config",
dest="metrics_config",
type=str,
help="(Yaml) File to be passed for metrics configuration.",
)

return parser
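The new `--metrics-config` flag can be exercised against a minimal stand-in parser. The sketch below replicates just the relevant options with stdlib argparse; it is not the actual `ArgParser` class:

```python
import argparse

# Minimal stand-in for ArgParser.model_service_worker_args(), showing how
# the new --metrics-config flag is parsed alongside the socket options.
parser = argparse.ArgumentParser(
    prog="model-server-worker", description="Model Server Worker"
)
parser.add_argument("--sock-type", required=True, dest="sock_type",
                    type=str, choices=["unix", "tcp"])
parser.add_argument("--sock-name", required=False, dest="sock_name", type=str)
parser.add_argument("--metrics-config", dest="metrics_config", type=str,
                    help="(Yaml) File to be passed for metrics configuration.")

args = parser.parse_args(
    ["--sock-type", "unix", "--sock-name", "test_sock",
     "--metrics-config", "metrics_default.yaml"]
)
print(args.metrics_config)  # metrics_default.yaml
```

This mirrors how the frontend change in WorkerLifeCycle appends `--metrics-config` followed by `configManager.getMetricsConfig()` to the backend worker's argument list.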
