
Metrics helper classes implementation for C++ backend #1874

Merged

Conversation

@namannandan (Collaborator) commented Sep 23, 2022

This PR implements the following metrics helper classes in C++ for the TorchServe backend:

  • Units
  • Metric

Testing

  • Each of the classes has associated unit tests

Reference design

@namannandan marked this pull request as ready for review September 23, 2022 22:13
@namannandan changed the title from "Metrics helper classes implementation for C++ bakend" to "Metrics helper classes implementation for C++ backend" Sep 23, 2022
@@ -2,6 +2,7 @@ set(TS_BACKENDS_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/backends")
set(TS_BACKENDS_CORE_SRC_DIR "${TS_BACKENDS_SRC_DIR}/core")
set(TS_BACKENDS_PROTOCOL_SRC_DIR "${TS_BACKENDS_SRC_DIR}/protocol")
set(TS_BACKENDS_PROCESS_SRC_DIR "${TS_BACKENDS_SRC_DIR}/process")
set(TS_BACKENDS_METRICS_SRC_DIR "${TS_BACKENDS_SRC_DIR}/metrics")
Collaborator

metrics should be in src/utils, not src/backends

class Dimension {
 public:
  Dimension(const std::string& name, const std::string& value);
  std::string ToString() const;
Collaborator

A dimension is a map between a name and a list of values, e.g. dimension name batch_size with values 1, 2, 4, 8.

Comment on lines 8 to 14
template <typename ValueType>
Metric<ValueType>::Metric(
    const std::string& name, const MetricType& type, const ValueType& value,
    const std::string& unit, const std::vector<Dimension>& dimensions,
    const std::string& request_id, const bool& is_updated) :
    name{name}, type{type}, value{value}, unit{Units::GetUnitMapping(unit)},
    dimensions{dimensions}, request_id{request_id}, is_updated{is_updated} {}
Collaborator

Please take a look at the Prometheus client API.

Comment on lines 27 to 38
list(APPEND TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/torch_scripted_backend.cc)
list(APPEND TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/handler/base_handler.cc)
add_library(ts_backends_torch_scripted SHARED ${TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES})
target_include_directories(ts_backends_torch_scripted PUBLIC
target_include_directories(ts_backends_torch_scripted PUBLIC
${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR} ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/handler ${TORCH_INCLUDE_DIRS})
target_link_libraries(ts_backends_torch_scripted PUBLIC ts_utils ts_backends_core ${TORCH_LIBRARIES})
target_link_libraries(ts_backends_torch_scripted PUBLIC ts_utils ts_backends_core ${TORCH_LIBRARIES})
install(TARGETS ts_backends_torch_scripted DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

# build library ts_backend_torch_deploy
#set(TS_BACKENDS_TORCH_DEPLOY_SOURCE_FILES "")
#add_library(ts_backends_torch_deploy SHARED ${TS_BACKENDS_TORCH_DEPLOY_SOURCE_FILES})
#target_include_directories(ts_backends_torch_deploy PUBLIC ${TS_BACKENDS_TORCH_DEPLOY_SRC_DIR})
Collaborator

There are no changes for this CMakeLists.txt. Please rebase your code to remove this file from this PR.

Collaborator Author

I have rebased my dev branch and removed all code changes to this file, but since I had touched this file in a previous commit, the pre-commit hook to remove trailing whitespaces is still running on this file.

target_include_directories(ts_utils PRIVATE ${Boost_INCLUDE_DIRS})
target_link_libraries(ts_utils ${FOLLY_LIBRARIES} ${CMAKE_DL_LIBS} ${Boost_LIBRARIES})
Collaborator

Why is Boost needed here?

Collaborator Author

Good catch. Boost is not required in this PR, so I'll remove it. I had initially included it to fetch the hostname in a platform-independent manner while generating the metric log. Will consider including it if required in a subsequent PR when implementing the STDOUTMetricEmitter class.

Collaborator Author

On second thought, Boost is required in this PR since I've included the implementation for the Emitter.

Comment on lines 9 to 16
  Dimension(const std::string& name, const std::string& value);
  const std::string GetName() const;
  const std::string GetValue() const;

 private:
  const std::string name;
  const std::string value;
};
Collaborator

please address my previous comments.

Collaborator Author

I do agree with your comment that a Dimension has a name and can take on multiple values. The intention of defining this class was to use it in the context of a particular metric instance. For example, if we have a metric that tracks the inference latency for a fastrcnn model with batch size 2, then the dimensions for that specific metric instance could be something like {"model_name": "fastrcnn", "batch_size": "2"}. Example usage in test case. Also, I believe that we will not know all the possible dimension values at initialization time, since we don't enforce that in the metrics configuration. Please let me know your thoughts on this.
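For illustration, a minimal sketch of such per-instance dimensions, assuming a Dimension class shaped like the declaration quoted above (this is not the actual test code):

#include <iostream>
#include <string>
#include <vector>

// Assumed shape of the Dimension class discussed in this thread.
class Dimension {
 public:
  Dimension(const std::string& name, const std::string& value)
      : name{name}, value{value} {}
  std::string ToString() const { return name + ":" + value; }

 private:
  const std::string name;
  const std::string value;
};

int main() {
  // Dimensions for one specific metric instance, e.g. inference latency of a
  // fastrcnn model served with batch size 2.
  const std::vector<Dimension> dimensions{{"model_name", "fastrcnn"},
                                          {"batch_size", "2"}};
  for (const auto& dimension : dimensions) {
    std::cout << dimension.ToString() << std::endl;
  }
  return 0;
}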

@lxning (Collaborator) Oct 7, 2022

It is impossible to know all the possible values of a dimension. As an API, the data structure should be able to reflect this use case and also save memory space. In concept, a metric includes:

  • metric name
  • Vector dimension names
  • ConcurrentHashMap<Vector, Sample> // Sample is a concrete metric data point such as a counter sample point.

So, is the Dimension class a must-have class?

Collaborator Author

That's true, for storing dimension names and values within a Metric object, the Dimension object may not be absolutely necessary. I was also considering its use in the context of supporting the MetricCache API, i.e. Metric addMetric(metricName, unit, dimensions, metricType); and Metric getMetric(metricName, dimensions);. In both of these cases, the dimensions argument will need to be a vector of <dimension_name:single_dimension_value> tuples. So here, it may be clearer to the user if we express the dimensions input argument as type std::vector<Dimension>. What do you think?
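A rough sketch of those two signatures with std::vector<Dimension> as the dimensions argument (names and stand-in types here are assumptions for discussion, not the final API):

#include <string>
#include <vector>

// Stand-in declarations so the sketch is self-contained; the real definitions
// live in this PR and follow-up PRs.
enum class MetricType { COUNTER, GAUGE, HISTOGRAM };
class Dimension { /* single <name, value> pair, as in this PR */ };
class Metric { /* metric attributes and values */ };

class MetricCache {
 public:
  Metric AddMetric(const std::string& metric_name, const std::string& unit,
                   const std::vector<Dimension>& dimensions,
                   const MetricType& metric_type);
  Metric GetMetric(const std::string& metric_name,
                   const std::vector<Dimension>& dimensions);
};

Here each Dimension carries a single <name, value> pair, so a std::vector<Dimension> expresses one concrete dimension-value combination.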

const std::string request_id;
};

class Metric {
Collaborator

As we synced up last week, we need to first define the TorchServe Metric API so that we are able to extend it to different implementations such as frontend Prometheus and backend log metrics.

Here the code is tied to the backend; it would be difficult to later extend it to the frontend. So please define the metric API first, and then implement the backend metrics.

@namannandan (Collaborator Author) Oct 6, 2022

I believe we are on the same page. The class I'm defining here is going to be used to store the metric attributes (name, unit, type, dimensions and values) for a single instance of a particular metric. The plan is to add a MetricCache class which will hold several of these Metric objects. The modularity to emit metrics either as a log, to Prometheus, or to a different metric framework in the future will come from the MetricEmitter handle in the MetricCache instance. For example, the backend MetricCache instance will use the STDOUTMetricEmitter and the frontend MetricCache instance will use the PrometheusMetricEmitter. The plan is to implement the MetricCache and MetricEmitter classes in subsequent PRs. Please let me know your thoughts on this.
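A hedged sketch of that separation; the emitter class names come from the comment above, while the method names and members are assumptions for illustration:

#include <memory>
#include <string>
#include <utility>

class MetricEmitter {
 public:
  virtual ~MetricEmitter() = default;
  // Emit one metric record; the concrete emitter decides where it goes.
  virtual void Emit(const std::string& metric_record) = 0;
};

class STDOUTMetricEmitter : public MetricEmitter {
 public:
  void Emit(const std::string& /*metric_record*/) override {
    // Backend: write the metric as a log line.
  }
};

class PrometheusMetricEmitter : public MetricEmitter {
 public:
  void Emit(const std::string& /*metric_record*/) override {
    // Frontend: forward the metric to a Prometheus client registry.
  }
};

class MetricCache {
 public:
  explicit MetricCache(std::unique_ptr<MetricEmitter> emitter)
      : emitter{std::move(emitter)} {}

 private:
  std::unique_ptr<MetricEmitter> emitter;
};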

Collaborator

sounds good.

Collaborator Author

Updated implementation to follow the API defined in metrics redesign RFC: #1492

Comment on lines 45 to 46
const std::vector<Dimension> dimensions;
std::vector<MetricValue> values;
@lxning (Collaborator) Oct 7, 2022

The data structure can be simplified and also prepared for fast sample data point updates, e.g. with a hashmap. See comments.

Collaborator Author

Agree that it may be more efficient to use a hashmap to store the metric values, but a couple of quick questions on the suggested data structure ConcurrentHashMap<Vector, Sample> to be used inside the Metric object:

  1. Since Vector is mutable, is it valid to use it as a key in a hashmap?
  2. If we plan to store the Sample value associated with each possible combination of dimension values in a single Metric object, how would we support the Metric API void Update(value);? Every time Update is called, the caller will need to supply the dimension values as well, otherwise we would not know which Sample to update. Instead, if we store the value associated with a single metric with a specific combination of dimension values in a single Metric object, wouldn't that make the implementation simpler to support the MetricCache APIs Metric addMetric(metricName, unit, dimensions, metricType);, Metric getMetric(metricName, dimensions); and also the Metric API void Update(value);? Although the tradeoff here is that we will use more space by duplicating metric attributes across Metric objects. (A rough sketch of the keyed-samples alternative follows below.)
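For reference, a small sketch of the alternative described in point 2, where a single Metric object keys samples by the dimension-value combination (std::map is used here instead of a concurrent map, and all names are illustrative):

#include <map>
#include <string>
#include <vector>

// A concrete metric data point, e.g. a counter sample.
struct Sample {
  double value{0.0};
};

class MetricWithKeyedSamples {
 public:
  // The caller has to pass the dimension values on every update so we know
  // which Sample to modify -- the tradeoff raised in point 2 above.
  void Update(const std::vector<std::string>& dimension_values, double value) {
    samples[dimension_values].value += value;
  }

 private:
  // One sample per dimension-value combination, keyed by the value vector.
  std::map<std::vector<std::string>, Sample> samples;
};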


namespace torchserve {
class MetricCache {
public:
virtual Metric AddMetric(const std::string& name, const std::string& unit,
@namannandan (Collaborator Author) Oct 12, 2022

The LogMetricCache implementation for the backend, which will derive from MetricCache, will support the AddMetric and GetMetric API by:

  1. Storing allowed metric names and corresponding unit and dimension names obtained from the metrics configuration.
  2. Storing a map of type std::unordered_map<std::string, LogMetric>. Here, the key will be a string consisting of the metric name, unit, and the name and value of each dimension (a sketch of such a key follows below).
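For illustration, a sketch of how such a composite string key could be built; the exact key format and the helper name are assumptions, not the final implementation:

#include <map>
#include <string>

// Hypothetical helper: build the composite map key from the metric name, unit
// and the dimension name/value pairs.
std::string BuildMetricKey(
    const std::string& name, const std::string& unit,
    const std::map<std::string, std::string>& dimensions) {
  std::string key = name + "." + unit;
  for (const auto& [dim_name, dim_value] : dimensions) {
    key += "|" + dim_name + ":" + dim_value;
  }
  return key;
}

// Example: BuildMetricKey("HandlerTime", "Milliseconds",
//                         {{"level", "model"}, {"model_name", "mnist"}})
// returns "HandlerTime.Milliseconds|level:model|model_name:mnist"
// (std::map iterates its keys in sorted order).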

Collaborator

Are we going to have implementations other than LogMetricCache in the future?

Collaborator Author

Yes, we can have a PrometheusMetricCache which the frontend can use to store references to prometheus client metric objects.


namespace torchserve {
// Value entry for log metric consisting of <metric_value, timestamp, request_id>
typedef std::tuple<double, std::uint64_t, std::string> LogMetricValue;
Collaborator

nit: using LogMetricValue = std::tuple<double, std::uint64_t, std::string> is more modern?

Collaborator Author

Good point. Will update.

@namannandan (Collaborator Author)

Discussion on design updates and implementation details is tracked in this document.

const MetricType type;
const std::string name;
const std::string unit;
const std::set<std::string> dimension_names;
Collaborator

A metric's dimensions have an order. Is std::set able to keep the order given as input from metrics.yaml?

Collaborator Author

I believe std::set will maintain its elements in sorted order (ascending by default), which may not be the same order as in the metrics configuration file. This will not matter for the backend metrics implementation in this PR since it does not make any assumptions about the order in which the dimension names or values are provided. The AddOrUpdate method accepts a map with dimension names as keys and dimension values as values.

Do we want the implementation to use the metric dimension order specified in the configuration file?

Collaborator

The backend should also keep the same order as defined in the metrics configuration because backend metrics need to be injected into Prometheus metrics by the frontend.

TSLogMetric(const MetricType& type, const std::string& name, const std::string& unit,
const std::set<std::string>& dimension_names) :
IMetric(type, name, unit, dimension_names) {}
void AddOrUpdate(const std::map<std::string, std::string>& dimension_values, const double& value);
Collaborator

Please rename dimension_values to dimensions if "dimension_values" here means dimension name and dimension value pairs.

@namannandan (Collaborator Author) Oct 20, 2022

Yes, the argument here is a map with key as dimension name and value as dimension value.

Do we want this argument to be a vector of dimension values instead, with dimension name order assumed from configuration file?

Collaborator

The variable name should reflect its meaning. That's my comment.

Collaborator Author

Got it, makes sense. Also, considering the fact that we want to retain the order in which dimension names are specified in the metrics configuration, the dimension_values argument can be a vector of strings corresponding to the dimension names. Will update the implementation.
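A minimal sketch of what this could look like, with dimension values ordered to match the configured dimension names (class and member names are illustrative assumptions, not the PR's final code):

#include <string>
#include <utility>
#include <vector>

class TSLogMetricSketch {
 public:
  // dimension_names are taken in the order they appear in metrics.yaml.
  explicit TSLogMetricSketch(std::vector<std::string> dimension_names)
      : dimension_names{std::move(dimension_names)} {}

  // dimension_values[i] corresponds to dimension_names[i], e.g. with
  // dimension_names {"model_name", "level"}: AddOrUpdate({"mnist", "model"}, 1.5);
  void AddOrUpdate(const std::vector<std::string>& dimension_values, double value);

 private:
  const std::vector<std::string> dimension_names;
};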

const std::set<std::string>& dimension_names) :
IMetric(type, name, unit, dimension_names) {}
void AddOrUpdate(const std::map<std::string, std::string>& dimension_values, const double& value);
void AddOrUpdate(const std::map<std::string, std::string>& dimension_values,
Collaborator

Please rename dimension_values to dimensions if "dimension_values" here means dimension name and dimension value pairs.

const std::string& request_id, const double& value);

private:
void ValidateDimensionValues(const std::map<std::string, std::string>& dimension_values);
Collaborator

Please rename dimension_values to dimensions if "dimension_values" here means dimension name and dimension value pairs.

}

void TSLogMetric::ValidateMetricValue(const double& value) {
if(type == MetricType::COUNTER && value < 0.0) {
Collaborator

Why must the value be greater than or equal to 0?

@namannandan (Collaborator Author) Oct 20, 2022

A counter metric value can only increase or be reset to 0 on restart as per the prometheus metric definition: https://prometheus.io/docs/concepts/metric_types/#counter.
The prometheus client metric implementation also has this check: https://github.com/jupp0r/prometheus-cpp/blob/871e782d57fec2fba1f00e1c4ed44cc1fc4a0f66/core/src/counter.cc#L8

Do we want to include this check here or ignore it and let the frontend handle it?
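For reference, a minimal sketch of such a non-negativity check for COUNTER updates (the error handling here is illustrative; it is not the PR's actual code):

#include <stdexcept>

enum class MetricType { COUNTER, GAUGE, HISTOGRAM };

// Reject negative updates for counters: per the Prometheus definition, a
// counter may only increase (it is reset to 0 only on restart).
void ValidateCounterValue(const MetricType& type, const double& value) {
  if (type == MetricType::COUNTER && value < 0.0) {
    throw std::invalid_argument("Counter metric update value cannot be negative");
  }
}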

Collaborator

I see.

private:
void ValidateDimensionValues(const std::map<std::string, std::string>& dimension_values);
void ValidateMetricValue(const double& value);
void Emit(const std::map<std::string, std::string>& dimension_values, const double& value,
Collaborator

Please rename dimension_values to dimensions if "dimension_values" here means dimension name and dimension value pairs.

Comment on lines 76 to 87
std::string hostname = ::boost::asio::ip::host_name();
std::uint64_t timestamp = std::chrono::duration_cast<std::chrono::seconds>(
std::chrono::system_clock::now().time_since_epoch()).count();

if (request_id.empty()) {
TS_LOGF(INFO, "[METRICS]{}.{}:{}|#{}|#hostname:{},{}",
name, unit, value, dimension_values_string, hostname, timestamp);
}
else {
TS_LOGF(INFO, "[METRICS]{}.{}:{}|#{}|#hostname:{},{},{}",
name, unit, value, dimension_values_string, hostname, timestamp, request_id);
}
Collaborator

Which end (frontend or backend) uses this emit function? Only the frontend log includes hostname and timestamp; the backend does not.

Collaborator Author

This is intended to be used by the backend. I included the hostname and timestamp to align with the current Python implementation for backend metrics:

self.name, self.unit, self.value, dims, socket.gethostname(), int(time.time()))

which is invoked here:
logger.info("[METRICS]%s", str(met))

Do we want to exclude these in the C++ backend metrics implementation?

Collaborator

OK, then keep the same pattern as the existing Python backend.

Naman Nandan added 8 commits October 21, 2022 15:41

  • Refactor metrics helper classes:
    1) Move metrics helper classes from src/backends to src/utils
    2) Update Metric class to store a vector of values instead of a single value
  • Revert changes to the following CMakeLists files since no change is required as part of the metrics implementation:
    cpp/src/backends/CMakeLists.txt
    cpp/test/CMakeLists.txt
  • Refactor metrics helper classes to simplify backend metrics implementation by emitting logs when the metrics API is called instead of storing them until the completion of an inference request to flush the metrics
  • Infer dimension names order from config file and use the same order for dimension values argument in the metrics API. Fix clang-tidy warnings.
protected:
const std::string logfile_path{"test/resources/logging/test.log"};
const std::string logger_config_path_str{
"test/resources/logging/log_to_file.config"};
Collaborator

where is the log_to_file.config?

Collaborator Author

It is here: https://github.com/pytorch/serve/blob/cpp_backend/cpp/test/resources/logging/log_to_file.config
and is used for the logging unit tests. I've re-used it here.

Comment on lines 132 to 134
"^.*\\[METRICS\\]test_metric\\.Milliseconds\\:1\\.5\\|#level\\:model,"
"model_name\\:test_model\\"
"|#hostname\\:.*,[0-9]+$";
Collaborator

This is how the frontend parses the backend metrics log. Please follow the same regex.

@namannandan (Collaborator Author) Oct 31, 2022

Good point, will update the unit tests to use the same metric log regex as the frontend.
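As an illustration of that kind of check, a standalone snippet matching a metric log line against a pattern of the same shape as the test regex quoted above (the log line and timestamp are made-up values; this is not the frontend's actual parsing code):

#include <cassert>
#include <regex>
#include <string>

int main() {
  // Made-up log line in the backend metric format discussed in this thread.
  const std::string log_line =
      "[METRICS]test_metric.Milliseconds:1.5|#level:model,model_name:test_model"
      "|#hostname:test_host,1668984000";
  // Pattern of the same shape as the unit test regex quoted above.
  const std::regex pattern(
      R"(^.*\[METRICS\]test_metric\.Milliseconds:1\.5\|#level:model,model_name:test_model\|#hostname:.*,[0-9]+$)");
  assert(std::regex_match(log_line, pattern));
  return 0;
}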

namespace torchserve {
class TSLogMetricTest : public ::testing::Test {
protected:
const std::string logfile_path{"test/resources/logging/test.log"};
Collaborator

It is not good practice to write a log file into the test resources directory. The backend log is a stdio log.

Collaborator Author

Will update the log path to test/utils/metrics/metrics_test.log

Collaborator Author

Also, I considered the option of capturing stdout for the unit test, but it appears that gtest does not offer any public API for capturing test stdout, from what I could find. It does have an internal API for capturing stdout, which may not be advisable to use in test code:

  1. https://github.com/google/googletest/blob/191ca1f3a9262b90a586ae2c2e8c742c3d867801/googletest/src/gtest-port.cc#L1117
  2. https://stackoverflow.com/questions/3803465/how-to-capture-stdout-stderr-with-googletest

Therefore, I instead log to a file and then parse the file to validate the log output.
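A rough sketch of that approach in a gtest (the file path and regex are placeholders, not the PR's actual test):

#include <fstream>
#include <regex>
#include <sstream>

#include <gtest/gtest.h>

// Read the log file the metric was written to and validate it with a regex.
TEST(TSLogMetricSketchTest, EmitsExpectedLogLine) {
  std::ifstream logfile("metrics_test.log");
  ASSERT_TRUE(logfile.is_open());

  std::stringstream buffer;
  buffer << logfile.rdbuf();

  const std::regex pattern(R"(\[METRICS\].+\|#hostname:.+,[0-9]+)");
  EXPECT_TRUE(std::regex_search(buffer.str(), pattern));
}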

@namannandan (Collaborator Author)

Addressed review comments. Verified successful build and unit tests on Mac and Ubuntu. Merging PR.

@namannandan merged commit fb22b88 into pytorch:cpp_backend Nov 1, 2022
github-merge-queue bot pushed a commit that referenced this pull request Jan 18, 2024
* add workaround solution from nvidia

* add comments

* expand runtimeType

* add runtimeType in model config

* add unit test

* revert test/buildspec_gpu.yml

* update testng.xml

* update json files

* fmt

* fmt

* init cpp dir

* init code structure

* Init socket code and otf protocol

* add log api

* decouple backend and model worker; impl torchscript load model; build scripts [ci skip]

* delete src/CMakeLists.txt

* init model archive manifest loader

* add manifest and unit test

* integrate manifest into backend; add unit test

* update otf message internal structure; add inference request message

* update otfmessage function return [skip ci]

* add torch base handler

* support dynamic load shared lib

* disable install libtorch

* update utils/CMakeLists.txt

* add dynamic lib loader unit test

* [skip CI] update src/utils/CMakeLists.txt

* install kineto in build.sh

* [skip ci] add handler factory

* [skip ci] update inference request message

* vision handler inference impl.

* [skip ci] update torchscript backend api

* change model_path to model_dir [skip ci]

* [skip ci] torchscripted handler load model pass positive test

* [skip ci] fix dll test teardown

* [skip ci] add mnist inference positive test

* update torchscripted base handler postprocess

* [skip ci] add model instance status in backend

* [skip ci]add prediction  test for base and dll handler

* [skip ci] clean up

* add batch inference test

* [skip ci] add dll close

* [skip ci] file_system clean up

* [skip ci] add mnist scripted model pt file for testing

* [skip ci] torch_scripted/torch_scripted_backend_test refactor

* [skip ci] torch_scripted_backend_test refactor

* [skip ci] extend LoadPredict api

* [skip ci] add negative test in torch_scripted_backend_test

* explicit set ABI=1

* support different build step for linux and mac

* [skip ci] update folly installation

* add sudo for folly dep installation

* [skip ci] update libtorch cuda installation

* [skip ci] update sudo mac

* [skip ci] update cuda option flag

* [skip ci] set cuda compiler

* [skip ci] skip install kineto on linux

* [skip ci] fix cuda compile path

* add documentation

* update gcc version description

* add cpp log config file option

* add parameters

* update setup.py for package cpp

* set cpp libpath env

* add unit test

* [skip ci] install lib

* [skip ci] add cpp log config path for cpp backend start

* CPP OTFProtocol implementation for inference request and response (#1817)

* Add folly logging

* Adding model response serializer

* Slight refactor

* Adding test for otf protocol

* Address review comments

* Adding some logging tests

* Refactoring socket methods into its own class to enable mocking for testing

* otf protocol implementation for inference request and response

* rebase from #1814

* rebase from #1814

* refactor after PR#1814 rebase

* add unit tests for inference request and response otf protocol

* add cpp backend test for otf_protocol and handler

* Update logging flag to read log config file

* Address test review comments

* Remove response end from LoadModelResponse data struct

* Removing googletest since folly already has it

* Adding errno to socket send and receive failures

* address review comments

* refactor LoadRequest and Response OTF protocol to remove use of new and minor improvements

* address review comments

Co-authored-by: Aaqib Ansari <maaquib@gmail.com>

* update model archiver for cpp

* Bug fix in cpp integration (#1887)

* bug fixes in java - cpp integration

* revert arg changes

* add clang-tidy in build (#1896)

* replace strcpy with strncpy (#1898)

* add clang-tidy in build

* replace strcpy with strncpy

* [WIP] cpp backend with Java integ (#1890)

* Fix build script

* Fix socket issues

* Fix socket name truncation for uds

* Fix expected log lines format from frontend

* Removing some debug logs

* Address review comments

* Remove incorrect log line

* Fix inference issue

* Update filesystem import

* Fix path of default logging file

* Make build.sh executable

* add clang-tidy and clang-format for cpp backend lint (#1910)

* add clang-tidy in build

* replace strcpu with strncpy

* fix warnings for torchscripted backends

* add .clang-tidy and update CMakeLists

* add clang-format

* remove unused parameter in cpp basehandler (#1917)

* add clang-tidy in build

* replace strcpy with strncpy

* fix warnings for torchscripted backends

* add .clang-tidy and update CMakeLists

* add clang-format

* remove unused parameters in basehandler and update mnist handler

* remove libmnist_handler.dylib

* remove not necessary func softmax in mnist example handler

* fix clang-tidy warnings (#1915)

* CPP mnist_base postman test (#1907)

* add mnist base cpp postman integration test

* refactor based on #1917

* add response body validation

* disable grpc inference api test for cpp backend model

* fix typo

* install clang-format on linux (#1926)

* Add CI for cpp_backend branch (#1916)

* Create ci-cpu-cpp.yml

* Update ci-cpu-cpp.yml

* Update ci-cpu-cpp.yml

* Update ci-cpu-cpp.yml

* Metrics helper classes implementation for C++ backend (#1874)

* Metrics helper classes implementation
Dimension, Units and Metric

* Refactor metrics helper classes
1) Move metrics helper classes from src/backends to src/utils
2) Update Metric class to store a vector of values instead of a single value

* Fix metrics headers include guard to follow naming convention

* Refactor metrics implementation to follow the API described in the
metrics refactor RFC: #1492

* Revert changes to the following CMakeLists files since no change
is required as part of the metrics implementation:
cpp/src/backends/CMakeLists.txt
cpp/test/CMakeLists.txt

* Fix compiler warnings related to std::experimental::filesystem

* Refactor metrics helper classes to simplify backend metrics implementation
by emitting logs when the metrics API is called instead of storing them
until the completion of an inference request to flush the metrics

* Infer dimension names order from config file and use the same order
for dimension values argument in the metrics API.
Fix clang-tidy warnings.

* Refactor backend metrics unit tests to use same regex as frontend to parse metric logs

* install cpp via install_from_src (#1883)

* add clang-tidy in build

* replace strcpy with strncpy

* fix warnings for torchscripted backends

* add .clang-tidy and update CMakeLists

* add clang-format

* remove unused parameters in basehandler and update mnist handler

* remove libmnist_handler.dylib

* remove not necessary func softmax in mnist example handler

* feature install cpp from install_from_src

* add --install-dependencies in setup.py

* fix typo

* update MANIFEST.in and readme

* update readme

* code cleanup

* update readme

* update logging path

* fix backend worker started checking

* update readme

* Update README.md

* YAML metrics configuration handling for C++ backend (#1941)

* fix yaml_cpp installation in build script (#1996)

* fix yaml_cpp installation

* build request id strings for one batch

* Metrics cache implementation and integration with C++ backend (#1975)

* Metrics cache implementation for C++ backend

* Metrics cache integration with C++ backend

Co-authored-by: Naman Nandan <namannan@amazon.com>

* Revert "Metrics cache implementation and integration with C++ backend (#1975)" (#2011)

This reverts commit 3451bb7.

* Metrics cache implementation and integration with C++ backend (#2012)

* Metrics cache implementation for C++ backend

* Metrics cache integration with C++ backend

Co-authored-by: Naman Nandan <namannan@amazon.com>

* Fix lint error

* Fix lint error

* Fix model-archiver after cpp merge

* Adjust signature of workerLifeCycleMnist.startWorker in test

* Fix unit tests after merging master into cpp_backend

* Fix linting error

* Install dependencies for cpp backend

* Fix unit tests after cpp merge

* Fix formatting

* Move installation of cpp deps to ts_scripts/install_dependencies.py

* Build cpp backend for regression and sanity tests

* Fix formatting

* Fix typo

* Temp fix hanging after starting cpp worker

* Add pytest for cpp backend

* Roll back building of cpp backend in ci regression and sanity tests; install deps in cpp ci

* Fix formatting

* Remove mnist_custom_cpp.mar file from postman test as we do not build cpp backend for general regression test

* Remove cpp model archive in additional place

* Remove cpp build from setup.py

* Remove additional ref to build_cpp in setup.py

* fix code link

* Update README.md

* Update libtorch versions + move installation of cpp backend to build.sh

* Prepare cpp build workflow for merge into master

* Update cuda version in cpp/build.sh

* Remove reference to LDP

* Fix WorkerLifeCycleTest

* rm src/test/resources/config_test_cpp.properties

* Remove debug prints

* Skip cpp backend test if cpp backend is not available

---------

Co-authored-by: lxning <lninga@amazon.com>
Co-authored-by: Aaqib Ansari <maaquib@gmail.com>
Co-authored-by: lxning <23464292+lxning@users.noreply.github.com>
Co-authored-by: rohithkrn <rohith.nallamaddi@gmail.com>
Co-authored-by: Naman Nandan <namankt55@gmail.com>
Co-authored-by: Naman Nandan <namannan@amazon.com>
Co-authored-by: Geeta Chauhan <4461127+chauhang@users.noreply.github.com>