Metrics cache implementation and integration with C++ backend #1975
Conversation
The metrics cache is a singleton that can be accessed directly from anywhere, so it is not necessary to pass the metrics cache or the metrics.yaml path around.
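A minimal sketch of what that global access looks like, based on the `MetricsRegistry` snippets later in this thread (the wrapper function, metric name, and dimension values are illustrative):

```cpp
#include <string>
#include <vector>

#include "src/utils/metrics/registry.hh"  // include path assumed

// Any backend component can reach the singleton cache directly, so no
// cache pointer or metrics.yaml path needs to be threaded through callers.
void RecordHandlerTime(const std::string& model_name, double duration_ms) {
  auto& metric =
      torchserve::MetricsRegistry::GetMetricsCacheInstance()->GetMetric(
          torchserve::MetricType::GAUGE, "HandlerTime");
  metric.AddOrUpdate(std::vector<std::string>{model_name, "Model"},
                     duration_ms);
}
```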
cpp/src/backends/core/backend.hh
```diff
@@ -67,8 +70,23 @@ class Backend {
   Backend() = default;
   virtual ~Backend() = default;
 
-  virtual bool Initialize(const std::string& model_dir) {
+  virtual bool Initialize(const std::string& model_dir,
+                          const std::string& metrics_config_path) {
```
I don't understand why metrics_config_path should be passed in here. The metrics config is loaded into the cache when the model worker initializes, and the metrics cache can then be accessed globally.
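A sketch of the suggested flow, using the `MetricsRegistry::Initialize` signature quoted below in this thread (the call site and `metrics_context` wiring are assumptions):

```cpp
#include <string>

#include "src/utils/metrics/registry.hh"  // include path assumed

void StartModelWorker(torchserve::Backend& backend,
                      const std::string& model_dir,
                      const torchserve::MetricsContext& metrics_context) {
  // Load the metrics config into the singleton cache once, at worker init...
  torchserve::MetricsRegistry::Initialize("metrics.yaml", metrics_context);
  // ...after which Backend::Initialize needs no metrics_config_path;
  // handlers fetch metrics through the global cache instead.
  backend.Initialize(model_dir);
}
```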
cpp/src/utils/metrics/registry.hh
```cpp
  static std::shared_ptr<MetricsCache>& GetMetricsCacheInstance();

 private:
  static std::shared_ptr<MetricsConfigurationHandler> metrics_config_handler;
```
It is not necessary to waste space storing this variable, which is only used for testing. The YAML parser can have its own test case to cover this part.
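A sketch of such a standalone parser test (the fixture path, concrete handler type, and method signature are all assumptions; the cpp tests already use gtest):

```cpp
#include <gtest/gtest.h>

#include "src/utils/metrics/config.hh"  // header name assumed

// Cover the YAML parsing path directly, so MetricsRegistry does not
// need to retain metrics_config_handler just for test inspection.
TEST(MetricsConfigurationHandlerTest, ParsesValidConfig) {
  torchserve::MetricsConfigurationHandler handler;  // concrete type assumed
  EXPECT_NO_THROW(
      handler.LoadConfiguration("test/resources/metrics/valid_config.yaml"));
}
```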
cpp/src/utils/metrics/registry.hh
```cpp
  static void Initialize(const std::string& metrics_config_file_path,
                         const MetricsContext& metrics_context);
  static const std::shared_ptr<MetricsConfigurationHandler>&
  GetMetricsConfigurationHandlerInstance();
```
Same here as in the comments below.
```cpp
auto stop_time = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> duration = stop_time - start_time;
try {
  auto& prediction_time_metric =
      torchserve::MetricsRegistry::GetMetricsCacheInstance()->GetMetric(
          torchserve::MetricType::GAUGE, "PredictionTime");
  prediction_time_metric.AddOrUpdate(
      std::vector<std::string>{manifest_->GetModel().model_name, "Model"},
      duration.count());
} catch (std::runtime_error& e) {
  TS_LOG(ERROR, e.what());
} catch (std::invalid_argument& e) {
  TS_LOGF(ERROR, "Failed to record metric. {}", e.what());
}
```
The request_id is not included in the metric logs. This makes the model_metrics.log format differ between the Python and C++ backends, which may break customers' log-parsing pipelines.
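For reference, later revisions in this PR address this by passing a request-id batch through an `AddOrUpdate` overload; a sketch under assumed names:

```cpp
#include <string>
#include <vector>

#include "src/utils/metrics/registry.hh"  // include path assumed

// Including the request ids keeps the emitted model_metrics.log line
// aligned with the Python backend, whose entries carry a requestID field.
void RecordPredictionTime(const std::string& model_name,
                          const std::string& request_id_batch,
                          double duration_ms) {
  auto& metric =
      torchserve::MetricsRegistry::GetMetricsCacheInstance()->GetMetric(
          torchserve::MetricType::GAUGE, "PredictionTime");
  metric.AddOrUpdate(std::vector<std::string>{model_name, "Model"},
                     request_id_batch,  // e.g. "id-1,id-2" for a batch
                     duration_ms);
}
```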
```cpp
auto stop_time = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> duration =
    stop_time - start_time;
try {
  auto& handler_time_metric =
      torchserve::MetricsRegistry::GetMetricsCacheInstance()->GetMetric(
          torchserve::MetricType::GAUGE, "HandlerTime");
  handler_time_metric.AddOrUpdate(
      std::vector<std::string>{manifest_->GetModel().model_name, "Model"},
      duration.count());
} catch (std::runtime_error& e) {
  TS_LOG(ERROR, e.what());
} catch (std::invalid_argument& e) {
  TS_LOGF(ERROR, "Failed to record metric. {}", e.what());
}
```
The request_id is not included in the metric logs. This makes the model_metrics.log format differ between the Python and C++ backends, which may break customers' log-parsing pipelines.
```cpp
std::string handler_time_metric_request_id = "";
for (auto request_iter = request_batch->begin();
     request_iter != request_batch->end(); request_iter++) {
  handler_time_metric_request_id += request_iter->request_id;
  if (std::next(request_iter) != request_batch->end()) {
    handler_time_metric_request_id += ",";
  }
}
```
It is not efficient to build the request-id string for each metric log. The request-ids string can be created once during preprocessing and reused for every metric log; check https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.cc#L47
```cpp
  for (auto request_iter = request_batch->begin();
       request_iter != request_batch->end(); request_iter++) {
    prediction_time_metric_request_id += request_iter->request_id;
    if (std::next(request_iter) != request_batch->end()) {
      prediction_time_metric_request_id += ",";
    }
  }
  prediction_time_metric.AddOrUpdate(
      std::vector<std::string>{manifest_->GetModel().model_name, "Model"},
      prediction_time_metric_request_id, duration.count());
} catch (std::runtime_error& e) {
```
Same here.
```diff
@@ -42,9 +42,24 @@ std::shared_ptr<torch::Device> BaseHandler::GetTorchDevice(
       load_model_request->gpu_id);
 }
 
+std::string BaseHandler::BuildRequestIdBatch(
```
It is good to have a function for code isolation. However, we also need to think about performance: a batch can be very big, and it is not necessary to iterate over it twice in preprocessing. The batch-id string can be built in one loop, i.e. `for (auto& request : *request_batch) { ... }`, as sketched below.
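A sketch of that single pass, assuming `request_batch` is the batch pointer already in scope (`request_id_batch` is a local here; in the handler it would live wherever preprocessing already has batch scope):

```cpp
// Inside the existing preprocessing loop: extend the id string as each
// request is visited, avoiding a second iteration over the batch.
std::string request_id_batch;
for (auto& request : *request_batch) {
  request_id_batch += request_id_batch.empty() ? request.request_id
                                               : "," + request.request_id;
  // ... existing per-request preprocessing ...
}
```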
Agreed. I initially considered this approach, but realized that the batch request id may be used outside the context of the handler, for example in the ModelInstance::Predict method to record the PredictionTime metric: https://github.com/pytorch/serve/pull/1975/files#diff-5758cb72b743a174c7a467614854369a6daac8e24bd86de05f25173677ae15f4R94

```cpp
prediction_time_metric.AddOrUpdate(
    std::vector<std::string>{manifest_->GetModel().model_name, "Model"},
    handler_->BuildRequestIdBatch(request_batch), duration.count());
```

Therefore, I created a helper method to build the batch request id in the base handler. Any suggestions on how to better structure this?
Updated the implementation to iterate over request_batch and build request_id_batch only once, in base_handler.
This can be solved in the predict function too.
Updated the implementation to record both the HandlerTime and PredictionTime metrics in the ModelInstance::Predict method.
```cpp
try {
  auto& prediction_time_metric =
      torchserve::MetricsRegistry::GetMetricsCacheInstance()->GetMetric(
          torchserve::MetricType::GAUGE, "PredictionTime");
```
Please remove the PredictionTime metric; the HandlerTime metric is enough.
```cpp
request_id_batch += request_id_batch.empty() ? request.request_id
                                             : "," + request.request_id;
```
idx_to_req_id.first builds the string. Check the PR.
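A sketch of the `idx_to_req_id` pattern being referenced, where a single pass fills both the joined id string (`.first`) and the index-to-id map (`.second`); the exact types are assumptions:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

std::pair<std::string, std::map<uint8_t, std::string>> idx_to_req_id;
uint8_t idx = 0;
for (const auto& request : *request_batch) {  // batch already in scope
  idx_to_req_id.first += idx_to_req_id.first.empty()
                             ? request.request_id
                             : "," + request.request_id;
  idx_to_req_id.second[idx++] = request.request_id;
}
```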
Updated implementation: commits cda1964 to 0ade963.
```cpp
auto stop_time = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> duration = stop_time - start_time;
try {
  auto& handler_time_metric =
      torchserve::MetricsRegistry::GetMetricsCacheInstance()->GetMetric(
          torchserve::MetricType::GAUGE, "HandlerTime");
  handler_time_metric.AddOrUpdate(
      std::vector<std::string>{manifest_->GetModel().model_name, "Model"},
      idx_to_req_id.first, duration.count());
} catch (std::runtime_error& e) {
  TS_LOG(ERROR, e.what());
} catch (std::invalid_argument& e) {
  TS_LOGF(ERROR, "Failed to record HandlerTime metric. {}", e.what());
}
```
I checked the benchmark code, which needs PredictionTime to generate model latency. In fact, PredictionTime is the same as HandlerTime here. To avoid breaking the benchmark, please emit both metrics at this point.
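A sketch of emitting both metrics from the same measured duration, mirroring the block quoted above (looping over the names is just one way to avoid duplicating the try/catch):

```cpp
for (const std::string& metric_name : {"HandlerTime", "PredictionTime"}) {
  try {
    auto& metric =
        torchserve::MetricsRegistry::GetMetricsCacheInstance()->GetMetric(
            torchserve::MetricType::GAUGE, metric_name);
    metric.AddOrUpdate(
        std::vector<std::string>{manifest_->GetModel().model_name, "Model"},
        idx_to_req_id.first, duration.count());
  } catch (std::runtime_error& e) {
    TS_LOG(ERROR, e.what());
  } catch (std::invalid_argument& e) {
    TS_LOGF(ERROR, "Failed to record metric. {}", e.what());
  }
}
```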
Updated implementation and opened PR: #2012
* add workaround solution from nvidia
* add comments
* expand runtimeType
* add runtimeType in model config
* add unit test
* revert test/buildspec_gpu.yml
* update testng.xml
* update json files
* fmt
* fmt
* init cpp dir
* init code structure
* Init socket code and otf protocol
* add log api
* decouple backend and model worker; impl torchscript load model; build scripts [ci skip]
* delete src/CMakeLists.txt
* init model archive manifest loader
* add manifest and unit test
* integrate manifest into backend; add unit test
* update otf message internal structure; add inference request message
* update otfmessage function return [skip ci]
* add torch base handler
* support dynamic load shared lib
* disable install libtorch
* update utils/CMakeLists.txt
* add dynamic lib loader unit test
* [skip CI] update src/utils/CMakeLists.txt
* install kineto in build.sh
* [skip ci] add handler factory
* [skip ci] update inference request message
* vision handler inference impl.
* [skip ci] update torchscript backend api
* change model_path to model_dir [skip ci]
* [skip ci] torchscripted handler load model pass postive test
* [skip ci] fix dll test teardown
* [skip ci] add mnist inference positive test
* update torchscripted base handler postprocess
* [skip ci] add model instance status in backend
* [skip ci] add prediction test for base and dll handler
* [skip ci] clean up
* add batch inference test
* [skip ci] add dll close
* [skip ci] file_system clean up
* [skip ci] add mnist scripted model pt file for testing
* [skip ci] torch_scripted/torch_scripted_backend_test refactory
* [skip ci] torch_scripted_backend_test refactory
* [skip ci] extend LoadPredict api
* [skip ci] add negative test in torch_scripted_backend_test
* explicit set ABI=1
* support different build step for linux and mac
* [skip ci] update folly installation
* add sudo for folly dep installation
* [skip ci] update libtorch cuda installation
* [skip ci] update sudo mac
* [skip ci] update cuda option flag
* [skip ci] set cuda compiler
* [skip ci] skip install kineto on linux
* [skip ci] fix cude compile path
* add documnetation
* update gcc version description
* add cpp log config file option
* add parameters
* update setup.py for package cpp
* set cpp libpath env
* add unit test
* [skip ci] install lib
* [skip ci] add cpp log config path for cpp backend start
* CPP OTFProtocol implementation for inference request and response (#1817)
* Add folly logging
* Adding model response serializer
* Slight refactor
* Adding test for otf protocol
* Address review comments
* Adding some logging tests
* Refactoring socket methods into its own class to enable mocking for testing
* otf protocol implementation for inference request and response
* rebase from #1814
* rebase from #1814
* refactor after PR#1814 rebase
* add unit tests for inference request and response otf protocol
* add cpp backend test for otf_protocol and handler
* Update logging flag to read log config file
* Address test review comments
* Remove response end from LoadModelResponse data struct
* Removing googletest since folly already has it
* Adding errno to socket send and receive failures
* address review comments
* refactor LoadRequest and Response OTF protocol to remove use of new and minor improvements
* address review comments (Co-authored-by: Aaqib Ansari <maaquib@gmail.com>)
* update model archiver for cpp
* Bug fix in cpp integration (#1887)
* bug fixes in java - cpp integration
* revert arg changes
* add clang-tidy in build (#1896)
* replace strcpy with strncpy (#1898)
* add clang-tidy in build
* replace strcpu with strncpy
* [WIP] cpp backend with Java integ (#1890)
* Fix build script
* Fix socket issues
* Fix socket name truncation for uds
* Fix expected log lines format from frontend
* Removing some debug logs
* Address review comments
* Remove incorrect log line
* Fix inference issue
* Update filesystem import
* Fix path of default logging file
* Make build.sh executable
* add clang-tidy and clang-format for cpp backend lint (#1910)
* add clang-tidy in build
* replace strcpu with strncpy
* fix warnings for torchscripte backends
* add .clang-tidy and update CMakeLists
* add clang-format
* remove unused parameter in cpp basehandler (#1917)
* add clang-tidy in build
* replace strcpu with strncpy
* fix warnings for torchscripte backends
* add .clang-tidy and update CMakeLists
* add clang-format
* remove unused parameters in basehandler and update mnist handler
* remove libmnist_handler.dylib
* remove not necessary func softmax in mnist example handler
* fix clang-tidy warnings (#1915)
* CPP mnist_base postman test (#1907)
* add mnist base cpp postman integration test
* refactor based on #1917
* add response body validation
* disable grpc inference api test for cpp backend model
* fix typo
* install clang-format on linux (#1926)
* Add CI for cpp_backend branch (#1916)
* Create ci-cpu-cpp.yml
* Update ci-cpu-cpp.yml
* Update ci-cpu-cpp.yml
* Update ci-cpu-cpp.yml
* Metrics helper classes implementation for C++ backend (#1874)
* Metrics helper classes implementation: Dimension, Units and Metric
* Refactor metrics helper classes: 1) Move metrics helper classes from src/backends to src/utils 2) Update Metric class to store a vector of values instead of a single value
* Fix metrics headers include guard to follow naming convention
* Refactor metrics implementation to follow the API described in the metrics refactor RFC: #1492
* Revert changes to the following CMakeLists files since no change is required as part of the metrics implementation: cpp/src/backends/CMakeLists.txt cpp/test/CMakeLists.txt
* Fix compiler warnings related to std::experimental::filesystem
* Refactor metrics helper classes to simplify backend metrics implementation by emitting logs when the metrics API is called instead of storing them until the completion of an inference request to flush the metrics
* Infer dimension names order from config file and use the same order for dimension values argument in the metrics API. Fix clang-tidy warnings.
* Refactor backend metrics unit tests to use same regex as frontend to parse metric logs
* install cpp via install_from_src (#1883)
* add clang-tidy in build
* replace strcpu with strncpy
* fix warnings for torchscripte backends
* add .clang-tidy and update CMakeLists
* add clang-format
* remove unused parameters in basehandler and update mnist handler
* remove libmnist_handler.dylib
* remove not necessary func softmax in mnist example handler
* feature install cpp from install_from_src
* add --install-dependencies in setup.py
* fix typo
* update MANIFEST.in and readme
* update readme
* code cleanup
* update readme
* update logging path
* fix backend worker started checking
* update readme
* Update README.md
* YAML metrics configuration handling for C++ backend (#1941)
* fix yaml_cpp installation in build script (#1996)
* fix yaml_cpp installation
* build request id strings for one batch
* Metrics cache implementation and integration with C++ backend (#1975)
* Metrics cache implementation for C++ backend
* Metrics cache integration with C++ backend (Co-authored-by: Naman Nandan <namannan@amazon.com>)
* Revert "Metrics cache implementation and integration with C++ backend (#1975)" (#2011); this reverts commit 3451bb7.
* Metrics cache implementation and integration with C++ backend (#2012)
* Metrics cache implementation for C++ backend
* Metrics cache integration with C++ backend (Co-authored-by: Naman Nandan <namannan@amazon.com>)
* Fix lint error
* Fix lint error
* Fix model-archiver after cpp merge
* Adjust signature of workerLifeCycleMnist.startWorker in test
* Fix unit tests after merging master into cpp_backend
* Fix linting error
* Install dependencies for cpp backend
* Fix unit tests after cpp merge
* Fix formatting
* Move installation of cpp deps to ts_scripts/install_dependencies.py
* Build cpp backend for regression and sanity tests
* Fix formatting
* Fix typo
* Temp fix hanging after starting cpp worker
* Add pytest for cpp backend
* Roll back building of cpp abckend in ci regression and sanity tests; install deps in cpp ci
* Fix formatting
* Remove mnist_custom_cpp.mar file from postman test as we do not build cpp backend for general regression test
* Remove cpp model archive in additional place
* Remove cpp build from setup.py
* Remove additional ref to build_cpp in setup.py
* fix code link
* Update README.md
* Update libtorch versions + move installation of cpp backend to build.sh
* Prepare cpp build workflow for merge into master
* Update cuda version in cpp/build.sh
* Remove reference to LDP
* Fix WorkerLifeCycleTest
* rm src/test/resources/config_test_cpp.properties
* Remove debug prints
* Skip cpp backend test if cpp backend is not available

Co-authored-by: lxning <lninga@amazon.com>
Co-authored-by: Aaqib Ansari <maaquib@gmail.com>
Co-authored-by: lxning <23464292+lxning@users.noreply.github.com>
Co-authored-by: rohithkrn <rohith.nallamaddi@gmail.com>
Co-authored-by: Naman Nandan <namankt55@gmail.com>
Co-authored-by: Naman Nandan <namannan@amazon.com>
Co-authored-by: Geeta Chauhan <4461127+chauhang@users.noreply.github.com>
Description
Metrics cache implementation for C++ backend.
Metrics integration with C++ backend.