Feature/cpp baby llama rework #2903

mreso · 2024-01-24T23:10:42Z

Description

This PR is a rebase of #2544, which adds a Baby Llama example to the C++ backend.
Additionally, it removes framework-specific backends such as the TorchScriptBackend.
With this PR, no custom backend is needed for individual frameworks like llama.cpp, vLLM, or TorchScript.
Instead, the handler .so file can be linked against whichever framework suits the current use case.
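A squashed commit in this PR reads "Use shared_ptr<void> for model to detangle from torchscript". The sketch below illustrates that general idea under stated assumptions: core code holds the model as an opaque `std::shared_ptr<void>`, and only the handler (compiled into its own .so and linked against whatever framework it needs) knows the concrete type. `ToyModel`, `ModelInstance`, `HandlerInterface` and `ToyHandler` are hypothetical names for illustration and are not TorchServe's actual classes or signatures.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a framework-specific model object (for example a
// llama2.c transformer or a TorchScript module). Core code never names this type.
struct ToyModel {
  int scale;
};

// Hypothetical core-side model instance: the model is held through an opaque
// std::shared_ptr<void>, so core does not need to link against any framework.
struct ModelInstance {
  std::string id;
  std::shared_ptr<void> model;  // type-erased; the correct deleter is captured at creation
};

// Hypothetical handler interface that a handler .so would implement.
class HandlerInterface {
 public:
  virtual ~HandlerInterface() = default;
  virtual std::shared_ptr<void> LoadModel() = 0;
  virtual std::vector<int> Handle(const std::shared_ptr<void>& model,
                                  const std::vector<int>& inputs) = 0;
};

// Only the concrete handler knows the real model type and casts back to it.
class ToyHandler : public HandlerInterface {
 public:
  std::shared_ptr<void> LoadModel() override {
    // shared_ptr<ToyModel> converts implicitly to shared_ptr<void> and keeps
    // ToyModel's deleter, so destruction stays correct after type erasure.
    return std::make_shared<ToyModel>(ToyModel{3});
  }

  std::vector<int> Handle(const std::shared_ptr<void>& model,
                          const std::vector<int>& inputs) override {
    assert(model != nullptr);
    auto toy = std::static_pointer_cast<ToyModel>(model);
    std::vector<int> out;
    out.reserve(inputs.size());
    for (int x : inputs) out.push_back(x * toy->scale);  // stand-in for inference
    return out;
  }
};
```

For example, `ModelInstance inst{"toy_model", handler.LoadModel()}` followed by `handler.Handle(inst.model, {1, 2, 3})` yields `{3, 6, 9}`. The design choice is that the `static_pointer_cast` back to the concrete type happens only inside the handler, so swapping TorchScript for llama2.c (or anything else) changes the handler library, not the core.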

Type of change

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

cpp tests
Logs for Test A:

torchserve_cpp build is complete. To run unit test:   ./_build/test/torchserve_cpp_test
Running main() from /home/ubuntu/serve/cpp/_build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 45 tests from 11 test suites.
[----------] Global test environment set-up.
[----------] 1 test from BackendIntegTest
[ RUN      ] BackendIntegTest.TestOTFProtocolAndHandler
I0124 23:58:33.530290 279102 log_metric.cc:92] [METRICS]HandlerTime.Milliseconds:69.427827|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706140713,reqi
I0124 23:58:33.530375 279102 log_metric.cc:92] [METRICS]PredictionTime.Milliseconds:69.427827|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706140713,reqi
[       OK ] BackendIntegTest.TestOTFProtocolAndHandler (95 ms)
[----------] 1 test from BackendIntegTest (95 ms total)

[----------] 8 tests from OTFMessageTest
[ RUN      ] OTFMessageTest.TestRetieveCmd
[       OK ] OTFMessageTest.TestRetieveCmd (0 ms)
[ RUN      ] OTFMessageTest.TestEncodeLoadModelResponse
[       OK ] OTFMessageTest.TestEncodeLoadModelResponse (0 ms)
[ RUN      ] OTFMessageTest.TestUTF8EncodeLoadModelResponse
[       OK ] OTFMessageTest.TestUTF8EncodeLoadModelResponse (0 ms)
[ RUN      ] OTFMessageTest.TestRetrieveMsgLoadGpu
[       OK ] OTFMessageTest.TestRetrieveMsgLoadGpu (0 ms)
[ RUN      ] OTFMessageTest.TestRetrieveMsgLoadNoGpu
[       OK ] OTFMessageTest.TestRetrieveMsgLoadNoGpu (0 ms)
[       OK ] TSLogMetricTest.TestGaugeMetric (1 ms)
[ RUN      ] TSLogMetricTest.TestHistogramMetric
[       OK ] TSLogMetricTest.TestHistogramMetric (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithRequestId
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithRequestId (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithoutRequestId
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithoutRequestId (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithIncorrectDimensionData
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithIncorrectDimensionData (0 ms)
[----------] 6 tests from TSLogMetricTest (7 ms total)

[----------] 2 tests from TSLogMetricsCacheTest
[ RUN      ] TSLogMetricsCacheTest.TestInitialize
[       OK ] TSLogMetricsCacheTest.TestInitialize (3 ms)
[ RUN      ] TSLogMetricsCacheTest.TestGetMetric
I0124 23:58:35.419207 279102 log_metric.cc:89] [METRICS]GaugeTsMetricExample.Count:1.5|#model_name:model_name,host_name:host_name|#hostname:ip-172-31-55-226,1706140715
[       OK ] TSLogMetricsCacheTest.TestGetMetric (1 ms)
[----------] 2 tests from TSLogMetricsCacheTest (4 ms total)

[----------] 3 tests from RegistryTest
[ RUN      ] RegistryTest.TestValidConfigFile
[       OK ] RegistryTest.TestValidConfigFile (1 ms)
[ RUN      ] RegistryTest.TestInvalidConfigFile
[       OK ] RegistryTest.TestInvalidConfigFile (0 ms)
[ RUN      ] RegistryTest.TestReInitialize
[       OK ] RegistryTest.TestReInitialize (1 ms)
[----------] 3 tests from RegistryTest (3 ms total)

[----------] 3 tests from UnitsTest
[ RUN      ] UnitsTest.TestGetExistingUnitMapping
[       OK ] UnitsTest.TestGetExistingUnitMapping (0 ms)
[ RUN      ] UnitsTest.TestGetNonExistentUnitMapping
[       OK ] UnitsTest.TestGetNonExistentUnitMapping (0 ms)
[ RUN      ] UnitsTest.TestGetEmptyUnitMapping
[       OK ] UnitsTest.TestGetEmptyUnitMapping (0 ms)
[----------] 3 tests from UnitsTest (0 ms total)

[----------] 10 tests from YAMLConfigTest
[ RUN      ] YAMLConfigTest.TestLoadValidConfigFrontendContext
[       OK ] YAMLConfigTest.TestLoadValidConfigFrontendContext (1 ms)
[ RUN      ] YAMLConfigTest.TestLoadValidConfigBackendContext
[       OK ] YAMLConfigTest.TestLoadValidConfigBackendContext (1 ms)
[ RUN      ] YAMLConfigTest.TestLoadMinimalValidConfig
[       OK ] YAMLConfigTest.TestLoadMinimalValidConfig (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithUndefinedDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithUndefinedDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithMissingMetricName
E0124 23:58:35.427593 279102 yaml_config.cc:203] Configuration for a metric must consist of "name", "unit" and "dimensions"
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithMissingMetricName (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyMetricName
E0124 23:58:35.427947 279102 yaml_config.cc:215] Configuration for a metric must consist of a non-empty "name"
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyMetricName (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricName
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricName (0 ms)
[----------] 10 tests from YAMLConfigTest (5 ms total)

[----------] 1 test from ManifestTest
[ RUN      ] ManifestTest.TestInitialize
[       OK ] ManifestTest.TestInitialize (0 ms)
[----------] 1 test from ManifestTest (0 ms total)

[----------] Global test environment tear-down
[==========] 45 tests from 11 test suites ran. (1992 ms total)
[  PASSED  ] 45 tests.

Checklist:

Did you have fun?
Have you added tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

…rsion errors.
Custom preprocess implementation
Free memory only after the inference is done
Implement Postprocess
Setting Fast compiler option
Reading checkpoint path and tokenizer path from config file using folly
Removing run.c from cmake
Replace auto with appropriate data type
Using smartpointers and initializing the vector with appropriate size upfront
Using smartpointers
Directly converting the tensor values to prompt token ids
Moving run.c and common variables to .cc file
Moving run.c to a separate folder
Uncommenting the original run.c main method
Implemented destructor to free up resources
Supporting files for unit test
Processing all the batch inputs
Setting InferenceMode guard
Updating InferenceMode to use torch::InferenceMode
Updating class name to BabyLlamaHandler
Renaming llm_handler target to babyllama_handler
Adding dummy pt file
Typo Fix
Calculate tokens/per second for batch input
Adding README.md for babyllama example
Signed-off-by (each commit above): Shrinath Suresh <shrinath@ideas2it.com>

Fixing out-of-bound mem access in babyllama example
Move model instance out of ts_backend
Use shared_ptr<void> for model to detangle from torchscript
Move BaseHandler to backends/handler
Move model instance into core
Remove Torchscript as a backend and implement it as a handler
Move torchscript test out of backend folder
Remove dummy.pt in babyllama + update README + move babyllama test to new examples/examples_test.cc file
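One of the commits above is "Calculate tokens/per second for batch input". The exact formula used in the PR is not shown here; a plausible reading, assumed for this sketch, is aggregate batch throughput: the total number of tokens generated across all requests in the batch divided by the wall-clock time of the batched generation. `BatchTokensPerSecond` is a hypothetical helper, not a function from the PR.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (assumed definition): aggregate throughput of a batch,
// i.e. total tokens generated by all requests divided by elapsed wall time.
double BatchTokensPerSecond(const std::vector<std::size_t>& tokens_per_request,
                            std::chrono::duration<double> elapsed) {
  std::size_t total = 0;
  for (std::size_t n : tokens_per_request) total += n;  // sum over the batch
  if (elapsed.count() <= 0.0) return 0.0;  // guard against division by zero
  return static_cast<double>(total) / elapsed.count();  // tokens per second
}
```

In practice `elapsed` would come from taking `std::chrono::steady_clock::now()` before and after generation; any `std::chrono` duration (for example milliseconds) converts implicitly to `duration<double>` in seconds. Note that per-request throughput would be a different metric if requests in a batch finish at different times.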

cpp/README.md

chauhang · 2024-01-25T05:04:18Z

cpp/src/backends/core/backend.cc

+  const std::string &handler_str = manifest_->GetModel().handler;
+  std::size_t delimiter_pos = handler_str.find(manifest_->kHandler_Delimiter);
+  if (delimiter_pos != std::string::npos) {
+#ifdef __APPLE__


Will this require separate packaging for TorchServe Mac installables vs Linux version?

We're currently not planning to provide precompiled binaries; installation relies on the build.sh script. If we change this in the future, note that these macros are resolved by the preprocessor at compile time, so we would need separate packages for each platform.

We can handle this in a separate PR; filed issue #2908 for tracking.
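The diff excerpt above splits the manifest's handler string on `manifest_->kHandler_Delimiter` and then branches on `__APPLE__`. The sketch below illustrates why that branch ties a build to one platform. It assumes `':'` as the delimiter and a `<library>:<HandlerClass>` layout with `.dylib`/`.so` suffixes; the real delimiter value and library naming are not shown in this excerpt, and `ParseHandlerString` is a hypothetical helper.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// ASSUMPTION: ':' as the separator. The real value is manifest_->kHandler_Delimiter,
// which is not shown in this PR excerpt.
constexpr char kSketchHandlerDelimiter = ':';

// Hypothetical helper: split "<library>:<HandlerClass>" and append the
// platform's shared-library suffix. The #ifdef is resolved when this file is
// compiled, so each binary only knows the suffix of its own build platform.
std::pair<std::string, std::string> ParseHandlerString(const std::string& handler_str) {
  const std::size_t pos = handler_str.find(kSketchHandlerDelimiter);
  if (pos == std::string::npos) {
    throw std::invalid_argument("handler string has no delimiter: " + handler_str);
  }
  std::string library = handler_str.substr(0, pos);
  std::string handler_class = handler_str.substr(pos + 1);
#ifdef __APPLE__
  library += ".dylib";  // macOS shared-library suffix (assumed naming)
#else
  library += ".so";     // Linux/ELF shared-library suffix (assumed naming)
#endif
  return {library, handler_class};
}
```

With this sketch, `ParseHandlerString("libbabyllama_handler:BabyLlamaHandler")` returns `{"libbabyllama_handler.so", "BabyLlamaHandler"}` on Linux and the `.dylib` variant on macOS; shipping prebuilt binaries would therefore mean one package per platform, as noted above.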

chauhang

@mreso Thanks for this PR and the enhancements. For BabyLlama, do we still need to use the TorchScript option?

Please see a few minor comments inline.

mreso · 2024-01-25T05:20:06Z

@chauhang The babyllama example uses https://github.com/karpathy/llama2.c for model execution and does not use TorchScript.

mreso force-pushed the feature/cpp_baby_llama_rework branch from 3064301 to f0bfaf4 January 24, 2024 23:19

fix spell check

fcc6a29

mreso marked this pull request as ready for review January 24, 2024 23:59

mreso requested review from chauhang and lxning January 24, 2024 23:59

Merge branch 'master' into feature/cpp_baby_llama_rework

27b0f52

mreso mentioned this pull request Jan 25, 2024

BabyLlama with CPP backend #2544

Closed

10 tasks

chauhang reviewed Jan 25, 2024

cpp/README.md (resolved)

chauhang reviewed Jan 25, 2024

chauhang suggested changes Jan 25, 2024

mreso added 5 commits January 25, 2024 22:59

Move cpp babyllama example to main example folder

858d3d0

Add last successful location to error message in handle function

4e0aa86

Fix babyllama batching by changing input/output from tensor to IValue

d7fb3e9

rename prompt file

2eb5ee1

Fix spellcheck

13ab710

mreso requested a review from chauhang January 26, 2024 01:28

chauhang approved these changes Jan 26, 2024

chauhang added the c++ label Jan 26, 2024

mreso added this pull request to the merge queue Jan 26, 2024

Merged via the queue into master with commit 3ecaf0b Jan 26, 2024
13 checks passed

chauhang added this to the v0.10.0 milestone Feb 27, 2024
