Feature/cpp baby llama rework #2903
…rsion errors. Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Custom preprocess implementation Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Free memory only after the inference is done Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Implement Postprocess Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Setting Fast compiler option Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Reading checkpoint path and tokenizer path from config file using folly Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Removing run.c from cmake Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Replace auto with appropriate data type Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Using smart pointers and initializing the vector with appropriate size upfront Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Using smart pointers Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Directly converting the tensor values to prompt token ids Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Moving run.c and common variables to .cc file Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Moving run.c to a separate folder Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Uncommenting the original run.c main method Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Implemented destructor to free up resources Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Supporting files for unit test Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Processing all the batch inputs Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Setting InferenceMode guard Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Updating InferenceMode to use torch::InferenceMode Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Updating class name to BabyLlamaHandler Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Renaming llm_handler target to babyllama_handler Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Adding dummy pt file Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Typo Fix Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Calculate tokens per second for batch input Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Adding README.md for babyllama example Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Fixing out-of-bound mem access in babyllama example
Move model instance out of ts_backend
Use shared_ptr<void> for model to detangle from torchscript
Move BaseHandler to backends/handler
Move model instance into core
Remove Torchscript as a backend and implement it as a handler
Move torchscript test out of backend folder
Remove dummy.pt in babyllama + update README + move babyllama test to new examples/examples_test.cc file
3064301 to f0bfaf4
const std::string &handler_str = manifest_->GetModel().handler;
std::size_t delimiter_pos = handler_str.find(manifest_->kHandler_Delimiter);
if (delimiter_pos != std::string::npos) {
#ifdef __APPLE__
Will this require separate packaging for TorchServe Mac installables vs Linux version?
We're currently not planning to provide precompiled binaries but will rely on the build.sh script for installation. If we change this in the future, these macros will still be resolved by the preprocessor at compile time, so we would need separate packages for the different platforms.
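To illustrate the point about compile-time resolution, here is a minimal sketch of how a platform macro can select a shared-library suffix while a handler string is split at a delimiter. The delimiter value `':'`, the suffix constants, and the `SplitHandler` helper are assumptions made for illustration; they are not taken from the PR.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical delimiter; the real value of kHandler_Delimiter is not shown
// in this conversation.
constexpr char kDelimiter = ':';

// The preprocessor keeps exactly one of these branches, so a binary built on
// one platform cannot serve the other without recompiling.
#ifdef __APPLE__
constexpr const char* kLibSuffix = ".dylib";
#else
constexpr const char* kLibSuffix = ".so";
#endif

// Splits "libname:ClassName" into {"libname" + suffix, "ClassName"}.
// Returns two empty strings when the delimiter is absent.
std::pair<std::string, std::string> SplitHandler(
    const std::string& handler_str) {
  const std::size_t pos = handler_str.find(kDelimiter);
  if (pos == std::string::npos) {
    return {"", ""};
  }
  return {handler_str.substr(0, pos) + kLibSuffix,
          handler_str.substr(pos + 1)};
}
```

Because the choice is baked in when the code is compiled, building from source via build.sh on each platform sidesteps the packaging question, whereas precompiled binaries would need one artifact per platform.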
We can handle this in a separate PR; issue #2908 has been filed for tracking.
@mreso Thanks for this PR and the enhancements. For Babyllama, do we still need to use the TorchScript option?
Please see a few minor comments inline.
@chauhang the babyllama example uses https://github.com/karpathy/llama2.c for the model execution and does not utilize TorchScript.
Description
This PR is a rebase of #2544, which adds a baby llama example to the cpp backend.
Additionally, it removes the framework-specific backends like the TorchScriptBackend.
With this PR, no custom backend for different frameworks like llama.cpp, vllm, or TorchScript will be necessary.
Instead, the handler .so file can be linked against any framework that suits the current use case.
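The idea can be sketched as follows: the serving core depends only on an abstract handler interface, and each handler library exports a factory that returns a concrete implementation. All names here (`Handler`, `EchoHandler`, `CreateHandler`, `DeleteHandler`, `Serve`) are hypothetical and chosen for illustration; they are not the actual TorchServe cpp API.

```cpp
#include <string>

// Hypothetical interface: the only type the serving core needs to know.
class Handler {
 public:
  virtual ~Handler() = default;
  virtual std::string Handle(const std::string& request) = 0;
};

// Hypothetical handler. A real one would call into whatever framework the
// handler library was linked against (llama2.c, TorchScript, ...).
class EchoHandler : public Handler {
 public:
  std::string Handle(const std::string& request) override {
    return "echo: " + request;
  }
};

// Factory functions with C linkage so their symbol names are not mangled
// and can be looked up by name in a shared library.
extern "C" Handler* CreateHandler() { return new EchoHandler(); }
extern "C" void DeleteHandler(Handler* handler) { delete handler; }

// The core side: framework-agnostic, it only sees the interface.
std::string Serve(Handler& handler, const std::string& request) {
  return handler.Handle(request);
}
```

In this sketch the factory is called directly. With a separately built handler library, the same symbols would typically be resolved at runtime, for example with `dlopen` and `dlsym` on POSIX systems, which is why the core never needs to link against the framework itself.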
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Logs for Test A:
Checklist: