Llama.cpp example for cpp backend #2904
Conversation
Commits (each signed off by Shrinath Suresh <shrinath@ideas2it.com>):

- Updating llm handler - loadmodel, preprocess, inference methods
- Fixed infinite lock by adding request ids to the preprocess method
- Adding test script for finding tokens per second llama-7b-chat and ggml version
- GGUF Compatibility
- Fixing unit tests
- Fix typo
- Using folly to read config path
- Removing debug couts
- Processing all the items in the batch
- Adopted llama.cpp api changes
Force-pushed from a59e85c to 9a9b69e.
@mreso Thanks for the updates. A few comments:

- The directory structure is a bit complex for the examples: the example handler files are under the cpp/src/examples folder, while the usage prompts etc. live in the top-level examples/cpp/xxx folders. There is also only one CMake file for all the examples. This is okay for the initial code merge, but it will require some cleanup to make it easier to use. Are you planning to change that in a subsequent PR?
- Under the cpp/test/resources/examples/ folder there are multiple .pt files. Can you please clarify what these files are for and how they are generated? It would be good to include the actual source file / script for generating them.
We probably need to redefine LSP later.
```cpp
const int max_tokens_list_size = max_context_size - 4;

if ((int)tokens_list.size() > max_tokens_list_size) {
  std::cout << __func__ << ": error: prompt too long ("
```
Can you change this to a log call?
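For reference, a minimal sketch of what that change might look like, assuming a TS_LOGF-style macro with fmt-style placeholders is available in the cpp backend (the macro name and format syntax are assumptions, not taken from this PR):

```cpp
// Sketch only: TS_LOGF and its "{}" formatting are assumed names here,
// not confirmed from this diff; substitute the backend's actual logger.
if ((int)tokens_list.size() > max_tokens_list_size) {
  TS_LOGF(ERROR, "{}: error: prompt too long ({} tokens, max {})", __func__,
          tokens_list.size(), max_tokens_list_size);
}
```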
```cpp
if (llama_eval(llama_ctx, tokens_list.data(), int(tokens_list.size()),
               n_past)) {
  std::cout << "Failed to eval\n" << __func__ << std::endl;
```
ditto
std::cout << "New Token: " | ||
<< llama_token_to_piece(llama_ctx, new_token_id) << std::endl; |
Do we need to log each output?
Not necessary; it's an example, so I thought about keeping it, but I can log it as debug instead.
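A one-line sketch of that debug-level variant, again assuming a TS_LOGF-style macro and a DEBUG level exist in the backend:

```cpp
// Assumed macro and level names; the point is only to demote the
// per-token output below the default log level.
TS_LOGF(DEBUG, "New Token: {}", llama_token_to_piece(llama_ctx, new_token_id));
```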
```cpp
}

std::string generated_text_str = generated_text_stream.str();
std::cout << "Generated Text Str: " << generated_text_str << std::endl;
```
ditto
Thanks @chauhang! To 1: I was actually under the impression that source files need to be under the root CMake folder, but it turns out you can add out-of-tree sources. I am currently moving the examples over and splitting them up into separate CMake files. To 2: these are part of the unit tests for the TorchScript model. They probably contain the MLP; I can check and add a Python script to recreate the .pt file in a next PR. Same with the .pt file for the data; better to show a way to upload actual image data instead. Will create an issue to track these.
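Until that script lands, a self-contained libtorch sketch of how such a test input artifact could be regenerated (the tensor shape and file name are purely illustrative, not the actual test artifacts):

```cpp
#include <torch/torch.h>

// Illustrative only: regenerate a test input tensor with libtorch.
// Shape and file name are made up; the real artifacts will be recreated
// by the script tracked in the follow-up issue.
int main() {
  torch::manual_seed(0);                       // deterministic test data
  auto input = torch::rand({1, 3, 224, 224});  // example image-like input
  torch::save(input, "sample_input.pt");       // readable back via torch::load
  return 0;
}
```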
Issue for artifacts recreation: #2909
Description
This PR replaces #2527, which adds a llama.cpp example to the cpp backend. It contains all adjustments to the removed TorchScriptBackend.
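For orientation, here is a condensed sketch of the token-generation loop pattern visible in this PR's diff hunks. Note that llama.cpp's C API drifts between versions (llama_eval was later replaced by llama_decode, and several calls moved from taking a llama_context* to a llama_model*), so the exact signatures below are era-specific assumptions rather than the PR's definitive code:

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

#include "common/common.h"  // llama_token_to_piece(ctx, token) helper
#include "llama.h"

// Sketch of the loop: evaluate pending tokens, greedily sample the next
// token, append its text, and feed it back in until EOS or the context fills.
std::string Generate(llama_context* llama_ctx,
                     std::vector<llama_token> tokens_list,
                     int max_context_size) {
  std::ostringstream generated_text_stream;
  int n_past = 0;
  while (n_past + (int)tokens_list.size() < max_context_size) {
    // Evaluate the pending tokens (the whole prompt first, then one per step).
    if (llama_eval(llama_ctx, tokens_list.data(), int(tokens_list.size()),
                   n_past)) {
      break;  // eval failed
    }
    n_past += (int)tokens_list.size();

    // Greedy sampling: pick the highest-logit token for the last position.
    float* logits = llama_get_logits(llama_ctx);
    const int n_vocab = llama_n_vocab(llama_ctx);  // takes a model* in later versions
    llama_token new_token_id =
        (llama_token)(std::max_element(logits, logits + n_vocab) - logits);

    if (new_token_id == llama_token_eos(llama_ctx)) {  // also version-dependent
      break;
    }
    generated_text_stream << llama_token_to_piece(llama_ctx, new_token_id);
    tokens_list = {new_token_id};  // feed the sampled token back in
  }
  return generated_text_stream.str();
}
```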
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Logs for Test A
Checklist: