server: Add "custom" chat template that uses input_prefix and input_suffix #10425

Draft · wants to merge 7 commits into master
Conversation

@MaggotHATE (Contributor) commented Nov 20, 2024

This PR adds a new, simple chat template that uses input_prefix and input_suffix to format messages. This is enough to support custom templates on the server, and it is a more built-in solution than changing the .js parts of the server (but also a less flexible one).

Currently, input_prefix and input_suffix are not used in the server at all, while main uses both in its own way. As such, we can reuse the prefix and suffix for the server only, without extra steps. It would also make adding template customization to the UI quite easy (I will add that a bit later in this PR).

What this currently does is:

  1. Adds "custom" template definition into server, automatically selected if no template is specified, but input_prefix and input_suffix are defined.
  2. Passes input_prefix and input_suffix in format_chat (utils.hpp used by server), where they are used to simply format messages from json payload.
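
For illustration, here is a minimal sketch of what step 2 amounts to. The names are simplified (the real change lives inside format_chat in server/utils.hpp and reads role/content pairs from the JSON payload):

```cpp
#include <string>
#include <vector>

// Sketch only: each message is rendered as <prefix><content><suffix>.
// The actual implementation works on the JSON message objects directly.
static std::string format_chat_custom(const std::vector<std::string> & contents,
                                      const std::string & prefix,
                                      const std::string & suffix) {
    std::string formatted;
    for (const auto & content : contents) {
        formatted += prefix + content + suffix;
    }
    return formatted;
}
```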

It is worth mentioning that main doesn't always apply the prefix/suffix correctly (for example, with the "chatml" format), so it may be worth adding the prefix/suffix manually to each existing prompt format. I can do that later in a separate PR. @ngxson @slaren

Commits:
* llama, common, server
* main handles prefix and suffix separately, only adjust for example display
@ngxson (Collaborator) left a comment:

This is quite a simple feature to add, so I'd prefer having less code for it. It would also be better to make it more isolated; for example, we could implement this logic only in server.cpp instead of common.cpp.

src/llama.cpp (outdated)

@@ -22107,6 +22119,8 @@ static int32_t llama_chat_apply_template_internal(
 int32_t llama_chat_apply_template(
     const struct llama_model * model,
     const char * tmpl,
+    const char * prefix,
+    const char * suffix,
@ngxson (Collaborator):

I don't think we need to change llama_chat_apply_template. This feature can be implemented outside of the llama.h library.

@MaggotHATE (author):

I suppose it's better to implement it in utils.hpp then? As a separate formatting function that is then used in the format_chat function? Sorry for the questions, I'm not very familiar with the server's structure.

@ngxson (Collaborator):

Yes, you can add it in server/utils.hpp, ideally as a branch inside format_chat (or split it into its own function if needed).

@MaggotHATE (author) commented Nov 20, 2024:

Thank you, I already did that. I will see if all the checks pass and then add it to the UI.

It looks like it will have to be a list of chat templates in the end, with "custom" as the choice for this new simple solution.

@MaggotHATE changed the title from "llama, common, server: Add "custom" chat template that uses input_prefix and input_suffix" to "server: Add "custom" chat template that uses input_prefix and input_suffix" on Nov 20, 2024
@ngxson (Collaborator) left a comment:

By the way, I think you may need to add something like a prompt_prefix too. A template like this may not be possible without a "global" prefix:

You are a helpful assistant

### Question:...

### Response:...
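
For illustration, here is how that template could map onto a one-time prompt_prefix plus the existing per-message prefix/suffix. prompt_prefix is the suggested addition, not something the PR implements yet, and the values are only illustrative:

```cpp
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::string prompt_prefix = "You are a helpful assistant\n\n"; // emitted once, globally
    const std::string input_prefix  = "### Question:";                   // before each message
    const std::string input_suffix  = "\n\n### Response:";               // after each message

    const std::vector<std::string> messages = { "..." };

    std::string formatted = prompt_prefix;
    for (const auto & content : messages) {
        formatted += input_prefix + content + input_suffix;
    }
    std::cout << formatted << "\n"; // matches the shape of the template above
    return 0;
}
```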

@@ -325,10 +326,16 @@ inline std::string format_chat(const struct llama_model * model, const std::stri
throw std::runtime_error("Missing 'content' (ref: https://github.com/ggerganov/llama.cpp/issues/8367)");
}

chat.push_back({role, content});
if (tmpl == "custom") {
@ngxson (Collaborator):

Maybe we can do this too?

Suggested change:
-    if (tmpl == "custom") {
+    bool is_custom = !prefix.empty() || !suffix.empty();
+    if (is_custom) {

-    const auto formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
+    if (tmpl != "custom") formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
@ngxson (Collaborator):

Then:

Suggested change:
-    if (tmpl != "custom") formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
+    if (!is_custom) formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
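
Taken together, the two suggestions would make the relevant part of format_chat look roughly like this. This is a sketch of a fragment inside format_chat, using that function's locals (chat, prefix, suffix, model, tmpl); the msg.content field name follows the message struct used by the surrounding code:

```cpp
// Sketch only: detect the custom mode from non-empty prefix/suffix instead
// of a magic "custom" template name, then branch once.
const bool is_custom = !prefix.empty() || !suffix.empty();

std::string formatted_chat;
if (is_custom) {
    for (const auto & msg : chat) {
        formatted_chat += prefix + msg.content + suffix; // per-message custom formatting
    }
} else {
    formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
}
```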

@@ -300,8 +300,9 @@ static llama_tokens format_infill(
}

// Format given chat. If tmpl is empty, we take the template from model metadata
@ngxson (Collaborator):

It would be nice if we could leave a comment here on how and when the prefix/suffix are used.
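
For example, something along these lines (the wording is only a suggestion, extending the existing comment above format_chat):

```cpp
// Format given chat. If tmpl is empty, we take the template from model metadata.
// If input_prefix/input_suffix are set, the built-in templates are bypassed and
// every message is rendered as <prefix><content><suffix> instead.
```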

@@ -3220,15 +3220,20 @@ int main(int argc, char ** argv) {

// if a custom chat template is not supplied, we will use the one that comes with the model (if any)
if (params.chat_template.empty()) {
if (!ctx_server.validate_model_chat_template()) {
if (!params.input_prefix.empty() || !params.input_suffix.empty()) {
@ngxson (Collaborator):

I think we can refactor a bit here:

  • if both chat_template and suffix/prefix are set, we throw an error saying "only a template or a suffix/prefix pair can be set, but not both"
  • if chat_template is set, we use it
  • if suffix/prefix are set, we use them and discard chat_template
  • if none is specified, we use the model's default template

In either case, it would be nice to output a test formatted chat here (see the sketch after this list).
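
A sketch of that precedence logic at server startup. This is a fragment of the startup path in server.cpp; the variable names follow the PR, while the exact error wording and control flow are illustrative:

```cpp
// Sketch only: decide the formatting mode once, up front.
const bool has_template = !params.chat_template.empty();
const bool has_affix    = !params.input_prefix.empty() || !params.input_suffix.empty();

if (has_template && has_affix) {
    throw std::runtime_error("only a chat template or a prefix/suffix pair can be set, but not both");
}
if (has_template) {
    // use the user-supplied template as-is
} else if (has_affix) {
    // use the custom prefix/suffix formatting; any model template is ignored
} else {
    // fall back to the template embedded in the model metadata (if any)
}
// in all cases, log a test formatted chat so the user can verify the result
```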
