server: Add "custom" chat template that uses input_prefix and input_suffix #10425
base: master
Changes from all commits: 53e0215, 84da80c, 9b58edf, dbe531e, b3e343e, fc05038, ec6212e
```diff
@@ -300,8 +300,9 @@ static llama_tokens format_infill(
 }

 // Format given chat. If tmpl is empty, we take the template from model metadata
-inline std::string format_chat(const struct llama_model * model, const std::string & tmpl, const std::vector<json> & messages) {
+inline std::string format_chat(const struct llama_model * model, const std::string & tmpl, const std::string & prefix, const std::string & suffix, const std::vector<json> & messages) {
     std::vector<common_chat_msg> chat;
+    std::string formatted_chat;

     for (size_t i = 0; i < messages.size(); ++i) {
         const auto & curr_msg = messages[i];
```

**Review comment** (on `format_chat`): Would be nice if we can leave a comment on how and when prefix/suffix are used.
```diff
@@ -325,10 +326,16 @@ inline std::string format_chat(const struct llama_model * model, const std::stri
             throw std::runtime_error("Missing 'content' (ref: https://github.com/ggerganov/llama.cpp/issues/8367)");
         }

-        chat.push_back({role, content});
+        if (tmpl == "custom") {
+            // simple format using prefix and suffix
+            if (role == "user") formatted_chat += prefix + content + suffix;
+            else formatted_chat += content;
+        } else {
+            chat.push_back({role, content});
+        }
     }

-    const auto formatted_chat = common_chat_apply_template(model, tmpl, chat, true);
+    if (tmpl != "custom") formatted_chat = common_chat_apply_template(model, tmpl, chat, true);

     LOG_DBG("formatted_chat: '%s'\n", formatted_chat.c_str());

     return formatted_chat;
```

**Review comment** (on `if (tmpl == "custom")`): Maybe we can do this too? Suggested change: …

**Reply:** I'm not sure about this - using "custom" as a defined template name helps with selection in the UI too. Is there a reason against it that I'm missing?

**Reply:** The UI logic and the underlying logic should be decoupled. If you have …

**Review comment** (on the `common_chat_apply_template` line): Then: Suggested change: …
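To make the new code path concrete, here is a small self-contained sketch (not the PR's code; the `msg` struct and `format_custom` helper are invented for illustration) of what the "custom" branch produces, e.g. with Alpaca-style markers:

```cpp
#include <iostream>
#include <string>
#include <vector>

struct msg { std::string role, content; };

// Mirrors the "custom" branch above: wrap user turns in prefix/suffix,
// pass every other role's content through unchanged.
static std::string format_custom(const std::string & prefix, const std::string & suffix,
                                 const std::vector<msg> & messages) {
    std::string formatted_chat;
    for (const auto & m : messages) {
        if (m.role == "user") formatted_chat += prefix + m.content + suffix;
        else                  formatted_chat += m.content;
    }
    return formatted_chat;
}

int main() {
    const std::vector<msg> messages = {
        {"system", "You are a helpful assistant.\n"},
        {"user",   "Hello!"},
    };
    // Alpaca-style prefix/suffix
    std::cout << format_custom("### Instruction:\n", "\n### Response:\n", messages);
    // prints:
    //   You are a helpful assistant.
    //   ### Instruction:
    //   Hello!
    //   ### Response:
}
```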
```diff
@@ -597,13 +604,15 @@ static bool server_sent_event(httplib::DataSink & sink, const char * event, cons
 static json oaicompat_completion_params_parse(
     const struct llama_model * model,
     const json & body, /* openai api json semantics */
-    const std::string & chat_template) {
+    const std::string & chat_template,
+    const std::string & input_prefix,
+    const std::string & input_suffix) {
     json llama_params;

     llama_params["__oaicompat"] = true;

     // Apply chat template to the list of messages
-    llama_params["prompt"] = format_chat(model, chat_template, body.at("messages"));
+    llama_params["prompt"] = format_chat(model, chat_template, input_prefix, input_suffix, body.at("messages"));

     // Handle "stop" field
     if (body.contains("stop") && body.at("stop").is_string()) {
```
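The call site is not shown in this excerpt; presumably the chat completions handler forwards the server's configured prefix/suffix alongside the template, along these lines (a hypothetical sketch; the `ctx_server`/`params` names follow the surrounding server code but are assumptions here, as is sourcing the values from `--in-prefix`/`--in-suffix`):

```cpp
// Hypothetical call site in the /v1/chat/completions handler (not part of this diff)
json data = oaicompat_completion_params_parse(
    ctx_server.model,
    json::parse(req.body),
    params.chat_template,
    params.input_prefix,   // assumed: taken from --in-prefix
    params.input_suffix);  // assumed: taken from --in-suffix
```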
**Review comment:** I think here we can refactor a bit:

- if both `chat_template` and `suffix`/`prefix` are set, we throw an error that says "only template or suffix/prefix can be set, but not both"
- if `chat_template` is set, we use it
- if `suffix`/`prefix` are set, we use them and discard `chat_template`

In either case, it would be nice to output a test formatted chat here.
**Reply:** I think that prioritizing the chat template and ignoring prefix/suffix is enough - and we can replicate the same behavior in the UI by hiding or dimming the prefix/suffix input fields when "custom" is not selected.

Again, I'm not sure about leaving this in a weird no-name state, with only the presence of prefix/suffix indicating that the custom template is used. Clearly stating what it is would be better, I think.
**Reply:** For the UI, I think we can go with something simple for now: I'd suggest simply having a checkbox that says "use custom chat template"; if it's checked, we show the prefix/suffix inputs. No dropdown is needed.

Let's skip the ability to select named templates like "llama3", "mistral", etc. for now, because it's difficult to maintain two lists of templates, one in `llama.cpp` and another in `index.html`.
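A minimal sketch of the precedence rules suggested in the first comment above (a hypothetical helper, not code from this PR); note that this would also remove the need for a reserved "custom" template name, since non-empty prefix/suffix is what selects the simple format:

```cpp
#include <stdexcept>
#include <string>

enum class chat_format { TEMPLATE, PREFIX_SUFFIX };

// Hypothetical selection logic: reject conflicting settings, otherwise let
// prefix/suffix presence pick the simple concatenation format.
static chat_format select_chat_format(const std::string & chat_template,
                                      const std::string & prefix,
                                      const std::string & suffix) {
    const bool has_tmpl  = !chat_template.empty();
    const bool has_fixes = !prefix.empty() || !suffix.empty();
    if (has_tmpl && has_fixes) {
        throw std::runtime_error("only template or suffix/prefix can be set, but not both");
    }
    // an empty template falls back to the model metadata, matching format_chat's doc comment
    return has_fixes ? chat_format::PREFIX_SUFFIX : chat_format::TEMPLATE;
}
```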