[Feature] Anthropic-style OpenAI-API extension for specifying prefix of assistant's response #2161
Comments
Note that OpenRouter also supports specifying a prefix for the assistant response via the same method (a trailing assistant message), so I think this feature is standardized enough that it's safe to add it. For example:
await fetch("https://openrouter.ai/api/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": `Bearer ${OPENROUTER_API_KEY}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
"model": "mistralai/mixtral-8x7b-instruct",
"messages": [
{"role": "user", "content": "Who are you?"},
{"role": "assistant", "content": "I'm not sure, but my best guess is"},
],
})
});
Hmm, it seems that this is a little more complex with a VLM than with an LLM. If anyone has any tips on the easiest / most minimal change to the API server required to achieve something like this, that would be great! A hacky approach I could try is to just remove the last N tokens right before starting generation, where N corresponds to the number of special tokens in the template that come after a message is finished and before the start of the next message.
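For what it's worth, here is a string-level sketch of that hack, assuming the internlm2 chat template that is used later in this thread (the exact text the template appends after the prefix depends on the model, so this is an illustration rather than a general solution):

from lmdeploy.model import MODELS

# Build the prompt as usual, with the desired prefix as a trailing assistant message.
chat_template = MODELS.get("internlm2")()
messages = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I'm not sure, but my best guess is"},
]
prompt = chat_template.messages2prompt(messages)

# The template closes the assistant turn and opens a new one after the prefix,
# so cut off everything that was appended after the prefix text, leaving a
# prompt that ends mid-turn so generation continues from the prefix.
prefix = messages[-1]["content"]
prompt = prompt[: prompt.rfind(prefix) + len(prefix)]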
Hi, @josephrocca
As you can see, after looping over the messages, the chat template appends the start of a new assistant turn again, so a trailing assistant message produces an invalid prompt. Here is the verifying code:

from lmdeploy.model import MODELS
# I use "internlm2" chat template as an example
chat_template = MODELS.get("internlm2")()
# the messages from your issue
messages = [
{"role": "user", "content": "What's the Greek name for Sun? (A) Sol (B) Helios (C) Sun"},
{"role": "assistant", "content": "The best answer is ("},
]
result = chat_template.messages2prompt(messages)
print(result)

The resulting prompt contains two consecutive "<|im_start|>assistant" markers, which is illegal.
@AllentDan I think we can support specifying a response prefix.
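One possible shape for that support, sketched here rather than taken from LMDeploy's actual implementation: when the last message comes from the assistant, build the prompt from the preceding messages (which already ends with the assistant-turn opener) and append the prefix text instead of emitting a second assistant turn.

def messages2prompt_with_prefix(chat_template, messages):
    # Sketch only: treat a trailing assistant message as a response prefix
    # instead of a finished turn. The function name and approach are illustrative.
    if messages and messages[-1]["role"] == "assistant":
        prefix = messages[-1]["content"]
        # messages2prompt on the remaining messages already ends with the
        # assistant-turn opener (e.g. "<|im_start|>assistant\n"), so the
        # prefix can simply be appended to it.
        return chat_template.messages2prompt(messages[:-1]) + prefix
    return chat_template.messages2prompt(messages)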
Motivation
It's often useful to specify a prefix for an LLM's response to help get it on the right track (e.g. specifying the prefix as "{" to start JSON output so the model doesn't reply "Sure, I can generate some JSON for you..."). It's a simple and easy form of output constraint. Anthropic gives this example in their docs:
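Roughly, the prefill pattern looks like this (a sketch using Anthropic's Python SDK; the model name is a placeholder and this is not necessarily the verbatim snippet from their docs):

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=64,
    messages=[
        {"role": "user", "content": "What's the Greek name for Sun? (A) Sol (B) Helios (C) Sun"},
        # The trailing assistant message is the prefix the model must continue from.
        {"role": "assistant", "content": "The best answer is ("},
    ],
)
# The returned text continues the prefix, e.g. "B) Helios".
print(response.content[0].text)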
The context for this feature request is that I was using Sonnet 3.5 for image captioning, and then wanted to try OpenGVLab/InternVL2-Llama3-76B-AWQ with LMDeploy. Through prompt engineering with Sonnet 3.5, I found that I achieved significantly higher success rates by using the assistant response prefix feature. Upon testing with LMDeploy, this doesn't seem to be supported, and it seems that InternVL2 has similar failure modes to Sonnet 3.5, so I think it would also benefit significantly from the response prefix constraint.

I think that I will have to find the correct chat template and manually construct the text for now, but it would be great if this were considered as a feature for the next version of LMDeploy.
Related resources
Additional context
Note that the response only includes the content generated after the specified prefix, i.e. it does not return the given prefix as part of the response.
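If LMDeploy adopted the same convention, client code against its OpenAI-compatible server might look roughly like this (a sketch of the proposed behaviour, not something that works today; the endpoint and model name are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")  # placeholder endpoint

messages = [
    {"role": "user", "content": "What's the Greek name for Sun? (A) Sol (B) Helios (C) Sun"},
    {"role": "assistant", "content": "The best answer is ("},  # response prefix
]
resp = client.chat.completions.create(model="internvl2-llama3-76b-awq", messages=messages)

# Following the Anthropic/OpenRouter behaviour, the returned content holds only
# the continuation (e.g. "B) Helios"), so concatenate it back onto the prefix
# if the full assistant message is needed.
full_text = messages[-1]["content"] + resp.choices[0].message.content
print(full_text)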