Support for chat templates #3810

Closed
Tostino opened this issue Oct 27, 2023 · 11 comments

@Tostino

Tostino commented Oct 27, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Add support for HF chat templates: https://huggingface.co/docs/transformers/chat_templating

Motivation

Provide a safe and consistent way to interact with the model through an OpenAI-style API, using the Jinja2 template integrated in the tokenizer.
This ensures that the prompt format stays consistent from training to inference, which is needed for optimal model performance.
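
For illustration, this is roughly what the feature looks like on the Transformers side (a minimal sketch; the model name is only an example, and apply_chat_template requires transformers >= 4.34):

```python
# Sketch: how HF Transformers applies a model's embedded chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about llamas."},
]

# Renders the Jinja template stored in tokenizer_config.json into the
# exact prompt string the model was fine-tuned on.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```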

@Tostino
Author

Tostino commented Oct 28, 2023

Just wanted to point to the pull request that I created for vLLM to support chat templates: vllm-project/vllm#1493

@Dampfinchen

Dampfinchen commented Oct 31, 2023

This needs to be added to convert.py. I believe @KerfuffleV2 works on the converter.

@KerfuffleV2
Collaborator

I'm just a random person that contributed some stuff (including to the converter). Looking at this, I don't think we can really use those templates. It appears to just directly embed JavaScript: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json

Embedding something like that instruction format in GGUF files isn't a bad idea, but it would have to be something that doesn't include raw code like that.

@Tostino
Author

Tostino commented Nov 1, 2023

@KerfuffleV2 So the templates are not plain JavaScript. They are jinja templates. It is a subset of functionality intended to provide a safe-ish template language.

So there are limits on what they are able to do as far as code execution, though I have previously read about certain vulnerabilities related to injection attacks and to returning data stored in global variables.

I do appreciate you bringing this up, though; it can be discussed, and we can determine if this is in fact a safe way to go.

HF is trying to get this standardized, and if this isn't a safe approach, it should be stopped now rather than later.
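
To illustrate the sandboxing point, here is a minimal sketch of rendering such a template with jinja2's sandboxed environment (the template string below is illustrative, not any model's actual template; Transformers uses a sandboxed environment much like this, though the exact options are an assumption):

```python
# Sketch: rendering a chat template through Jinja's sandbox, which blocks
# access to unsafe attributes and internals during template execution.
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Illustrative template in the style of tokenizer_config.json entries.
template_src = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}{{ eos_token }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
)

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(template_src)

prompt = template.render(
    messages=[{"role": "user", "content": "Hello!"}],
    eos_token="</s>",
    add_generation_prompt=True,
)
print(prompt)
```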

@KerfuffleV2
Collaborator

> They are jinja templates.

Fair enough, I stand corrected on the "raw code" comment.

> HF is trying to get this standardized, and if this isn't a safe approach, it should be stopped now rather than later.

I wasn't really talking about safety. The problem is that it basically could only be used for the server example, right? Since it pretty much depends on a Jinja interpreter, it's not just a general template that can be used for instructing the model. That being the case, I personally wouldn't include it in the file format. If something like the server example wants to read/use those templates, then that would be different.

Note: I don't speak for the project, so this is just my personal opinion about it.

@Tostino
Author

Tostino commented Nov 1, 2023

Yes, it is intended for use with a "chat" endpoint. Completion endpoints don't use it (at least currently).

But that doesn't mean only the server example has to implement that functionality. You could pretty easily have C++ methods that take in the expected transport format for the chat endpoints.

If there is no will to add the template to the GGUF format itself, I have at least added support to vLLM for a --chat-template flag that accepts a separate file containing the desired template. Something similar could be done here.
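
As a rough sketch of what that override flag looks like on the serving side (illustrative names only, not vLLM's actual code):

```python
# Sketch: prefer a user-supplied template file, fall back to whatever
# chat template the tokenizer ships with.
import argparse

from transformers import AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, required=True)
parser.add_argument(
    "--chat-template",
    type=str,
    default=None,
    help="path to a Jinja file overriding the tokenizer's built-in template",
)
args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(args.model)
if args.chat_template is not None:
    with open(args.chat_template) as f:
        # chat_template is a plain settable attribute on HF tokenizers.
        tokenizer.chat_template = f.read()
```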

@KerfuffleV2
Collaborator

> But that doesn't mean only the server example has to implement that functionality.

As far as I know, the server example is the only thing that currently deals with web stuff. That's what I was talking about. For example, you couldn't use this type of chat template with the main example in interactive mode.

I'm probably the wrong person to look at to gauge general interest in this; I don't even use the server and pretty much only interact with llama.cpp through the command-line stuff. (Actually, I don't even use interactive mode the majority of the time.)

As far as I know, the GGUF format supports adding arbitrary keys/values, so it probably wouldn't be hard to add support.

From the other pull request you linked, you seem to know Python. Why not submit a pull request to just pull in that template and add it to a key in the GGUF file? See what kind of comments/etc you get. If you need to ask questions about how convert.py or the GGUF Python stuff works, feel free to ask me. I'll help if I can, though I'm not an expert on this stuff.
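
For what it's worth, a sketch of what that conversion step might look like with the gguf Python package (the "tokenizer.chat_template" key name is a proposal here, not an established part of the GGUF spec):

```python
# Sketch: stash the tokenizer's chat template in a GGUF KV pair during
# conversion, alongside the metadata convert.py already writes.
import gguf
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
writer = gguf.GGUFWriter("model.gguf", arch="llama")

chat_template = getattr(tokenizer, "chat_template", None)
if chat_template is not None:
    writer.add_string("tokenizer.chat_template", chat_template)

# ... hparams, vocab, and tensors as convert.py already handles ...
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.close()
```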

@Tostino
Author

Tostino commented Nov 1, 2023

I could take a stab at it when I have some free time (which is likely a month or so out).

If others have time before I get to it, feel free. I'll mention here if I start working on it.

On another note, there seems to be a pretty full implementation of Jinja templates for C++: https://github.com/jinja2cpp/Jinja2Cpp

@slaren
Collaborator

slaren commented Nov 1, 2023

I think it would be good to add the Jinja templates as metadata to GGUF as soon as possible. It would be a minimal change, and it would allow frontends that support Jinja to use them. If we don't do this, using GGUF models will be harder than using HF models, because users will be forced to enter the chat template manually. We can add support for it in our examples later.

@Dampfinchen

> I think it would be good to add the Jinja templates as metadata to GGUF as soon as possible. It would be a minimal change, and it would allow frontends that support Jinja to use them. If we don't do this, using GGUF models will be harder than using HF models, because users will be forced to enter the chat template manually. We can add support for it in our examples later.

It would also tremendously help llama.cpp as a standalone tool: there would no longer be any need to set the prompt template in a complicated manner, because llama.cpp could just use that metadata to set the prompt template automatically.
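
On the consumer side, a frontend could then pull the template back out of the metadata along these lines (this assumes the "tokenizer.chat_template" key from the sketch above, plus a gguf-py version that ships GGUFReader):

```python
# Sketch: read the chat template back out of a GGUF file's metadata.
import gguf

reader = gguf.GGUFReader("model.gguf")
field = reader.fields.get("tokenizer.chat_template")
if field is not None:
    # String fields keep their bytes in parts[]; data[] indexes the value.
    template_src = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(template_src)
```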

@Tostino
Author

Tostino commented Nov 29, 2023

This is being discussed in #4216, so I am going to close this.

Tostino closed this as completed Nov 29, 2023