
Support v1/chat/completions #50

Merged: 6 commits merged into main from cody/openai-chat on Jan 19, 2024
Conversation

comaniac (Contributor)

Closes #26

  • Support the OpenAI-compatible chat API, with and without streaming.
  • Support built-in and dynamically registered chat templates.

comaniac requested a review from merrymercy on January 19, 2024 01:07
merrymercy (Contributor)

For the chat template, can we use the Hugging Face tokenizer by default? https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-use-chat-templates
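
A minimal sketch of that usage (the model id below is illustrative):

```python
from transformers import AutoTokenizer

# Any HF model whose tokenizer ships a chat template works the same way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "List 3 countries and their capitals."},
]

# tokenize=False returns the formatted prompt string rather than token ids;
# add_generation_prompt=True appends the assistant-turn prefix.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```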

comaniac (Contributor, Author)

> For the chat template, can we use the Hugging Face tokenizer by default? https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-use-chat-templates

Good idea. Let me add this.

comaniac (Contributor, Author) commented Jan 19, 2024

Now if --chat-template is not specified, we use the tokenizer's built-in chat template, so most users should not need to worry about the chat template for HF models. Meanwhile, I found that some tokenizers, such as TinyLlama's, do not have a chat template, so we get the following when calling apply_chat_template(messages, tokenize=False, add_generation_prompt=True):

No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

'<s>[INST] <<SYS>>\nYou are a helpful AI assistant\n<</SYS>>\n\nList 3 countries and their capitals. [/INST]'
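
A quick way to check whether a tokenizer ships its own template (the model id below is illustrative) is to inspect `tokenizer.chat_template`, which is `None` when unset:

```python
from transformers import AutoTokenizer

# Illustrative model id; any tokenizer without its own template behaves the same.
tokenizer = AutoTokenizer.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
)

# None means transformers falls back to the class-level default template
# (LlamaTokenizerFast here), which triggers the warning above.
print(tokenizer.chat_template)
```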

And this default template is actually incorrect for this model, so you get the following response in the unit test:

 <<SYS>>
You are a helpful AI assistant
<</SYS>>

List 4 cities with more than 100,000 people. [/INST] <<SYS>>
You are a helpful AI assistant
<</SYS>>

List 5

The following response is expected with the ChatML template:

Here are three countries and their capitals:

1. Canada - Ottawa
2. Australia - Canberra
3. South Korea - Seoul

Another issue is that apply_chat_template doesn't provide stop strings, so the response may be undesirable. For example, the Llama-2 template should have the stop strings "[INST]", "[/INST]", "<<SYS>>", and "<</SYS>>". In general, I would still suggest specifying the chat template explicitly, but maybe we could cover this issue in the troubleshooting guide.
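
To make the trade-off concrete, here is a rough sketch (all names are hypothetical, not this PR's actual implementation): an explicitly registered template can carry its own stop strings, while the tokenizer fallback cannot.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ConvTemplate:
    """Hypothetical explicit chat template: knows both its prompt format
    and the stop strings the model needs."""
    stop_str: list = field(default_factory=list)
    messages: list = field(default_factory=list)

    def append_message(self, role, content):
        self.messages.append((role, content))

    def get_prompt(self):
        # Toy ChatML-style rendering, for illustration only.
        parts = [f"<|im_start|>{role}\n{content}<|im_end|>"
                 for role, content in self.messages]
        parts.append("<|im_start|>assistant\n")
        return "\n".join(parts)


# Hypothetical registry of built-in / dynamically registered templates.
CHAT_TEMPLATES = {"chatml": ConvTemplate(stop_str=["<|im_end|>"])}


def build_chat_prompt(tokenizer, messages, chat_template=None):
    """Return (prompt, stop_strings) for a chat request."""
    if chat_template is not None:
        # Explicit template: carries its own stop strings.
        conv = copy.deepcopy(CHAT_TEMPLATES[chat_template])
        for m in messages:
            conv.append_message(m["role"], m["content"])
        return conv.get_prompt(), conv.stop_str
    # Fallback: the tokenizer's built-in template. Note this path yields
    # no stop strings, which is the issue described above.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return prompt, None
```

In the explicit path, the stop strings come back alongside the prompt, so the server can cut generation at the template boundaries; the fallback path has nothing to return.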

@@ -0,0 +1,381 @@
# Adapted from
# https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py

We can consider importing instead of copying later.

python/sglang/srt/server_args.py (review comment resolved; outdated)
@merrymercy merrymercy merged commit 23471f9 into main Jan 19, 2024
@merrymercy merrymercy deleted the cody/openai-chat branch January 19, 2024 07:43