
[Bug]: Mistral Large Instruct 2407 tool calling leakage #8301

Closed

dsingal0 opened this issue Sep 9, 2024 · 8 comments · Fixed by #8515
Labels
bug (Something isn't working)

Comments

@dsingal0

dsingal0 commented Sep 9, 2024

Your current environment

When using vLLM 0.6.0, the Mistral tool call parser does not work as expected for Mistral Large 2407 (https://huggingface.co/mistralai/Mistral-Large-Instruct-2407). cc @K-Mistele

🐛 Describe the bug

It used to work fine when instantiating the tokenizer with AutoTokenizer, but not with MistralTokenizer from mistral_common.
Basically, when you run the model and give it tools, the model thinks it is the tool.
That is, if I give Mistral Large 2407 a tool for looking up movie information on IMDB, the model responds to "Who are you?" with "I am a movie database lookup bot" instead of "I am an AI trained by Mistral AI".
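For reference, a minimal reproduction sketch (the tool schema and server address are hypothetical; it assumes a vLLM OpenAI-compatible server serving the model):

    # Hypothetical repro: attach a tool, then ask an identity question.
    # Assumes a vLLM OpenAI-compatible server at localhost:8000.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    tools = [{
        "type": "function",
        "function": {
            "name": "lookup_movie",  # hypothetical IMDB lookup tool
            "description": "Look up movie information on IMDB.",
            "parameters": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
                "required": ["title"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="mistralai/Mistral-Large-Instruct-2407",
        messages=[{"role": "user", "content": "Who are you?"}],
        tools=tools,
        temperature=0,
    )
    # Expected: the model describes itself.
    # Observed (per this report): it answers as if it were the tool.
    print(resp.choices[0].message.content)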

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
dsingal0 added the bug label on Sep 9, 2024
@K-Mistele
Contributor

I originally built and tested tool calling for #5649 with the chat template provided in the repo and AutoTokenizer on Mistral 7B Instruct v0.3. I confirmed per-token parity between AutoTokenizer and the tokenizer in mistral_common. I did not have issues with tool calls at temperature=0, which is what Mistral recommends/requires (the 7B model's tool calling behavior, and possibly the larger models' as well, is very brittle compared to e.g. Hermes -- it does NOT work well above near-zero temperatures).

Adding MistralTokenizer happened during/after the PR, so I'm not sure that behavior has been tested as thoroughly. I recently tried Mistral 7B and noticed the MistralTokenizer was used. I did have some issues with it then -- the model didn't want to use tools (I actually tested it on the same thing lol -- giving it a SQL tool to query IMDB; for me it would just generate the SQL in markdown instead of calling the tool). But at temperature=0 it does pass the test cases defined in CI, although those are the most naive tool use examples possible (get_current_weather is the provided tool lol).

Is there a way to force the use of AutoTokenizer or to toggle it somehow? If so, that's probably preferable. I'm not able to debug Mistral-Large since I don't have the VRAM for it (I'm on a V100 32GB). I can dig into the AutoTokenizer vs. MistralTokenizer issue some more though.

@K-Mistele
Contributor

For what it's worth, I have found that Mistral's tool calling behavior is also VERY sensitive to system prompts. Generally, I have to give it very explicit instructions about what the tools are and when it should/shouldn't use them, and I have to tell it that it's an AI agent that can call tools OR generate a text response, etc.

You can see this in examples/tool_chat_template_mistral_parallel.jinja -- Mistral's function calling format supports parallel tool calling, but it doesn't work out of the box at all, even with temperature=0 and using Mistral's recommended inference code that's specified in the model card in the repository on Hugging Face (instead of vLLM). The only way I was able to get parallel tool calls generated correctly was with that chat template's system prompt. I found that generally, the model's tool calling quality is poor, even with token-level parity between vLLM and the mistral_common package's tokenizer.
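To illustrate (the wording below is an example of the kind of prompt that helps, not the template's exact text):

    # Illustrative system prompt; not quoted from the jinja template.
    messages = [
        {
            "role": "system",
            "content": (
                "You are an AI assistant that can call tools. "
                "You have access to a lookup_movie tool for IMDB queries. "
                "Call a tool ONLY when the user's request requires it; "
                "otherwise respond with plain text."
            ),
        },
        {"role": "user", "content": "Who directed Alien?"},
    ]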

@dsingal0
Author

dsingal0 commented Sep 9, 2024

I'll try and see if I can get a token comparison between the prompts produced after the chat template is applied by AutoTokenizer vs. mistral_common on Mistral-Large.
Do you know why MistralTokenizer doesn't support tools? From vllm/transformers_utils/tokenizers/mistral.py:

    def apply_chat_template(self,
                            conversation: List["ConversationMessage"],
                            tools: Optional[Dict[str, Any]] = None,
                            **kwargs) -> List[int]:
        assert tools is None, "`tools` are not yet supported."

        request = ChatCompletionRequest(
            messages=conversation)  # type: ignore[type-var]
        encoded = self.mistral.encode_chat_completion(request)

        # encode-decode to get clean prompt
        return encoded.tokens

It sounds like we need to compare AnyTokenizer to HF's AutoTokenizer.
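A rough sketch of that comparison (assumes the tokenizer versions line up; mistral_common API names per its docs):

    # Sketch: compare prompt tokens from HF's AutoTokenizer chat template
    # against mistral_common's tokenizer for the same conversation.
    from transformers import AutoTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

    model_id = "mistralai/Mistral-Large-Instruct-2407"

    hf_tokens = AutoTokenizer.from_pretrained(model_id).apply_chat_template(
        [{"role": "user", "content": "Who are you?"}], tokenize=True)

    mc_tok = MistralTokenizer.v3()  # tokenizer version must match the model
    mc_tokens = mc_tok.encode_chat_completion(
        ChatCompletionRequest(messages=[UserMessage(content="Who are you?")])
    ).tokens

    print("parity:", hf_tokens == mc_tokens)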

@K-Mistele
Contributor

So AnyTokenizer is a Union[PreTrainedTokenizer, PreTrainedTokenizerFast, MistralTokenizer]

The MistralTokenizer part of that was added recently
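i.e. roughly this (paraphrased sketch of the alias, with module paths from the file referenced above, not the exact vLLM source):

    # Sketch of the union type described above.
    from typing import Union
    from transformers import PreTrainedTokenizer, PreTrainedTokenizerFast
    from vllm.transformers_utils.tokenizers import MistralTokenizer

    AnyTokenizer = Union[PreTrainedTokenizer, PreTrainedTokenizerFast,
                         MistralTokenizer]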

@K-Mistele
Contributor

K-Mistele commented Sep 9, 2024

Looks like #7739 is the source of this change. Can you try using --tokenizer-mode auto? The PR indicates that this should be the default (vs. --tokenizer-mode mistral), but that may not be the case. It might be worth trying it both ways and seeing which gives better results.
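For reference, a launch command along these lines (the chat template path is an assumption; adjust to your setup):

    # Serve with the HF tokenizer path instead of mistral_common
    vllm serve mistralai/Mistral-Large-Instruct-2407 \
        --tokenizer-mode auto \
        --enable-auto-tool-choice \
        --tool-call-parser mistral \
        --chat-template examples/tool_chat_template_mistral.jinja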

@mgoin
Member

mgoin commented Sep 9, 2024

@patrickvonplaten could you possibly help out here? It would be nice if tool calling in vLLM worked easily with the Mistral tokenizer too.

@patrickvonplaten
Contributor

Thanks for flagging - I'll look into this!

@patrickvonplaten
Contributor

PR to enable function calling for "mistral" formatted models is here: #8515 (should actually even work for Pixtral!)
