[Bug]: reflection_with_llm issue with local server #2492
Comments
Did a run and added some comments to ConversableAgent to try to track down the issue, and I still can't figure out what it is:
From what I see, despite the flows (reflection_with_llm and the normal generate_reply) being distinct, I cannot see why _generate_oai_reply_from_client's response is empty with reflection_with_llm. There is clearly an issue, since in the reflection flow the _generate_oai_reply_from_client response object shows prompt tokens as 1, yet the messages object being passed seems correct.
Have you checked the logs on your local model server?
[2024-04-24 22:54:11.733] [INFO] Received POST request to /v1/chat/completions with body: {
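One way to narrow it down independently of autogen is to hit the local endpoint directly with the conversation plus the summary prompt appended as a trailing system message, which is roughly what the reflection step appears to send. The base_url, api_key, and messages below are placeholder assumptions for a default LMStudio setup, not values taken from this report:

from openai import OpenAI

# Placeholder values: LMStudio's default OpenAI-compatible endpoint and a dummy key.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# A minimal conversation with the summary prompt appended as a trailing system message.
messages = [
    {"role": "user", "content": "List 5 roles/positions that benefit strongly from a high EQ."},
    {"role": "assistant", "content": "1. Therapist ..."},
    {"role": "system", "content": "List the final answer to the task."},
]

response = client.chat.completions.create(
    model="TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
    messages=messages,
)
print(response.usage)                       # prompt_tokens close to 1 would mirror the symptom above
print(response.choices[0].message.content)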
Ok, I figured it out.
It could be as simple as allowing the prompt argument to be a Union[str, dict] that gets applied accordingly in _reflection_with_llm. It wouldn't affect the behavior of current implementations, and it would let this minor issue be solved. I can do a PR with this if that's ok?
@MarianoMolina perfect. Yes, a PR on this option sounds great! It can be added to …
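For illustration, a minimal sketch of the proposed prompt handling, assuming the reflection step currently turns a string prompt into a system message. build_reflection_messages is a hypothetical helper, not the actual autogen code:

from typing import Dict, List, Union

def build_reflection_messages(prompt: Union[str, Dict], messages: List[Dict]) -> List[Dict]:
    # Hypothetical helper sketching the Union[str, dict] option; not the real
    # _reflection_with_llm implementation.
    if isinstance(prompt, dict):
        # The caller controls the role, e.g. {"role": "user", "content": "..."},
        # which some local chat templates handle better than a trailing system message.
        reflection_message = prompt
    else:
        # Current behavior preserved: a plain string becomes a system message.
        reflection_message = {"role": "system", "content": prompt}
    return messages + [reflection_message]

The idea is that existing string-based callers keep the same behavior, while passing a dict lets the caller choose the role sent to the local server.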
Describe the bug
There seems to be an issue with generating the reflection_with_llm summary when working with locally deployed models.
Below is a simple snippet that, when run with my local model, generates the conversation correctly, but the ChatResult summary is empty. When I run it using gpt-4, it generates the summary correctly.
Steps to reproduce
from autogen import GroupChatManager, GroupChat, config_list_from_json, ConversableAgent, UserProxyAgent

config_list = config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    file_location=".",
    filter_dict={
        "model": ["TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf"],
    },
)

llm_config = {
    "cache_seed": False,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "timeout": 120,
}

user_proxy_auto = UserProxyAgent(
    name="user_proxy_auto",
    code_execution_config=False,
    llm_config=llm_config,
)

drafter_agent = ConversableAgent(
    name="drafter",
    llm_config=llm_config,
    system_message="You are an assistant in charge of drafting the answer for the task.",
)

reviewer_agent = ConversableAgent(
    name="reviewer",
    llm_config=llm_config,
    system_message=(
        "You are an assistant in charge of reviewing the drafted answer and assessing its quality "
        "in terms of tackling the task successfully and effectively. You can make adjustments directly, "
        "request a completely new draft while providing a framework to approach the task more effectively, "
        "or approve the answer as is. If the task is complete, end the task with TERMINATE."
    ),
)

group_chat = GroupChat(
    agents=[drafter_agent, reviewer_agent, user_proxy_auto],
    messages=[],
    max_round=4,
    speaker_selection_method="round_robin",
)

chat_manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
)

chat_result = user_proxy_auto.initiate_chat(
    chat_manager,
    message="List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.",
    clear_history=True,
    summary_args={"summary_prompt": "List the final answer to the task."},
    summary_method="reflection_with_llm",
)
print(f'chat_result history: {chat_result.chat_history}')
print(f'chat_result summary: {chat_result.summary}')
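As a possible stopgap while the reflection call misbehaves with the local server, the built-in "last_msg" summary method skips the extra LLM call entirely. A sketch reusing the agents defined above:

chat_result = user_proxy_auto.initiate_chat(
    chat_manager,
    message="List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.",
    clear_history=True,
    summary_method="last_msg",  # take the last message of the chat as the summary; no reflection request
)
print(f'chat_result summary: {chat_result.summary}')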
Model Used
TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf -> LMStudio
gpt-4-turbo-2024-04-09 -> OpenAI
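For context, an OAI_CONFIG_LIST entry for a default LMStudio setup typically looks like the following; the base_url and api_key here are generic placeholders, not the exact values used for this report:

config_list = [
    {
        "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
        "base_url": "http://localhost:1234/v1",  # LMStudio's OpenAI-compatible server
        "api_key": "lm-studio",                  # local servers typically ignore the key
    }
]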
Expected Behavior
reflection_with_llm should generate the summary when using the local model, just as it does with gpt-4; instead, the summary comes back empty.
Screenshots and logs
No response
Additional Information
Name: pyautogen
Version: 0.2.23