[Bug]: reflection_with_llm issue with local server #2492

Closed
MarianoMolina opened this issue Apr 23, 2024 · 6 comments · Fixed by #2527
Labels
models Pertains to using alternate, non-GPT, models (e.g., local models, llama, etc.)

Comments

@MarianoMolina
Contributor

Describe the bug

There seems to be an issue with generating the reflection_with_llm summary when working with locally deployed models.

Below is a simple snippet. When I run it against my local model, the conversation is generated correctly, but the ChatResult summary is empty. When I run it against gpt-4, the summary is generated correctly.

Steps to reproduce

from autogen import GroupChatManager, GroupChat, config_list_from_json, ConversableAgent, UserProxyAgent

config_list = config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    file_location=".",
    filter_dict={
        "model": ["TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf"],
    },
)
llm_config = {
    "cache_seed": False,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "timeout": 120,
}

user_proxy_auto = UserProxyAgent(
    name="user_proxy_auto",
    code_execution_config=False,
    llm_config=llm_config,
)

drafter_agent = ConversableAgent(
    name="drafter",
    llm_config=llm_config,
    system_message="You are an assistant in charge of drafting the answer for the task.",
)
reviewer_agent = ConversableAgent(
    name="reviewer",
    llm_config=llm_config,
    system_message="You are an assistant in charge of reviewing the drafted answer and assess its quality in terms of tackling the task successfully and effectively. You can make adjustments directly, request a completely new draft while providing a framework to approach the task more effectively, or approve the answer as is. If the task is complete, end the task with TERMINATE",
)
group_chat = GroupChat(
    agents=[drafter_agent, reviewer_agent, user_proxy_auto],
    messages=[],
    max_round=4,
    speaker_selection_method="round_robin",
)

chat_manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
)

chat_result = user_proxy_auto.initiate_chat(
    chat_manager,
    message="List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.",
    clear_history=True,
    summary_args={"summary_prompt": "List the final answer to the task."},
    summary_method="reflection_with_llm",
)

print(f'chat_result history: {chat_result.chat_history}')
print(f'chat_result summary: {chat_result.summary}')
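The OAI_CONFIG_LIST file that the filter resolves against is not shown above; based on the LM Studio server details that appear later in this thread, an equivalent config built directly in Python would look roughly like this (the base_url and api_key values are assumptions taken from those logs):

# Equivalent config_list built inline (base_url/api_key are the LM Studio
# defaults that show up in the server logs further down this thread).
config_list = [
    {
        "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
        "base_url": "http://localhost:1234/v1",
        "api_key": "lm-studio",
    }
]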

Model Used

TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf -> LMStudio
gpt-4-turbo-2024-04-09 -> OpenAI

Expected Behavior

reflection_with_llm should generate the summary when using a local model, just as it does with gpt-4; currently the summary comes back empty.

Screenshots and logs

No response

Additional Information

Name: pyautogen
Version: 0.2.23

@MarianoMolina
Contributor Author

I did a run and added some debug prints to ConversableAgent to try to track down the issue, but I still can't figure out what it is:

List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
1. generate_oai_reply for drafter to chat_manager. config: None
2. _generate_oai_reply_from_client for drafter. llm_client: [{'cache_seed': False, 'temperature': 0.5, 'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'}], cache: None
2.b. _generate_oai_reply_from_client messages: [{'content': '[SYSTEM MESSAGE]"', 'role': 'system'}, {'content': '[PROMPT]', 'name': 'user_proxy_auto', 'role': 'user'}]
3. _generate_oai_reply_from_client response: ChatCompletion(id='chatcmpl-9jjt8tzao65chsce8vo6n', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="[REPLY 1]", role='assistant', function_call=None, tool_calls=None))], created=1713971689, model='TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=340, prompt_tokens=340, total_tokens=680), cost=0, message_retrieval_function=<bound method OpenAIClient.message_retrieval of <autogen.oai.client.OpenAIClient object at 0x00000253849CA350>>, config_id=0, pass_filter=True)
drafter (to chat_manager):
[REPLY 1]

--------------------------------------------------------------------------------
1. Calling reflection_with_llm_as_summary from user_proxy_auto to chat_manager
1.b. llm_config from sender: {'cache_seed': False, 'temperature': 0.5, 'config_list': [{'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', 'base_url': 'http://localhost:1234/v1', 'api_key': 'lm-studio'}], 'timeout': 120} and from recipient: {'cache_seed': False, 'temperature': 0.5, 'config_list': [{'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', 'base_url': 'http://localhost:1234/v1', 'api_key': 'lm-studio'}], 'timeout': 120}
2. _generate_oai_reply_from_client for user_proxy_auto. llm_client: [{'cache_seed': False, 'temperature': 0.5, 'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'}], cache: None
2.b. _generate_oai_reply_from_client messages: [{'content': '[PROMPT]', 'role': 'user', 'name': 'user_proxy_auto'}, {'content': "[REPLY 1]", 'role': 'user', 'name': 'drafter'}, {'role': 'system', 'content': 'List the final answer to the task.'}]
3. _generate_oai_reply_from_client response: ChatCompletion(id='chatcmpl-iv8jxgvqcydrcavkk0yrm', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='', role='assistant', function_call=None, tool_calls=None))], created=1713971727, model='TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=0, prompt_tokens=1, total_tokens=1), cost=0, message_retrieval_function=<bound method OpenAIClient.message_retrieval of <autogen.oai.client.OpenAIClient object at 0x00000253849F1780>>, config_id=0, pass_filter=True)
response:
chat_result history: [{'content': '[PROMPT]', 'role': 'assistant'}, {'content': "[REPLY 1]", 'name': 'drafter', 'role': 'user'}]
chat_result summary:

From what I see, even though the two flows (reflection_with_llm and the normal generate_reply) are distinct, I cannot see why _generate_oai_reply_from_client's response is empty only for reflection_with_llm.

There's clearly an issue, since the response object from the reflection flow's _generate_oai_reply_from_client reports prompt_tokens as 1, yet the messages object being passed looks correct.

@ekzhu
Collaborator

ekzhu commented Apr 25, 2024

Have you checked the logs on your local model server?

ekzhu added the models label Apr 25, 2024
@MarianoMolina
Contributor Author

[2024-04-24 22:54:11.733] [INFO] Received POST request to /v1/chat/completions with body: {
  "messages": [
    {
      "content": "[SYSTEM MESSAGE].",
      "role": "system"
    },
    {
      "content": "[PROMPT]",
      "role": "user",
      "name": "user_proxy_auto"
    },
    {
      "content": "[RESPONSE 1]",
      "role": "user",
      "name": "hr_expert_drafter"
    },
    {
      "role": "system",
      "content": "[PROMPT 2]"
    }
  ],
  "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
  "stream": false
}
[2024-04-24 22:54:11.733] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Truncate Middle
[2024-04-24 22:54:11.735] [INFO] [LM STUDIO SERVER] Last message: { role: 'system', content: '[PROMPT 2]' } (total messages = 4)
[2024-04-24 22:54:12.500] [INFO] [LM STUDIO SERVER] [TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf] Generated prediction: {
  "id": "chatcmpl-t6yiyywzta91zv0ozsr9r",
  "object": "chat.completion",
  "created": 1714010051,
  "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": ""
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 0,
    "total_tokens": 1
  }
}

@MarianoMolina
Contributor Author

Ok, I figured it out.
It seems Mistral Instruct returns an empty completion when a second "system" message appears later in the conversation. This looks like a compatibility quirk, which is something to be aware of when using local models with AutoGen, as stated in many places.
After adjusting that and repeating the process, the response now comes back correctly.
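To sanity-check the model behaviour outside of AutoGen, a minimal request against the local endpoint reproduces it; this is only a sketch, assuming LM Studio's default URL and key, with placeholder message contents:

from openai import OpenAI

# Point the client at the local LM Studio server (default URL and key).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "List 5 roles that benefit from a high EQ."},
        # A second system message at the end of the history: with Mistral
        # Instruct this comes back empty; sending it as "user" instead works.
        {"role": "system", "content": "List the final answer to the task."},
    ],
)
print(repr(response.choices[0].message.content))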
With this in mind, it might be a good idea to allow the role of the reflection_with_llm prompt to be defined, since right now it's hardcoded here:

    def _reflection_with_llm(
        self, prompt, messages, llm_agent: Optional[Agent] = None, cache: Optional[AbstractCache] = None
    ) -> str:
        """Get a chat summary using reflection with an llm client based on the conversation history.

        Args:
            prompt (str): The prompt (in this method it is used as system prompt) used to get the summary.
            messages (list): The messages generated as part of a chat conversation.
            llm_agent: the agent with an llm client.
            cache (AbstractCache or None): the cache client to be used for this conversation.
        """
        system_msg = [
            {
                "role": "system",  # hardcoded
                "content": prompt,  # programmatically defined
            }
        ]

        messages = messages + system_msg
        if llm_agent and llm_agent.client is not None:
            llm_client = llm_agent.client
        elif self.client is not None:
            llm_client = self.client
        else:
            raise ValueError("No OpenAIWrapper client is found.")
        response = self._generate_oai_reply_from_client(llm_client=llm_client, messages=messages, cache=cache)
        print("response: ", response)  # debug print added while investigating this issue
        return response

It could be as simple as allowing the prompt argument to be a Union[str, dict] that gets handled accordingly in _reflection_with_llm. That wouldn't affect the behavior of current implementations, and it would solve this minor issue.
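A rough sketch of how the summary message could be built under that idea; the helper name here is hypothetical and this is only illustrative, not the change that was eventually merged in #2527:

from typing import Union

def build_summary_message(prompt: Union[str, dict], messages: list) -> list:
    """Append the reflection prompt to the chat history.

    A plain string keeps the current behavior (sent with the "system" role);
    a dict lets the caller choose the role explicitly, which helps models
    like Mistral Instruct that return empty completions for a trailing
    system message.
    """
    if isinstance(prompt, dict):
        summary_msg = prompt
    else:
        summary_msg = {"role": "system", "content": prompt}
    return messages + [summary_msg]

# e.g. send the reflection prompt with the "user" role instead of "system"
history = [{"role": "user", "content": "List 5 roles that benefit from a high EQ."}]
print(build_summary_message({"role": "user", "content": "List the final answer to the task."}, history))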

I can do a PR with this if that's ok?

@ekzhu
Collaborator

ekzhu commented Apr 25, 2024

@MarianoMolina perfect. Yes, a PR for this option sounds great! It can be added to the summary_args parameter.
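From the caller's side, that might end up looking something like the snippet below, reusing the agents from the reproduction above; the "summary_role" key name is hypothetical here, just to illustrate the idea, not necessarily the name adopted in the actual PR:

chat_result = user_proxy_auto.initiate_chat(
    chat_manager,
    message="List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.",
    clear_history=True,
    summary_method="reflection_with_llm",
    summary_args={
        "summary_prompt": "List the final answer to the task.",
        "summary_role": "user",  # send the reflection prompt as a user message
    },
)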

@MarianoMolina
Contributor Author

@ekzhu #2527
Done
