
Re-query speaker name when multiple speaker names returned during Group Chat speaker selection #2304

Merged
merged 20 commits into main from requery_speaker_name_on_multiple
Apr 30, 2024

Conversation

@marklysze marklysze commented Apr 6, 2024

Note: See UPDATED approach in the comment below.

Why are these changes needed?

During the speaker selection process (when in "auto" mode), the LLM returns the name of the next speaker. This is fairly reliable with OpenAI's models, but open-source/weight models can sometimes have trouble returning simply the name of the next speaker. Often, they will return a sentence, a paragraph, or even a large sequence of text. Furthermore, I have found that to get the correct next speaker name you often have to prompt these LLMs to provide an explanation, so that a chain of thought leads the LLM to the correct agent; the resulting response can then include the other agent names as part of the reasoning.

Currently, if there is more than one valid agent name it fails the speaker selection process by returning:
GroupChat select_speaker failed to resolve the next speaker's name. This is because the speaker selection OAI call returned: ...

This PR aims to provide a second-chance by prompting the LLM again, this time with a specific prompt text together with the returned text, asking the LLM to provide just the one agent name based on some rules. In my testing, I have found that this simpler step helps to overcome what would be a failing step and, depending on the model, this can drastically reduce the occurrence of the multiple agent names failure.

How does it work?

  • GroupChat has a new attribute, requery_on_multiple_speaker_names (bool), that enables a user to turn this feature on; the default is off (False)
  • During the agent name matching step in speaker selection, GroupChat._finalize_speaker:
    • If a single agent name is detected in the response, it returns the agent (unchanged)
    • If multiple agent names are detected in the response and GroupChat.requery_on_multiple_speaker_names is True it will:
      • Compile a new chat message with a hard-coded prompt, including the response's text as context
      • Send the message to the LLM (the same LLM as the one used for speaker selection)
      • If the new response has just one agent name, it is returned as the next agent (success!)
      • If the new response has no agent name or more than one, it fails with a similar message to the current error message
    • If no agent names are detected, it fails with the current error message (unchanged)
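The steps above can be sketched in plain Python. This is a minimal sketch, not AutoGen's actual implementation: `find_mentioned_agents`, `finalize_speaker`, and the `requery_fn` callback are hypothetical names standing in for the real `GroupChat._finalize_speaker` logic and the second LLM call.

```python
import re

def find_mentioned_agents(response: str, agent_names: list) -> list:
    # Collect the valid agent names that appear in the response, ordered by
    # first occurrence; \b stops a name matching inside a longer identifier.
    hits = []
    for name in agent_names:
        match = re.search(rf"\b{re.escape(name)}\b", response)
        if match:
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]

def finalize_speaker(response, agent_names, requery_on_multiple, requery_fn):
    mentioned = find_mentioned_agents(response, agent_names)
    if len(mentioned) == 1:
        return mentioned[0]  # single agent name detected: return it unchanged
    if len(mentioned) > 1 and requery_on_multiple:
        # Second chance: ask the same LLM again with the hard-coded prompt
        # plus the ambiguous response as context.
        retry = find_mentioned_agents(requery_fn(response), agent_names)
        if len(retry) == 1:
            return retry[0]
        raise ValueError("Re-query still returned zero or multiple agent names")
    raise ValueError("Failed to resolve the next speaker's name")
```

In the real code the `requery_fn` step is the extra speaker-selection chat completion; here a plain callable keeps the sketch self-contained.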

The prompt to select the name

I have put together a prompt that performs reasonably well in selecting a speaker name. However, I foresee that this prompt will be tweaked over time and, possibly, have the option to be overridden by the user.

I tried zero-shot prompts and few-shot prompts and, oddly, the zero-shot prompt worked better.

The prompt I have is (where {name} is the response with multiple agent names):

Your role is to identify the current or next speaker based on the provided context.
The valid speaker names are {[agent.name for agent in agents]}.
To determine the speaker use these prioritised rules:
1. If the context refers to themselves as a speaker e.g. "As the..." , choose that speaker's name
2. If it refers to the "next" speaker name, choose that name
3. Otherwise, choose the first provided speaker's name in the context

Respond with just the name of the speaker and do not provide a reason.
Context:
{name}
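As a sketch, the re-query message could be assembled like this. The helper name is hypothetical, the dict mirrors a typical OpenAI-style chat message, and whether it is sent with the system or user role is an implementation detail of the PR:

```python
SELECT_NAME_PROMPT = (
    "Your role is to identify the current or next speaker based on the provided context.\n"
    "The valid speaker names are {agent_names}.\n"
    "To determine the speaker use these prioritised rules:\n"
    "1. If the context refers to themselves as a speaker e.g. \"As the...\" , choose that speaker's name\n"
    "2. If it refers to the \"next\" speaker name, choose that name\n"
    "3. Otherwise, choose the first provided speaker's name in the context\n"
    "\n"
    "Respond with just the name of the speaker and do not provide a reason.\n"
    "Context:\n"
    "{context}"
)

def build_requery_message(agent_names: list, ambiguous_response: str) -> dict:
    # {agent_names} fills in for {[agent.name for agent in agents]} in the
    # prompt text; {context} is the multi-name response being disambiguated.
    return {
        "role": "system",
        "content": SELECT_NAME_PROMPT.format(
            agent_names=agent_names, context=ambiguous_response
        ),
    }
```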

Testing

Tests are based on these 7 multiple agent-name responses (which I have seen through my testing of models):

| # | Response | Expected Agent Name |
|---|----------|---------------------|
| 1 | Product_Manager because they speak after the Chief_Marketing_Officer. | Product_Manager |
| 2 | Thanks Chief_Marketing_Officer, as the Product_Manager my plan is to produce some amazing product ideas. | Product_Manager |
| 3 | Product_Manager. Here are five ideas that I think will impress the Chief_Marketing_Officer and be great for a marketing strategy for the Digital_Marketer. | Product_Manager |
| 4 | The next speaker, after the Chief_Marketing_Officer, is the Product_Manager. | Product_Manager |
| 5 | As the Product_Manager I've decided that an infotainment system that links up with your phone will have the biggest impact on the marketplace. Digital_Marketer, over to you for an amazing strategy. | Product_Manager |
| 6 | Thank you Digital_Marketer, the next speaker will be Chief_Marketing_Officer. | Chief_Marketing_Officer |
| 7 | What a great team! Let's hear from the Chief_Marketing_Officer now that the Product_Manager has some ideas. | Chief_Marketing_Officer |
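The seven cases can be replayed against any candidate selector with a small harness. This is a sketch: `first_mentioned` is a baseline heuristic for comparison, not the PR's re-query logic, and `CASES` is transcribed from the table above.

```python
AGENTS = ["Chief_Marketing_Officer", "Product_Manager", "Digital_Marketer"]

CASES = [  # (ambiguous response, expected agent name)
    ("Product_Manager because they speak after the Chief_Marketing_Officer.",
     "Product_Manager"),
    ("Thanks Chief_Marketing_Officer, as the Product_Manager my plan is to "
     "produce some amazing product ideas.", "Product_Manager"),
    ("Product_Manager. Here are five ideas that I think will impress the "
     "Chief_Marketing_Officer and be great for a marketing strategy for the "
     "Digital_Marketer.", "Product_Manager"),
    ("The next speaker, after the Chief_Marketing_Officer, is the "
     "Product_Manager.", "Product_Manager"),
    ("As the Product_Manager I've decided that an infotainment system that "
     "links up with your phone will have the biggest impact on the "
     "marketplace. Digital_Marketer, over to you for an amazing strategy.",
     "Product_Manager"),
    ("Thank you Digital_Marketer, the next speaker will be "
     "Chief_Marketing_Officer.", "Chief_Marketing_Officer"),
    ("What a great team! Let's hear from the Chief_Marketing_Officer now "
     "that the Product_Manager has some ideas.", "Chief_Marketing_Officer"),
]

def first_mentioned(response: str):
    # Baseline heuristic: pick whichever valid name appears earliest.
    hits = [(response.index(name), name) for name in AGENTS if name in response]
    return min(hits)[1] if hits else None

def score(select_fn) -> int:
    # Number of cases (out of 7) the selector resolves correctly.
    return sum(select_fn(text) == expected for text, expected in CASES)
```

On these cases the first-mention baseline is right only when the correct name happens to come first (cases 1, 3, 5, and 7), which is consistent with the later observation that the first mentioned name is correct more often than not.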

Results

| LLM | Correct out of 7 |
|---|---|
| **Open source/weight** | |
| llama2:13b-chat | 7 / 7 |
| mistral:7b-instruct-v0.2-q6_K | 7 / 7 |
| mixtralq4 | 7 / 7 |
| mixtralq5 | 7 / 7 |
| orca2:13b-q5_K_S | 7 / 7 |
| codellama:34b-instruct | 6 / 7 |
| gemma:2b-instruct | 6 / 7 |
| gemma:7b-instruct | 6 / 7 |
| neural-chat:7b-v3.3-q6_K | 6 / 7 |
| openhermes:7b-mistral-v2.5-q6_K | 6 / 7 |
| qwen:14b-chat-q6_K | 6 / 7 |
| codellama:34b-python | 5 / 7 |
| llama2:7b-chat-q6_K | 5 / 7 |
| yi:34b-chat-q4_K_M | 5 / 7 |
| deepseek-coder:6.7b-instruct-q6_K | 4 / 7 |
| solar:10.7b-instruct-v1-q5_K_M | 4 / 7 |
| phi | 3 / 7 |
| phind-codellama:34b-v2 | 3 / 7 |
| codellama:7b-python | 0 / 7 |
| codellama:13b-python | 0 / 7 |
| nexusraven | 0 / 7 |
| **OpenAI** | |
| gpt-3.5-turbo | 6 / 7 |
| gpt-4 | 6 / 7 |
| gpt-4-turbo-preview | 6 / 7 |
| **Mistral.AI** | |
| small | 6 / 7 |
| medium | 6 / 7 |
| large | 7 / 7 |

I believe this capability will noticeably improve the reliability of GroupChat speaker selection.

What is missing from this PR

Tests - As this PR is based on LLM responses, I need some guidance on what tests (if any) to create. Additionally, as this is focused more towards alt-models, can that be tested in any way? Finally, we would need to make sure that the responses are consistent.

Documentation - Along with the broader need for GroupChat documents (see #2243), I think this could be added to that PR as well as tips for Non-OpenAI models.

Thanks!

Related issue number

Based on shortcomings identified in #1746.

Checks

codecov-commenter commented Apr 6, 2024

Codecov Report

Attention: Patch coverage is 25.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 50.01%. Comparing base (4a44093) to head (c464f45).

| Files | Patch % | Lines |
|---|---|---|
| autogen/agentchat/groupchat.py | 25.00% | 9 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2304       +/-   ##
===========================================
+ Coverage   38.14%   50.01%   +11.86%     
===========================================
  Files          78       78               
  Lines        7865     7874        +9     
  Branches     1683     1824      +141     
===========================================
+ Hits         3000     3938      +938     
+ Misses       4615     3605     -1010     
- Partials      250      331       +81     
| Flag | Coverage Δ |
|---|---|
| unittest | 14.21% <16.66%> (?) |
| unittests | 48.98% <25.00%> (+10.85%) ⬆️ |


@marklysze marklysze requested a review from ekzhu April 6, 2024 05:16
@marklysze marklysze self-assigned this Apr 6, 2024

ekzhu commented Apr 6, 2024

Thanks for the analysis! It looks like requerying (i.e., giving the LLM a second chance) makes the selection more robust.

I am wondering, instead of further parametrizing the "auto" method, whether we can add another speaker_selection_method, such as "auto_with_retry", which retries the selection until a single speaker is returned. Do you think this approach goes a step further in addressing the robustness issue?

Effectively, this will be a new built-in speaker selection method. You can see how a user-defined selection method can currently be used here: https://microsoft.github.io/autogen/docs/topics/groupchat/customized_speaker_selection
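The suggested "auto_with_retry" behaviour could be sketched as a wrapper around a single-shot selection call. Names here are hypothetical; in AutoGen proper this would sit behind speaker_selection_method rather than be a bare function:

```python
def auto_with_retry(select_once, max_attempts: int = 3):
    # Wrap a single-shot selection call: keep retrying until it resolves
    # exactly one speaker, or give up after max_attempts and re-raise.
    def select(context):
        last_error = None
        for _ in range(max_attempts):
            try:
                return select_once(context)
            except ValueError as err:  # zero or multiple names resolved
                last_error = err
        raise last_error
    return select
```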

@sonichi sonichi requested review from joshkyh and yiranwu0 April 8, 2024 08:18

@joshkyh joshkyh left a comment

Thanks for the PR! Love the analysis.

@marklysze
Collaborator Author

> Thanks for the analysis! It looks like requerying (i.e., giving the LLM a second chance) makes the selection more robust.
>
> I am wondering, instead of further parametrizing the "auto" method, whether we can add another speaker_selection_method, such as "auto_with_retry", which retries the selection until a single speaker is returned. Do you think this approach goes a step further in addressing the robustness issue?
>
> Effectively, this will be a new built-in speaker selection method. You can see how a user-defined selection method can currently be used here: https://microsoft.github.io/autogen/docs/topics/groupchat/customized_speaker_selection

Thanks for highlighting the possible approach, @ekzhu - I didn't think about it as a new method but it's definitely worth considering.

Regarding the retries until a single speaker is returned:

During my testing, I found that if it didn't succeed with the first re-query, it was because it returned either:

  • text that wouldn't be useful for re-querying (it went off on a tangent)
  • blanks

It was rare for it to return text that still had agent names or useful context to then feed back for a re-query.

It is possible that we could introduce a second re-query prompt that is different from the first one (possibly simpler, like "Select the most prominent speaker name from this list {[agent.name for agent in agents]}. Context: {name}"). If this failed to identify a single agent, we could fall back to the next suggestion (below) or return the failed response.

Alternatively, we could just take the first mentioned name from the original response (which seems to be, more often than not, the correct one) rather than throwing an error.
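The fallback chain being discussed (detailed re-query, then a simpler re-query, then the first mentioned name) could be composed generically. This is a sketch with hypothetical names; each strategy is assumed to return the list of candidate names it resolved:

```python
def resolve_with_fallbacks(response: str, strategies) -> str:
    # Try each strategy in order; the first to isolate exactly one
    # candidate name wins, otherwise fall through to the next.
    for strategy in strategies:
        candidates = strategy(response)
        if len(candidates) == 1:
            return candidates[0]
    raise ValueError("No strategy resolved a single speaker name")
```

A GroupChat could then register the detailed re-query, the simpler prompt, and the first-mention heuristic as three entries in `strategies`, trying the cheapest-to-fail options first.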

New method, auto_with_retry

In terms of adding a new method, I wasn't sure how much "auto" was already used for logic within the code, and was wondering whether adding a new but similar method would mean having to replicate any future changes to auto for auto_with_retry as well.

In groupchat.py it doesn't look like it would be too much work to accommodate both auto and auto_with_retry. I'm happy to add it as a method, which would make it a more obvious approach for users.

Let me know if we do want to make this change and I'll update code.

@marklysze
Collaborator Author

Okay, so the manual selection process breaks when the user does not select an agent (the response is blank or "q" in manual_select_speaker).

So that needs to be handled. As the on-screen direction is "enter nothing or 'q' to use auto selection", I will have it select the next agent if the user doesn't select a valid one during manual selection. I've committed a fix.
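The fallback described can be sketched as follows. The helper is hypothetical (the real change lives in manual_select_speaker, which works with agent objects rather than plain strings):

```python
def resolve_manual_selection(user_input: str, agents: list, current_idx: int) -> str:
    # A valid 1-based number picks that agent; blank, "q", or anything
    # invalid falls back to the next agent in round-robin order.
    choice = user_input.strip()
    if choice.isdigit() and 1 <= int(choice) <= len(agents):
        return agents[int(choice) - 1]
    return agents[(current_idx + 1) % len(agents)]
```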

@sonichi sonichi added this pull request to the merge queue Apr 30, 2024
Merged via the queue into main with commit 5b6ae32 Apr 30, 2024
75 of 85 checks passed
@sonichi sonichi deleted the requery_speaker_name_on_multiple branch April 30, 2024 04:21
jayralencar pushed a commit to jayralencar/autogen that referenced this pull request May 28, 2024
Re-query speaker name when multiple speaker names returned during Group Chat speaker selection (microsoft#2304)

* Added requery_on_multiple_speaker_names to GroupChat and updated _finalize_speaker to requery on multiple speaker names (if enabled)

* Removed unnecessary comments

* Update to current main

* Tweak error message.

* Comment clarity

* Expanded description of Group Chat requery_on_multiple_speaker_names

* Reworked to two-way nested chat for speaker selection with default of 2 retries.

* Adding validation of new GroupChat attributes

* Updates as per @ekzhu's suggestions

* Update groupchat

- Added select_speaker_auto_multiple_template and select_speaker_auto_none_template
- Added max_attempts comment
- Re-instated support for role_for_select_speaker_messages

* Update conversable_agent.py

Added ability to force override role for a message to support select speaker prompt.

* Update test_groupchat.py

Updated existing select_speaker test functions as underlying approach has changed, added necessary tests for new functionality.

* Removed block for manual selection in select_speaker function.

* Catered for no-selection during manual selection mode

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Labels: alt-models (pertains to using alternate, non-GPT models, e.g., local models, llama, etc.), group chat (group-chat-related issues)