add math-class group chat test #309

LittleLittleCloud · 2023-10-20T00:08:16Z

Why are these changes needed?

this test is imported from #102

After #294 it's possible to test the group chat result in a more robust way.

By using function_call and adding a special prefix to messages, we can examine the conversation flow inside a group chat.

This test also provides an example for #133, which terminates group chat/two agent chat in a more controlable way

Related issue number

Checks

I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

codecov-commenter · 2023-10-20T00:18:11Z

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (d542340) 28.70% compared to head (c8032f2) 37.00%.

Files	Patch %	Lines
autogen/agentchat/groupchat.py	14.28%	5 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #309      +/-   ##
==========================================
+ Coverage   28.70%   37.00%   +8.29%     
==========================================
  Files          27       27              
  Lines        3383     3389       +6     
  Branches      760      762       +2     
==========================================
+ Hits          971     1254     +283     
+ Misses       2341     2018     -323     
- Partials       71      117      +46

Flag	Coverage Δ
unittests	`36.94% <14.28%> (+8.29%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

sonichi

Looks good structure wise.

sonichi · 2023-10-20T04:02:08Z

test/agentchat/test_groupchat.py

+    - speaker selection should work under a continuous q&a scenario among two agents and GPT 3.5 model.
+    - admin should end the class when teacher has created 3 questions.
+    """
+    skip_if_openai_not_available()


Nice. This is better than my old way. @rickyloynd-microsoft @thinkall @kevin666aa FYI.

sonichi · 2023-10-20T04:05:34Z

If you know which user can benefit from this PR, could you ask them to review it?

LittleLittleCloud · 2023-10-20T17:39:26Z

@sonichi. It's @afourney. I saw you just sent him a request

LittleLittleCloud · 2023-10-20T17:41:38Z

test/agentchat/test_groupchat.py

+        ],
+    }
+
+    def terminate_group_chat(message):


@afourney This is how to achieve a more robust terminating strategy via function_call, could you review it?

@LittleLittleCloud I am having trouble understand why this strategy is more robust?

Suggested strategy: one of the agents calls a group chat termination function. user proxy executes. manager detects termination string

Old strategy: one of the agent generates a termination string. manager detects.

If an agent is smart enough for suggested strategy then it should be able to also do old strategy? Sorry if I missed the argument for the increased robustness.

If an agent is smart enough for suggested strategy then it should be able to also do old strategy?

Yes! Unfortunately, that's not the case in the real world, especially for gpt-3.5-turbo. For some quite obvious reason I still use gpt-3.5-turbo most frequently and gpt-3.5-turbo agents are not very robust in generating correct termination messages.

Even for gpt-4 agents, they might also fail to give the correct termination world when the conversation grows.

The new strategy (using termination_function_call) can make sure the group chat terminate correctly when that termination function_call get triggered. That strategy also works well on gpt-3.5-turbo which fined-tuned for function_call.

Got it -- the fact that function calls are prioritized makes it more robust. I added this comment to #525

gagb · 2023-11-03T21:31:34Z

Once we review and approve this PR I think we can make progress on #525 and #517. (cc @victordibia @pcdeadeasy)

gagb · 2023-11-03T23:51:06Z

autogen/agentchat/groupchat.py

+
+            if (
+                type(reply) == dict
+                and self._is_termination_msg(reply)
+                or type(reply) == str
+                and self._is_termination_msg({"content": reply})
+            ):
+                break
+


@LittleLittleCloud, should these lines be after speaker.send below??

also, I usually use isinstance() instead of type()

@LittleLittleCloud, should these lines be after speaker.send below??

Did this suggestion make sense?

…oud/autogen into u/xiaoyun/addTest

skzhang1 · 2023-10-21T16:54:13Z

test/agentchat/test_groupchat.py

+        system_message="""You are a pre-school math teacher, you create 3 math questions for student to resolve.
+        Here's your workflow:
+        -workflow-
+        if question count > 3 say [COMPLETE].


Perhaps it would be better to make the if-else grammar consistent in your code.

qingyun-wu · 2023-12-04T01:43:13Z

Hi @LittleLittleCloud, do you have a plan to further update this PR? There are some conflicts in two files due to some recent updates in group chat.

ekzhu · 2023-12-25T20:32:00Z

@LittleLittleCloud bump so this PR doesn't get abandoned :D

add math class chat

c3944fd

LittleLittleCloud had a problem deploying to openai October 20, 2023 00:08 — with GitHub Actions Failure

fix commit

39c9652

LittleLittleCloud had a problem deploying to openai October 20, 2023 00:10 — with GitHub Actions Failure

fix test

9a8ee74

LittleLittleCloud had a problem deploying to openai October 20, 2023 00:21 — with GitHub Actions Failure

LittleLittleCloud marked this pull request as ready for review October 20, 2023 00:55

sonichi reviewed Oct 20, 2023

View reviewed changes

Merge branch 'main' into u/xiaoyun/addTest

4439a1e

sonichi had a problem deploying to openai October 20, 2023 04:03 — with GitHub Actions Failure

sonichi had a problem deploying to openai October 20, 2023 04:03 — with GitHub Actions Error

sonichi temporarily deployed to openai October 20, 2023 04:03 — with GitHub Actions Inactive

sonichi requested a review from afourney October 20, 2023 04:04

Merge branch 'main' into u/xiaoyun/addTest

a1993bf

LittleLittleCloud had a problem deploying to openai October 20, 2023 17:39 — with GitHub Actions Failure

LittleLittleCloud commented Oct 20, 2023

View reviewed changes

sonichi requested review from qingyun-wu and skzhang1 October 21, 2023 00:29

qingyun-wu requested a review from yiranwu0 October 21, 2023 19:09

gagb approved these changes Nov 3, 2023

View reviewed changes

gagb reviewed Nov 3, 2023

View reviewed changes

LittleLittleCloud added 2 commits November 13, 2023 23:38

use isinstance

02c52a3

Merge branch 'u/xiaoyun/addTest' of https://github.com/LittleLittleCl…

d517e22

…oud/autogen into u/xiaoyun/addTest

LittleLittleCloud had a problem deploying to openai1 November 14, 2023 07:38 — with GitHub Actions Failure

LittleLittleCloud added 2 commits November 13, 2023 23:40

Merge branch 'main' into u/xiaoyun/addTest

e12d5e4

update

f262a88

LittleLittleCloud had a problem deploying to openai1 November 14, 2023 08:56 — with GitHub Actions Failure

format

c8032f2

LittleLittleCloud had a problem deploying to openai1 November 14, 2023 08:57 — with GitHub Actions Failure

skzhang1 reviewed Nov 14, 2023

View reviewed changes

gagb closed this Aug 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add math-class group chat test #309

add math-class group chat test #309

LittleLittleCloud commented Oct 20, 2023 •

edited

Loading

codecov-commenter commented Oct 20, 2023 •

edited

Loading

sonichi left a comment

sonichi Oct 20, 2023

sonichi commented Oct 20, 2023

LittleLittleCloud commented Oct 20, 2023

LittleLittleCloud Oct 20, 2023

gagb Nov 3, 2023 •

edited

Loading

LittleLittleCloud Nov 3, 2023 •

edited

Loading

gagb Nov 3, 2023 •

edited

Loading

gagb commented Nov 3, 2023 •

edited

Loading

gagb Nov 3, 2023 •

edited

Loading

sonichi Nov 4, 2023

gagb Nov 8, 2023

LittleLittleCloud Nov 14, 2023

skzhang1 Oct 21, 2023

qingyun-wu commented Dec 4, 2023

ekzhu commented Dec 25, 2023

add math-class group chat test #309

add math-class group chat test #309

Conversation

LittleLittleCloud commented Oct 20, 2023 • edited Loading

Why are these changes needed?

Related issue number

Checks

codecov-commenter commented Oct 20, 2023 • edited Loading

Codecov Report

sonichi left a comment

Choose a reason for hiding this comment

sonichi Oct 20, 2023

Choose a reason for hiding this comment

sonichi commented Oct 20, 2023

LittleLittleCloud commented Oct 20, 2023

LittleLittleCloud Oct 20, 2023

Choose a reason for hiding this comment

gagb Nov 3, 2023 • edited Loading

Choose a reason for hiding this comment

LittleLittleCloud Nov 3, 2023 • edited Loading

Choose a reason for hiding this comment

gagb Nov 3, 2023 • edited Loading

Choose a reason for hiding this comment

gagb commented Nov 3, 2023 • edited Loading

gagb Nov 3, 2023 • edited Loading

Choose a reason for hiding this comment

sonichi Nov 4, 2023

Choose a reason for hiding this comment

gagb Nov 8, 2023

Choose a reason for hiding this comment

LittleLittleCloud Nov 14, 2023

Choose a reason for hiding this comment

skzhang1 Oct 21, 2023

Choose a reason for hiding this comment

qingyun-wu commented Dec 4, 2023

ekzhu commented Dec 25, 2023

LittleLittleCloud commented Oct 20, 2023 •

edited

Loading

codecov-commenter commented Oct 20, 2023 •

edited

Loading

gagb Nov 3, 2023 •

edited

Loading

LittleLittleCloud Nov 3, 2023 •

edited

Loading

gagb Nov 3, 2023 •

edited

Loading

gagb commented Nov 3, 2023 •

edited

Loading

gagb Nov 3, 2023 •

edited

Loading