
[Bug]: Does the Browsing Agent need run_ipython action? #4355

Closed
1 task done
enyst opened this issue Oct 13, 2024 · 5 comments
Labels
agent quality Related to specific agent quality bug Something isn't working Stale Inactive for 30 days

Comments

enyst (Collaborator) commented Oct 13, 2024

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Describe the bug and reproduction steps

Running the Browsing Agent with Deepseek, I got a syntax error, and it turns out that what the LLM was trying to do is... not necessarily "wrong", but we're not ready for it. Maybe we can address this by teaching our browsing agent the run_ipython action? Or can we just send it like this to browsergym?

LLM response:

To summarize the quality and cost information of various language models from the blog article, I need to extract the relevant details and add them to the file. I'll start by summarizing the key points from the article.

send_msg_to_user('Summarizing the quality and cost information of various language models from the blog article...')

# Summarize the key points
summary = """
Key takeaways from the evaluation of LLMs as coding agents on SWE-Bench:

1. Clauder 3.5 Sonnet is the best by a fair amount, achieving a 27% resolve rate with the default agent in OpenHands.
2. GPT-4o lags behind, and o1-mini performed somewhat worse than GPT-4o. Analysis suggests o1 was sometimes "overthinking" tasks, performing extra environment configuration.
3. The strongest open models were Llama 3.1 405B and deepseek-v2.5, which performed reasonably well, even besting some closed models.
4. The price/accuracy tradeoff is an important consideration. A chart of this tradeoff is provided in the article.
"""

# Add the summary to the file
send_msg_to_user(summary)```

The response did include "```python" too.

Result with our current implementation, which doesn't expect variables in send_msg_to_user:

Traceback (most recent call last):
....
  File "/Users/enyst/repos/odie/openhands/agenthub/browsing_agent/response_parser.py", line 42, in parse_action
    return self.default_parser.parse(action_str)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/enyst/repos/odie/openhands/agenthub/browsing_agent/response_parser.py", line 108, in parse
    msg_content = args[0].value
                  ^^^^^^^^^^^^^
AttributeError: 'Name' object has no attribute 'value'
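The AttributeError makes sense once you look at the AST: the parser assumes `args[0]` is an `ast.Constant` (a string literal), but `send_msg_to_user(summary)` passes a bare variable, which parses to an `ast.Name` node, and `Name` has `id` rather than `value`. A minimal sketch of the distinction (the function name here is hypothetical, not the OpenHands parser):

```python
import ast

def extract_msg_arg(action_str: str) -> str:
    """Illustrative sketch: pull the argument out of the last
    send_msg_to_user(...) call, handling both string literals
    (ast.Constant) and bare variable names (ast.Name)."""
    tree = ast.parse(action_str)
    call = tree.body[-1].value  # Call node of the last expression statement
    arg = call.args[0]
    if isinstance(arg, ast.Constant):   # send_msg_to_user('literal')
        return arg.value
    if isinstance(arg, ast.Name):       # send_msg_to_user(summary)
        return f"<unresolved variable: {arg.id}>"
    raise ValueError(f"unsupported argument node: {type(arg).__name__}")

print(extract_msg_arg("send_msg_to_user('hello')"))  # hello
print(extract_msg_arg("summary = 'x'\nsend_msg_to_user(summary)"))
```

Resolving the variable would of course require actually executing the preceding assignment, which is essentially what a run_ipython-style action would give us.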

OpenHands Installation

Development workflow

OpenHands Version

No response

Operating System

MacOS

Logs, Errors, Screenshots, and Additional Context

No response

@enyst enyst added bug Something isn't working agent quality Related to specific agent quality labels Oct 13, 2024
enyst (Collaborator, Author) commented Oct 13, 2024

Curiously, the LLM actually seems to be right: it has been told in the prompt that its action space includes:

send_msg_to_user(text: str)
    Examples:
        send_msg_to_user('Based on the results of my search, the city was built in 1751.')

and then:

Multiple actions can be provided at once, but will be executed sequentially without any feedback from the page.
More than 2-3 actions usually leads to failure or unexpected behavior. Example:
fill('a12', 'example with "quotes"')
click('a51')
click('48', button='middle', modifiers=['Shift'])

So... it seems fair that the LLM may decide to send two send_msg_to_user calls? 😅
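Given that the prompt explicitly permits multiple sequential actions, the response really is multiple top-level calls, and a parser could enumerate them instead of assuming a single action. A hedged sketch (not the actual OpenHands parser) of splitting an action string into its calls:

```python
import ast

def list_actions(action_str: str) -> list[str]:
    """Illustrative sketch: enumerate the names of all top-level
    calls in an LLM action string, since the prompt allows several
    actions to be provided at once."""
    calls = []
    for node in ast.parse(action_str).body:
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            calls.append(node.value.func.id)
    return calls

print(list_actions("send_msg_to_user('a')\nsend_msg_to_user('b')"))
# ['send_msg_to_user', 'send_msg_to_user']
```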

enyst (Collaborator, Author) commented Oct 13, 2024

Example of an interesting response, just for thought:

To summarize the quality and cost information of various language models, I need to extract the relevant details from the text and the chart. I will start by extracting the text information and then focus on the chart.

# Extract text information
text_info = [
    "Clauder 3.5 Sonnet is the best by a fair amount, achieving a 27% resolve rate with the default agent in OpenHands.",
    "GPT-4o lags behind, and o1-mini actually performed somewhat worse than GPT-4o. We went in and analyzed the results a little, and briefly it seemed like o1 was sometimes 'overthinking' things, performing extra environment configuration tasks when it could just go ahead and finish the task.",
    "Finally, the strongest open models were Llama 3.1 405 B and deepseek-v2.5, and they performed reasonably, even besting some of the closed models."
]

# Extract chart information
# Since the chart is an image, I will need to click on it to get more details
click('112'

Note:
The task was to summarize the quality and cost of LLMs. The agent wants both the text, which it summarized in text_info, and then, at the next step, it wants to see the chart. But the way the browsing agent currently works, it will lose text_info at the next step, because it never carries anything forward from step to step except the commands (e.g. only click('112')). Cc: @ketan1741

Thus in my test, this led to a lot of trying and losing, trying and losing, and the agent ended up stuck in a loop.

ketan1741 (Contributor) commented

But the way the browsing agent currently works, it will lose text_info at the next step, because it never includes anything else from step to step, but the commands (e.g. only click('112')).

Yes, that's exactly how it works right now. We should look into ways to improve it. We could include at least the previous one or two observations, thoughts+action, for the next step.
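The suggestion of carrying the previous one or two observations, thoughts, and actions forward could look roughly like the following sketch (all names here are hypothetical, not the OpenHands API): a bounded history window rendered into the next step's prompt, so intermediate results like text_info survive.

```python
from collections import deque

def render_history(history, window: int = 2) -> str:
    """Illustrative sketch: render the last `window`
    (observation, thought, action) triples into the context
    for the next step, so intermediate state is not lost."""
    lines = []
    for obs, thought, action in list(history)[-window:]:
        lines.append(f"Observation: {obs}")
        lines.append(f"Thought: {thought}")
        lines.append(f"Action: {action}")
    return "\n".join(lines)

history = deque(maxlen=10)  # cap total history to bound prompt size
history.append(("blog article loaded", "extract the text first",
                "text_info = [...]"))
history.append(("the chart is an image", "inspect the chart next",
                "click('112')"))
print(render_history(history))
```

Even a window of one or two steps would have let the agent see its own text_info when it came back from clicking the chart.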

github-actions bot commented
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot commented
This issue was closed because it has been stalled for over 30 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 20, 2024