[Bug]: Does the Browsing Agent need run_ipython action? #4355
Comments
Curiously, the LLM seems to be actually right: it has been told in the prompt that its action space includes:
and then:
So... it seems fair that the LLM may decide to send 2x
Example of an interesting response, just for thought:
# Extract text information
text_info = [
"Clauder 3.5 Sonnet is the best by a fair amount, achieving a 27% resolve rate with the default agent in OpenHands.",
"GPT-4o lags behind, and o1-mini actually performed somewhat worse than GPT-4o. We went in and analyzed the results a little, and briefly it seemed like o1 was sometimes 'overthinking' things, performing extra environment configuration tasks when it could just go ahead and finish the task.",
"Finally, the strongest open models were Llama 3.1 405 B and deepseek-v2.5, and they performed reasonably, even besting some of the closed models."
]
# Extract chart information
# Since the chart is an image, I will need to click on it to get more details
click('112'
Note: Thus, in my test, this led to a lot of time spent trying and failing, and the agent ended up stuck in a loop.
Yes, that's exactly how it works right now. We should look into ways to improve it. We could include at least the previous one or two observations, thoughts+action, for the next step.
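For illustration only, here is a rough sketch of what "include the previous one or two observations, thoughts+action" could look like. The names `Step`, `RecentHistory`, `record`, and `render` are hypothetical and are not taken from the OpenHands code base; the prompt layout is likewise an assumption.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Step:
    """One agent turn (hypothetical structure, not an OpenHands type)."""
    observation: str
    thought: str
    action: str


class RecentHistory:
    """Keeps only the last `max_steps` turns so the prompt stays small."""

    def __init__(self, max_steps: int = 2) -> None:
        # A bounded deque silently drops the oldest step once it is full.
        self._steps: deque = deque(maxlen=max_steps)

    def record(self, step: Step) -> None:
        self._steps.append(step)

    def render(self) -> str:
        """Format the retained steps as text to prepend to the next prompt."""
        blocks = []
        for i, s in enumerate(self._steps, start=1):
            blocks.append(
                f"## Previous step {i}\n"
                f"Observation: {s.observation}\n"
                f"Thought: {s.thought}\n"
                f"Action: {s.action}"
            )
        return "\n\n".join(blocks)
```

The rendered text would be added to the prompt for the next step, so the model can see what it already tried instead of repeating the same failing action.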
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for over 30 days with no activity.
Is there an existing issue for the same bug?
Describe the bug and reproduction steps
Running the Browsing Agent with DeepSeek, I got a syntax error. It turns out that what the LLM was trying to do is not necessarily "wrong", but we're not ready for it. Maybe we can address this by teaching our browsing agent the run_ipython action? Or can we just send it like this to BrowserGym?
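As a rough illustration of the mismatch, the sketch below uses the standard-library `ast` module to tell apart a response that is a plain sequence of calls with literal arguments from one that defines or uses variables. The helper name `uses_only_literal_calls` is hypothetical, and this is not how the agent currently parses actions; it only shows one way such responses could be detected before they reach the action parser.

```python
import ast


def uses_only_literal_calls(source: str) -> bool:
    """Return True if every statement is a bare call with literal arguments.

    Hypothetical helper: assignments, variable references in arguments,
    and code that does not parse all yield False.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for stmt in tree.body:
        # Anything other than a bare call expression (e.g. an assignment) fails.
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            return False
        call = stmt.value
        args = list(call.args) + [kw.value for kw in call.keywords]
        for arg in args:
            try:
                # literal_eval accepts an AST node and rejects names and calls.
                ast.literal_eval(arg)
            except (ValueError, TypeError):
                return False
    return True
```

A response such as `click('112')` passes this check, while one that first builds a `text_info` list and then passes it to `send_msg_to_user` does not, which is the kind of response described above.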
LLM response:
The response did include "```python" too.
Result with our current implementation, which doesn't expect variables in send_msg_to_user:
OpenHands Installation
Development workflow
OpenHands Version
No response
Operating System
MacOS
Logs, Errors, Screenshots, and Additional Context
No response