Enable web browsing ability in agentscope (#296)
---------

Co-authored-by: wenhao <wenhao@U-J0P61NH2-2025.local>
Co-authored-by: DavdGao <gaodawei.gdw@alibaba-inc.com>
3 people authored Sep 6, 2024
1 parent 07165b3 commit 436b9de
Showing 21 changed files with 1,431 additions and 146 deletions.
12 changes: 12 additions & 0 deletions README.md
@@ -40,6 +40,12 @@ Start building LLM-empowered multi-agent applications in an easier way.

## News

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-09-03]** AgentScope supports **Web Browser Control** now! Refer to our [example](https://github.com/modelscope/agentscope/tree/main/examples/conversation_with_web_browser_agent) for more details.

<h5 align="left">
<video src="https://github.com/user-attachments/assets/6d03caab-6193-4ac6-8b1c-36f152ec02ec" width="45%" alt="web browser control" controls></video>
</h5>

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-07-18]** AgentScope supports streaming mode now! Refer to our [tutorial](https://modelscope.github.io/agentscope/en/tutorial/203-stream.html) and example [conversation in stream mode](https://github.com/modelscope/agentscope/tree/main/examples/conversation_in_stream_mode) for more details.

<h5 align="left">
@@ -55,6 +61,9 @@ Start building LLM-empowered multi-agent applications in an easier way.

- **[2024-06-09]** We release **AgentScope** v0.0.5 now! In this new version, [**AgentScope Workstation**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html) (the online version is running on [agentscope.io](https://agentscope.io)) is open-sourced with the refactored [**AgentScope Studio**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html)!

<details>
<summary>Full News</summary>

- **[2024-05-24]** We are pleased to announce that features related to the **AgentScope Workstation** will soon be open-sourced! The online website services are temporarily offline. The online website service will be upgraded and back online shortly. Stay tuned...

- **[2024-05-15]** A new **Parser Module** for **formatted response** is added in AgentScope! Refer to our [tutorial](https://modelscope.github.io/agentscope/en/tutorial/203-parser.html) for more details. The [`DictDialogAgent`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/agents/dict_dialog_agent.py) and [werewolf game](https://github.com/modelscope/agentscope/tree/main/examples/game_werewolf) example are updated simultaneously.
@@ -87,6 +96,8 @@ available in [PyPI](https://pypi.org/project/agentscope/)!
- **[2024-02-14]** We release our paper "AgentScope: A Flexible yet Robust
Multi-Agent Platform" in [arXiv](https://arxiv.org/abs/2402.14034) now!

</details>

---

## What's AgentScope?
@@ -151,6 +162,7 @@ the following libraries.
- Multi Modality
- Wikipedia Search and Retrieval
- TripAdvisor Search
- Web Browser Control

**Example Applications**

13 changes: 13 additions & 0 deletions README_ZH.md
@@ -41,13 +41,20 @@

## News

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-09-03]** AgentScope now includes a web browser control module, which uses vision models to let agents control the browser. Refer to our [**example**](https://github.com/modelscope/agentscope/tree/main/examples/conversation_with_web_browser_agent) for details.

<h5 align="left">
<video src="https://github.com/user-attachments/assets/6d03caab-6193-4ac6-8b1c-36f152ec02ec" width="45%" alt="web browser control" controls></video>
</h5>

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-07-18]** AgentScope now supports streaming model output. Refer to our [**tutorial**](https://modelscope.github.io/agentscope/zh_CN/tutorial/203-stream.html) and the [**streaming conversation example**](https://github.com/modelscope/agentscope/tree/main/examples/conversation_in_stream_mode).

<h5 align="left">
<img src="https://github.com/user-attachments/assets/b14d9b2f-ce02-4f40-8c1a-950f4022c0cc" width="45%" alt="agentscope-logo">
<img src="https://github.com/user-attachments/assets/dfffbd1e-1fe7-49ee-ac11-902415b2b0d6" width="45%" alt="agentscope-logo">
</h5>


- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-07-15]** The Mixture of Agents algorithm has been added to AgentScope. See the [MoA example](https://github.com/modelscope/agentscope/blob/main/examples/conversation_mixture_of_agents) for usage.

- **[2024-06-14]** A new prompt tuning module is available in AgentScope to help developers generate and optimize agents' system prompts. For more details and usage examples, refer to the AgentScope [tutorial](https://modelscope.github.io/agentscope/en/tutorial/209-prompt_opt.html).
@@ -56,6 +63,9 @@

- **[2024-06-09]** AgentScope v0.0.5 has been released! In this new version, we open-sourced [**AgentScope Workstation**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html) (the online version is available at [agentscope.io](https://agentscope.io))!

<details>
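<summary>Full News</summary>

- **[2024-05-24]** We are pleased to announce that the features related to **AgentScope Workstation** will soon be open-sourced. Our website service is temporarily offline; the online service will be upgraded and back online shortly. Stay tuned...

- **[2024-05-15]** A **parser** module for parsing formatted model output is now available in AgentScope, making it easier to build multi-agent applications! Refer to the [tutorial](https://modelscope.github.io/agentscope/en/tutorial/203-parser.html) for usage. The [`DictDialogAgent`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/agents/dict_dialog_agent.py) class and the [werewolf game](https://github.com/modelscope/agentscope/tree/main/examples/game_werewolf) example have been updated accordingly!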
@@ -84,6 +94,8 @@

- **[2024-02-14]** We released the paper "[AgentScope: A Flexible yet Robust Multi-Agent Platform](https://arxiv.org/abs/2402.14034)" on arXiv!

</details>

---

## What's AgentScope?
@@ -141,6 +153,7 @@ AgentScope supports quickly deploying local model services with the following libraries.
- Multi-modality generation
- Wikipedia search
- TripAdvisor search
- Web browser control

**Example Applications**

1 change: 1 addition & 0 deletions docs/sphinx_doc/en/source/index.rst
@@ -34,6 +34,7 @@ AgentScope Documentation
tutorial/208-distribute.md
tutorial/209-gui.md
tutorial/210-rag.md
tutorial/211-web.md
tutorial/105-logging.md
tutorial/207-monitor.md
tutorial/104-usecase.md
90 changes: 90 additions & 0 deletions docs/sphinx_doc/en/source/tutorial/211-web.md
@@ -0,0 +1,90 @@
(211-web-en)=

# Web Browser Control

AgentScope supports web browser control through the `agentscope.service.WebBrowser` module.
It allows agents to interact with web pages and take actions such as clicking, typing, and scrolling.

> Note: the current web browser module requires a vision LLM to work properly. A text-based version will be provided in the future.
>
> Note: the web browser module is still in beta and will be updated frequently.

## Prerequisites

The `WebBrowser` module is built on [Playwright](https://playwright.dev/).
You need to install the latest AgentScope as well as the Playwright package, as follows:

```bash
# Install the latest AgentScope from source
git clone https://github.com/modelscope/agentscope.git
cd agentscope
pip install -e .

# Install playwright
pip install playwright
playwright install
```
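
Before initializing the browser, it can help to confirm that the Playwright Python package is importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is our own and is not part of AgentScope or Playwright. It does not verify that the browser binaries downloaded by `playwright install` are present.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be found by the import system."""
    return importlib.util.find_spec(package) is not None


if is_installed("playwright"):
    print("playwright package found")
else:
    print("playwright is missing; run `pip install playwright`")
```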

## Guidance

Initialize the `WebBrowser` module as follows:

```python
from agentscope.service import WebBrowser

browser = WebBrowser()
```

The `WebBrowser` module supports both browser control and state retrieval.
The names of the control functions are all prefixed with `action_`, e.g. `action_visit_url`
and `action_click`. To see the full list of functions, call the `get_action_functions` method.

```python
# List all supported actions
print(browser.get_action_functions())

# Visit a new webpage
browser.action_visit_url("https://www.bing.com")
```
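
To illustrate the `action_` naming convention, the sketch below collects prefixed methods from an object by introspection. `FakeBrowser` and `collect_actions` are hypothetical names used only for this illustration; AgentScope's own `get_action_functions` may be implemented differently, and its return type is not specified in this tutorial.

```python
import inspect
from typing import Callable, Dict, List, Tuple


class FakeBrowser:
    """Hypothetical stand-in for WebBrowser; it only records calls."""

    def __init__(self) -> None:
        self.history: List[Tuple[str, object]] = []

    def action_visit_url(self, url: str) -> None:
        self.history.append(("visit_url", url))

    def action_click(self, element_id: int) -> None:
        self.history.append(("click", element_id))

    def close(self) -> None:
        """Not an action, so it is excluded from the listing."""


def collect_actions(obj: object, prefix: str = "action_") -> Dict[str, Callable]:
    """Map method names starting with `prefix` to their bound methods."""
    return {
        name: member
        for name, member in inspect.getmembers(obj, predicate=inspect.ismethod)
        if name.startswith(prefix)
    }


browser = FakeBrowser()
actions = collect_actions(browser)
print(sorted(actions))

# Call an action through the collected mapping.
actions["action_visit_url"]("https://www.bing.com")
print(browser.history)
```

A shared prefix lets an agent discover the available controls at runtime instead of relying on a hard-coded list.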

To monitor the current state of the browser, read the attributes prefixed with `page_`, e.g. `page_url`, `page_title`, and `page_html`.

```python
# The url
print(browser.page_url)

# The page title
print(browser.page_title)

# The page in Markdown format (parsed by markdownify)
print(browser.page_markdown)

# The page HTML (may be very long)
print(browser.page_html)
```
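
Because the raw HTML can be very long, you may want to trim it before logging it or passing it to a model. The sketch below gathers the `page_` attributes into a dictionary and truncates the HTML. The `snapshot` helper and the stand-in object are assumptions made for illustration; only the attribute names `page_url`, `page_title`, and `page_html` come from the tutorial.

```python
from types import SimpleNamespace
from typing import Any, Dict


def snapshot(browser: Any, max_html_chars: int = 2000) -> Dict[str, str]:
    """Collect the page state, trimming the HTML to a bounded length."""
    html = browser.page_html
    if len(html) > max_html_chars:
        html = html[:max_html_chars] + "...[truncated]"
    return {
        "url": browser.page_url,
        "title": browser.page_title,
        "html": html,
    }


# Stand-in object exposing the same attribute names as the tutorial.
fake = SimpleNamespace(
    page_url="https://example.com",
    page_title="Example",
    page_html="<html>" + "x" * 5000 + "</html>",
)
state = snapshot(fake, max_html_chars=20)
print(state["title"], len(state["html"]))
```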

In addition, to help vision models understand the webpage better, we provide the `set_interactive_marks` function,
which marks all interactive elements on the current webpage with index labels (starting from 0).
After calling `set_interactive_marks`, more actions can be performed on the webpage,
for example clicking a button or typing in a text box identified by its index.

```python
# Set interactive marks with index labels
browser.set_interactive_marks()

# Remove interactive marks
# browser.remove_interactive_marks()
```
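
The idea of index labels can be sketched without a browser: walk the HTML in document order and assign a 0-based index to each interactive element. This is only an illustration of the concept, not how `set_interactive_marks` is implemented; the `MarkCollector` class and the set of tags treated as interactive are assumptions.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Assumed set of interactive tags, chosen for this illustration only.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class MarkCollector(HTMLParser):
    """Assign 0-based index labels to interactive tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.marks: Dict[int, str] = {}

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        if tag in INTERACTIVE_TAGS:
            # The next free index becomes the label of this element.
            self.marks[len(self.marks)] = tag


collector = MarkCollector()
collector.feed(
    '<a href="/">Home</a><p>text</p><input name="q"><button>Go</button>'
)
print(collector.marks)
```

With such a mapping, a vision model only needs to output an index to refer to an element, rather than a selector or screen coordinates.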

## Work with Agent

The functions above provide the basic operations for interactive web browser control.
You can use them to build your own web browsing agent.

In AgentScope, the web browser's action functions are also tool functions, so you can use them together with the service toolkit module to build your own agent.
We also provide a [web browser agent](https://github.com/modelscope/agentscope/tree/main/examples/conversation_with_web_browser_agent) in our examples.
Refer to it for more details.
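
As a rough picture of how tool functions plug into an agent loop, the sketch below dispatches a JSON tool call to a registered function. It uses only the standard library and does not use AgentScope's service toolkit. The `dispatch` helper, the JSON shape with `name` and `arguments` keys, and the stand-in `action_visit_url` function are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict, List

visited: List[str] = []


def action_visit_url(url: str) -> str:
    """Stand-in for a browser action; it only records the URL."""
    visited.append(url)
    return f"opened {url}"


def dispatch(tools: Dict[str, Callable[..., Any]], model_reply: str) -> Any:
    """Parse a JSON tool call and run the matching registered function."""
    call = json.loads(model_reply)
    name = call["name"]
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](**call.get("arguments", {}))


tools = {"action_visit_url": action_visit_url}
reply = '{"name": "action_visit_url", "arguments": {"url": "https://www.bing.com"}}'
result = dispatch(tools, reply)
print(result)
```

In a real agent, the model's reply would be produced by the LLM after it has seen the current page state, and the registry would hold the browser's actual action functions.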


[[Back to the top]](#211-web-en)
1 change: 1 addition & 0 deletions docs/sphinx_doc/zh_CN/source/index.rst
@@ -34,6 +34,7 @@ AgentScope Documentation
tutorial/208-distribute.md
tutorial/209-gui.md
tutorial/210-rag.md
tutorial/211-web.md
tutorial/105-logging.md
tutorial/207-monitor.md
tutorial/104-usecase.md
92 changes: 92 additions & 0 deletions docs/sphinx_doc/zh_CN/source/tutorial/211-web.md
@@ -0,0 +1,92 @@
(211-web-cn)=

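AgentScope supports web browser control through the `agentscope.service.WebBrowser` module.
It allows agents to interact with web pages and perform operations such as clicking, typing, and scrolling.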

> Note: the current web browser module is still in beta and will be updated and optimized frequently for some time.

## Prerequisites

The `WebBrowser` module is built on [Playwright](https://playwright.dev/). You need to install the latest version of AgentScope and the Playwright environment:

```bash
# Install the latest AgentScope from source
git clone https://github.com/modelscope/agentscope.git
cd agentscope
pip install -e .

# Install playwright
pip install playwright
playwright install
```

## Guidance

Initialize a `WebBrowser` instance as follows:

```python
from agentscope.service import WebBrowser

browser = WebBrowser()
```

The `WebBrowser` module supports both browser control and state retrieval.
The names of the control functions are all prefixed with `action_`, e.g. `action_visit_url`
and `action_click`. To see the full list of functions, call the `get_action_functions` method.

```python
# List all supported actions
print(browser.get_action_functions())

# Visit a new webpage
browser.action_visit_url("https://www.bing.com")
```

To get the current state of the browser, read the attributes prefixed with `page_`, e.g. `page_url`, `page_title`, and `page_html`.

```python
# The current page URL
print(browser.page_url)

# The current page title
print(browser.page_title)

# The current page in Markdown format (parsed by markdownify)
print(browser.page_markdown)

# The current page HTML source (may be very long)
print(browser.page_html)
```

In addition, to help vision models understand the webpage better, we provide the `set_interactive_marks` function,
which marks all interactive elements on the current webpage with index labels (starting from 0).
After calling `set_interactive_marks`, more actions can be performed on the webpage,
for example clicking the button with a given index or typing in the text box with a given index.

```python
# Add index labels to the interactive elements on the page
browser.set_interactive_marks()

# Remove the interactive marks
# browser.remove_interactive_marks()
```

## Work with Agent

The functions above provide the basic operations for interactive web browser control. Developers can use them to build their own web browsing agents.

In AgentScope, the web browser's action functions are also tool functions, so you can use `agentscope.service.ServiceToolkit` to handle the functions provided by the `WebBrowser` module and build your own agent.
We also provide a [web browser agent](https://github.com/modelscope/agentscope/tree/main/examples/conversation_with_web_browser_agent) in our examples.
Refer to it for more details.

[[Back to the top]](#211-web-cn)
2 changes: 1 addition & 1 deletion examples/conversation_with_codeact_agent/codeact_agent.py
@@ -2,7 +2,7 @@
# pylint: disable=C0301
"""An agent class that implements the CodeAct agent.
This agent can execute code interactively as actions.
-More details can be found at the paper of codeact agent
+More details can be found at the paper of CodeAct agent
https://arxiv.org/abs/2402.01030
and the original repo of codeact https://github.com/xingyaoww/code-act
"""
Original file line number Diff line number Diff line change
@@ -80,7 +80,7 @@ def execute_python_code(code: str) -> ServiceResponse: # pylint: disable=C0301
verbose=True,
service_toolkit=service_toolkit,
)
-user = UserAgent(name="User")
+user = UserAgent(name="User", input_hint="User Input ('exit' to quit): ")

# Build
x = None
4 changes: 2 additions & 2 deletions examples/conversation_with_swe-agent/swe_agent_prompts.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# -*- coding: utf-8 -*-
# pylint: disable=C0301
-"""The SWE-agent relay heavily on it's prompts.
-This file contains the neccessary prompts for the SWE-agent.
+"""The SWE-agent relay heavily on its prompts.
+This file contains the necessary prompts for the SWE-agent.
Some prompts are taken and modified from the original SWE-agent repo
or the SWE-agent implementation from Open-Devin.
"""