Enable web browsing ability in agentscope (#296)

Co-authored-by: wenhao <wenhao@U-J0P61NH2-2025.local>
Co-authored-by: DavdGao <gaodawei.gdw@alibaba-inc.com>

1 parent 07165b3, commit 436b9de
Showing 21 changed files with 1,431 additions and 146 deletions.

@@ -0,0 +1,90 @@

(211-web-en)=

# Web Browser Control

AgentScope supports web browser control with the `agentscope.service.WebBrowser` module.
It allows agents to interact with web pages and take actions such as clicking, typing, and scrolling.

> Note: The current web browser module requires a vision LLM to work properly. We will provide a text-based version in the future.
>
> Note: The web browser module is still in beta and will be updated frequently.

## Prerequisites

The `WebBrowser` module is built on [Playwright](https://playwright.dev/).
You need to install the latest AgentScope, as well as the Playwright packages, as follows:

```bash
# Install the latest AgentScope from source
git clone https://github.com/modelscope/agentscope.git
cd agentscope
pip install -e .

# Install playwright and its browser binaries
pip install playwright
playwright install
```
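
After installation, you can quickly confirm that both packages are importable from the current Python environment. The snippet below is a minimal sketch using only the standard library (`importlib.util.find_spec`); the helper name `check_prerequisites` is hypothetical and not part of AgentScope. It checks only the Python packages, not the browser binaries downloaded by `playwright install`.

```python
import importlib.util


def check_prerequisites(packages=("agentscope", "playwright")):
    """Return a mapping of package name -> whether it can be imported.

    Hypothetical helper: it only checks that the Python packages are
    installed; it does not verify the browsers fetched by `playwright install`.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```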

## Guidance

Initialize the `WebBrowser` module as follows:

```python
from agentscope.service import WebBrowser

browser = WebBrowser()
```

The `WebBrowser` module supports both browser control and state retrieval.
The names of the control functions are all prefixed with `action_`, e.g. `action_visit_url` and `action_click`.
To see the full list of functions, call the `get_action_functions` method.

```python
# List all supported actions
print(browser.get_action_functions())

# Visit a new webpage
browser.action_visit_url("https://www.bing.com")
```
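
Because every control function shares the `action_` prefix, the available actions can be discovered by name. The sketch below illustrates that naming convention with a hypothetical stand-in class `ToyBrowser` and a hypothetical helper `list_actions` built on `inspect.getmembers`. It is not AgentScope's implementation of `get_action_functions`, whose return format may differ.

```python
import inspect


class ToyBrowser:
    """Hypothetical stand-in that follows the `action_` naming convention."""

    def __init__(self):
        self.url = "about:blank"

    def action_visit_url(self, url: str) -> None:
        """Navigate to the given URL (here: just record it)."""
        self.url = url

    def action_click(self, index: int) -> str:
        """Pretend to click the interactive element with the given index."""
        return f"clicked element {index}"

    def helper(self) -> None:
        """Not an action: it has no `action_` prefix."""


def list_actions(obj, prefix: str = "action_") -> dict:
    """Collect the bound methods of `obj` whose names start with `prefix`."""
    return {
        name: member
        for name, member in inspect.getmembers(obj, predicate=inspect.ismethod)
        if name.startswith(prefix)
    }


toy = ToyBrowser()
actions = list_actions(toy)
print(sorted(actions))  # ['action_click', 'action_visit_url']
actions["action_visit_url"]("https://www.bing.com")
print(toy.url)  # https://www.bing.com
```

A prefix convention like this keeps helper methods (such as `helper` above) out of the action list without maintaining a separate registry.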

To monitor the current state of the browser, read the attributes prefixed with `page_`, e.g. `page_url`, `page_title`, and `page_html`.

```python
# The URL of the current page
print(browser.page_url)

# The title of the current page
print(browser.page_title)

# The page content in Markdown format (converted by markdownify)
print(browser.page_markdown)

# The raw page HTML (may be very long)
print(browser.page_html)
```
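
Since `page_html` (and, to a lesser degree, `page_markdown`) can be far longer than a model's context window, it is usually worth clipping it before putting it into a prompt. The helper below is a hypothetical, minimal sketch that keeps a prefix of the text and appends a marker; the name `clip_page_text` and the 4,000-character default are illustrative assumptions, not AgentScope settings. Character counts only approximate token counts.

```python
def clip_page_text(text: str, max_chars: int = 4000, marker: str = "\n...[truncated]") -> str:
    """Return `text` unchanged if it fits, otherwise a prefix plus a truncation marker.

    The result never exceeds `max_chars` characters (marker included).
    """
    if max_chars < len(marker):
        raise ValueError("max_chars must be at least len(marker)")
    if len(text) <= max_chars:
        return text
    return text[: max_chars - len(marker)] + marker


# Usage with the browser from above (commented out so the sketch runs standalone):
# prompt_context = clip_page_text(browser.page_markdown)
print(clip_page_text("short page"))  # short page
print(len(clip_page_text("x" * 10_000)))  # 4000
```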

In addition, to help vision models understand the webpage better, we provide the `set_interactive_marks` function,
which marks all the interactive elements on the current webpage with index labels (starting from 0).
After calling `set_interactive_marks`, more actions can be performed on the webpage,
such as clicking a button or typing in a text box.

```python
# Add index labels to all interactive elements on the page
browser.set_interactive_marks()

# Remove the interactive marks
# browser.remove_interactive_marks()
```
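
Conceptually, the marks give every interactive element an integer index, starting from 0, that a model can refer to in a later action. The sketch below illustrates only that indexing idea on static HTML using the standard library's `html.parser`. AgentScope's `set_interactive_marks` works on the live page in the browser, and the tag set `INTERACTIVE_TAGS` and class name `InteractiveIndexer` here are hypothetical choices.

```python
from html.parser import HTMLParser

# Hypothetical choice of which tags count as "interactive".
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveIndexer(HTMLParser):
    """Assign 0-based indices to interactive elements in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (index, tag, attributes)

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append((len(self.elements), tag, dict(attrs)))


sample_html = """
<form>
  <input type="text" name="q">
  <button type="submit">Search</button>
</form>
<p>Some text <a href="/about">About</a></p>
"""

indexer = InteractiveIndexer()
indexer.feed(sample_html)
for index, tag, attrs in indexer.elements:
    print(index, tag, attrs)
# 0 input {'type': 'text', 'name': 'q'}
# 1 button {'type': 'submit'}
# 2 a {'href': '/about'}
```

With such a mapping, an instruction like "click element 1" can be resolved unambiguously to the Search button.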

## Working with Agents

The functions above provide the basic operations for interactive web browser control.
You can use them to build your own web browsing agent.

In AgentScope, the web browser functions are also tool functions, so you can use them together with the service toolkit module (`agentscope.service.ServiceToolkit`) to build your own agent.
We also provide a [web browser agent](https://github.com/modelscope/agentscope/tree/main/examples/conversation_with_web_browser_agent) in our examples.
You can refer to it for more details.
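
To expose a browser action as a tool, the model needs a machine-readable description of it: name, docstring, and parameters. The sketch below derives such a description with `inspect.signature`, in a JSON-schema-like shape common to function-calling APIs. It is a hypothetical illustration, not the API or output format of AgentScope's service toolkit; the function `action_type_text` and the helper `describe_tool` are made up for this example.

```python
import inspect
import json

# Map a few Python annotations to JSON-schema type names (illustrative subset).
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def action_type_text(index: int, text: str, submit: bool = False) -> None:
    """Type text into the interactive element with the given index."""
    # Hypothetical action used only to demonstrate the description format.


def describe_tool(func) -> dict:
    """Build a JSON-schema-like description of `func` from its signature."""
    params = inspect.signature(func).parameters
    properties = {
        name: {"type": _TYPE_NAMES.get(p.annotation, "string")}
        for name, p in params.items()
    }
    # Parameters without a default value are required.
    required = [name for name, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


print(json.dumps(describe_tool(action_type_text), indent=2))
```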

[[Back to the top]](#211-web-en)

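@@ -0,0 +1,92 @@

(211-web-cn)=

AgentScope supports web browser control through the `agentscope.service.WebBrowser` module.
It allows agents to interact with web pages and perform operations such as clicking, typing, and scrolling.

> Note: The current web browser module is still in beta and will be frequently updated and optimized over the coming period.

## Prerequisites
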
The `WebBrowser` module is built on [Playwright](https://playwright.dev/). You need to install the latest version of AgentScope and the Playwright environment as follows:

```bash
# Install the latest version of AgentScope from source
git clone https://github.com/modelscope/agentscope.git
cd agentscope
pip install -e .

# Install playwright and its browser binaries
pip install playwright
playwright install
```

## Guidance

Initialize a `WebBrowser` instance as follows:

```python
from agentscope.service import WebBrowser

browser = WebBrowser()
```
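
The `WebBrowser` module provides browser control and state retrieval.
The names of the control functions are all prefixed with `action_`, e.g. `action_visit_url` and `action_click`. You can call the `get_action_functions` method to see the full list of functions.
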
```python
# List all supported actions
print(browser.get_action_functions())

# Visit a new webpage
browser.action_visit_url("https://www.bing.com")
```

To get the current state of the browser, read the attributes prefixed with `page_`, e.g. `page_url`, `page_title`, and `page_html`.

```python
# The URL of the current page
print(browser.page_url)

# The title of the current page
print(browser.page_title)

# The current page content in Markdown format (parsed by markdownify)
print(browser.page_markdown)

# The HTML source of the current page (may be too long)
print(browser.page_html)
```
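
In addition, to help vision models better understand the webpage, we provide the `set_interactive_marks` function, which marks all the interactive elements on the current webpage with index labels (starting from 0).
After marking the webpage with `set_interactive_marks`, we can perform more actions on it, such as clicking the button with a given index or typing into the text box with a given index.
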
```python
# Add index labels to the interactive elements on the page
browser.set_interactive_marks()

# Remove the interactive marks
# browser.remove_interactive_marks()
```

## Working with Agents

All the functions above provide basic interfaces for interactive web browser control. Developers can use these interfaces to build their own web browsing agents.

In AgentScope, the web browser is also a kind of tool function, so you can use `agentscope.service.ServiceToolkit` to handle the functions provided by the `WebBrowser` module and build your own agent.
We provide a [web browser agent](https://github.com/modelscope/agentscope/tree/main/examples/conversation_with_web_browser_agent) in our examples.
You can refer to it for more details.
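
On the agent side, the model's reply usually names an action and its arguments, and the agent must route that request to the matching browser method. The sketch below shows one way to do this with `getattr`, restricted to the `action_` prefix so the model cannot call arbitrary attributes. The JSON reply format, the stand-in class `FakeBrowser`, and the helper `dispatch` are hypothetical; the example agent linked above may parse and route model output differently.

```python
import json


class FakeBrowser:
    """Hypothetical stand-in exposing two `action_` methods."""

    def __init__(self):
        self.history = []

    def action_visit_url(self, url: str) -> str:
        self.history.append(url)
        return f"visited {url}"

    def action_scroll_down(self) -> str:
        return "scrolled down"


def dispatch(browser, reply: str) -> str:
    """Parse a JSON tool call and invoke the matching `action_` method."""
    call = json.loads(reply)
    name = call["name"]
    # Only allow methods that follow the action naming convention.
    if not name.startswith("action_"):
        raise ValueError(f"{name!r} is not a browser action")
    method = getattr(browser, name, None)
    if method is None or not callable(method):
        raise ValueError(f"unknown action {name!r}")
    return method(**call.get("arguments", {}))


fake = FakeBrowser()
print(dispatch(fake, '{"name": "action_visit_url", "arguments": {"url": "https://www.bing.com"}}'))
# visited https://www.bing.com
print(dispatch(fake, '{"name": "action_scroll_down"}'))
# scrolled down
```

Rejecting names outside the `action_` namespace is a cheap safeguard: a malformed or adversarial reply cannot reach internal methods such as `__init__`.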

[[Back to the top]](#211-web-cn)