Adding web page digest function to service module #84
- Since there's no overlap in the processes for handling web pages via a model and via a third-party library, they should be split into two separate methods, like `bing_search` and `google_search`. The developer should decide which to use when choosing the service function.
- Please see inline comments.
# Conflicts: # src/agentscope/service/text_processing/summarization.py
Please see inline comments.
Please see inline comments, and resolve the conflicts.
P.S.:
We need to determine whether we want developers to use the `parse_html` function directly. If so:
- The arguments `keep_raw` and `html_parse_func` in the `parse_html` function are meaningless for developers when they call `parse_html` directly.
- The "return raw" and "parse html by customized function" operations should be handled within the `load_web` function.
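A minimal sketch of that division of responsibility, not the PR's actual code: `parse_html` only cleans HTML, while `load_web` owns the "return raw" and "custom parser" branches. The signatures, the `fetch` parameter (added here so the function can be exercised without network access), and the standard-library `html.parser` cleaner are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import Any, Callable, List, Optional
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def parse_html(html_text: str) -> str:
    """Hypothetical parse_html: no keep_raw / html_parse_func arguments."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return "\n".join(parser.chunks)


def _fetch(url: str) -> str:
    """Default fetcher (assumption): plain urllib with a short timeout."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", errors="replace")


def load_web(
    url: str,
    keep_raw: bool = False,
    html_parse_func: Optional[Callable[[str], Any]] = None,
    fetch: Callable[[str], str] = _fetch,
) -> Any:
    """Hypothetical load_web: the raw/custom-parser decisions live here."""
    raw = fetch(url)
    if keep_raw:
        return raw  # "return raw" branch
    if html_parse_func is not None:
        return html_parse_func(raw)  # "parse html by customized function"
    return parse_html(raw)  # default cleaning
```

With this shape, a developer calling `parse_html` directly never sees arguments that only make sense in the context of fetching a page.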
# Conflicts: # src/agentscope/service/text_processing/summarization.py
lgtm
Adding web page digest function to service module
Description
As there have been some recent internal requests about parsing web pages, this PR introduces a web page digestion service method to the framework.
If an LLM is provided as the `model` parameter, the web page is first split (by `langchain_text_splitters.HTMLHeaderTextSplitter`) and the pieces are analyzed by the LLM one by one. If no LLM is provided, `langchain_community.document_transformers.BeautifulSoupTransformer` is used to clean the web page.
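The control flow described above can be sketched without the langchain dependencies. This is an illustration under stated assumptions, not the PR's implementation: `digest_webpage` and `split_by_headers` are hypothetical names, the regex-based header split and tag stripping are crude stand-ins for `HTMLHeaderTextSplitter` and `BeautifulSoupTransformer`, and the model is assumed to be any callable that maps a text section to a string.

```python
import re
from typing import Callable, List, Optional

_TAG = re.compile(r"<[^>]+>")
# Zero-width split point in front of every <h1>, <h2>, or <h3> tag.
_HEADER = re.compile(r"(?=<h[1-3][^>]*>)", re.IGNORECASE)


def _strip_tags(fragment: str) -> str:
    """Replace tags with spaces and collapse whitespace (crude stand-in)."""
    return " ".join(_TAG.sub(" ", fragment).split())


def split_by_headers(html_text: str) -> List[str]:
    """Split HTML into one cleaned text section per header (stand-in)."""
    parts = _HEADER.split(html_text)
    return [text for text in map(_strip_tags, parts) if text]


def digest_webpage(
    html_text: str,
    model: Optional[Callable[[str], str]] = None,
) -> str:
    """Hypothetical digest: model path if a model is given, else cleaning."""
    if model is None:
        # No LLM provided: only clean the page.
        return _strip_tags(html_text)
    # LLM provided: analyze the sections one by one and join the results.
    return "\n".join(model(section) for section in split_by_headers(html_text))
```

For example, `digest_webpage(page, model=my_llm_call)` would invoke the hypothetical `my_llm_call` once per header section, while `digest_webpage(page)` returns the cleaned text only.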
Checklist
Please check the following items before code is ready to be reviewed.