Add html parser for RAG and some improvements #2271
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2271 +/- ##
===========================================
+ Coverage 38.39% 50.06% +11.67%
===========================================
Files 78 78
Lines 7808 7859 +51
Branches 1669 1818 +149
===========================================
+ Hits 2998 3935 +937
+ Misses 4560 3593 -967
- Partials 250 331 +81
In the future, I think we should consider sharing these utility functions across the core library rather than keeping them exclusive to retrieval. For example, the HTML web page parsing could be built into a regular tool or user-defined function that any ConversableAgent can use.
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Why are these changes needed?
- HTML parser in retrieve_utils (borrowed from browser_utils)
- save_path in get_file_from_url: won't break when save_path is a directory
- get_file_from_url: won't break if url is a broken one
- overlap in split_text_to_chunks: not exposed to RAG agent yet
Related issue number
Checks
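For reviewers unfamiliar with the overlap item in the list of changes, here is a minimal sketch of what chunking with overlap generally means. This is not the code in this PR: the function name `split_with_overlap`, its character-based windows, and its parameters are assumptions made for illustration only, and the real split_text_to_chunks may count tokens or respect line boundaries instead.

```python
from __future__ import annotations


def split_with_overlap(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Hypothetical helper: split text into character windows of chunk_size,
    where each chunk repeats the last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk consisting only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap keeps context that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.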