
Add additional docs in retrieval agent if required #1028

Merged
merged 33 commits into from
Dec 25, 2023


ShobhitVishnoi30
Collaborator

Why are these changes needed?

If a retriever agent instance already has documents in its collection and you add more with the current implementation, the new chunks replace the existing ones, because chunk IDs always start at "doc_0".

To address this, I've added an "extra_docs" option to the retrieval configuration. When it is set to true, the ID of each new chunk is offset by the number of documents already in the collection (length + i instead of i) when the chunks are inserted into the Chroma database. Existing documents are therefore not overwritten, and more documents can be added to the same collection.
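A minimal sketch of the ID-offset idea, for illustration only. It uses a plain dict as a stand-in for a Chroma collection, and the helpers `chunk_ids` and `upsert` are hypothetical names, not autogen's or Chroma's API. With a real Chroma collection, the existing count could come from `collection.count()` and the IDs would be passed to `collection.upsert(ids=..., documents=...)`; exactly how the PR obtains the length is not shown here.

```python
from __future__ import annotations


def chunk_ids(num_chunks: int, existing_count: int = 0, extra_docs: bool = False) -> list[str]:
    """Hypothetical helper: build IDs of the form doc_<n> for a batch of chunks.

    Without extra_docs, numbering restarts at 0, so an upsert overwrites
    whatever is already stored under doc_0, doc_1, ...
    With extra_docs, numbering starts after the chunks already stored.
    """
    offset = existing_count if extra_docs else 0
    return [f"doc_{offset + i}" for i in range(num_chunks)]


def upsert(store: dict[str, str], chunks: list[str], extra_docs: bool = False) -> None:
    """Hypothetical stand-in for a vector-store upsert, backed by a dict."""
    ids = chunk_ids(len(chunks), existing_count=len(store), extra_docs=extra_docs)
    for doc_id, chunk in zip(ids, chunks):
        store[doc_id] = chunk  # same ID -> the old chunk is replaced


# Default behaviour: the second batch overwrites doc_0.
replaced: dict[str, str] = {}
upsert(replaced, ["a", "b"])
upsert(replaced, ["c"])
print(replaced)  # {'doc_0': 'c', 'doc_1': 'b'}

# With extra_docs=True the second batch is appended as doc_2.
appended: dict[str, str] = {}
upsert(appended, ["a", "b"])
upsert(appended, ["c"], extra_docs=True)
print(appended)  # {'doc_0': 'a', 'doc_1': 'b', 'doc_2': 'c'}
```

Offsetting by the current count assumes the stored IDs are contiguous (doc_0 through doc_{n-1}); if chunks were deleted from the middle of a collection, a new ID could still collide with an existing one.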

Related issue number

issue

Checks

@codecov-commenter

codecov-commenter commented Dec 20, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (70cc1f4) 30.27% compared to head (02c46c3) 53.76%.

| Files | Patch % | Lines |
|---|---|---|
| ...ntchat/contrib/qdrant_retrieve_user_proxy_agent.py | 71.42% | 1 Missing and 1 partial ⚠️ |
| autogen/retrieve_utils.py | 33.33% | 1 Missing and 1 partial ⚠️ |
| ...gen/agentchat/contrib/retrieve_user_proxy_agent.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1028       +/-   ##
===========================================
+ Coverage   30.27%   53.76%   +23.48%     
===========================================
  Files          30       30               
  Lines        3980     3988        +8     
  Branches      897      948       +51     
===========================================
+ Hits         1205     2144      +939     
+ Misses       2696     1648     -1048     
- Partials       79      196      +117     
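The project percentages above can be reproduced from the hit, miss, and partial counts. The sketch below rests on two assumptions that happen to match every figure in this report: partial lines count as not covered, i.e. coverage = hits / (hits + misses + partials), and values are truncated to two decimal places (assumed to correspond to Codecov's `precision: 2` and `round: down` settings).

```python
from __future__ import annotations

from decimal import Decimal, ROUND_DOWN

TWO_PLACES = Decimal("0.01")


def raw_coverage(hits: int, misses: int, partials: int) -> Decimal:
    """Percentage of tracked lines that are fully hit; partials count as not covered."""
    lines = hits + misses + partials
    return Decimal(hits) * 100 / Decimal(lines)


def coverage_pct(hits: int, misses: int, partials: int) -> Decimal:
    """Coverage truncated (rounded toward zero) to two decimal places."""
    return raw_coverage(hits, misses, partials).quantize(TWO_PLACES, rounding=ROUND_DOWN)


def coverage_delta(base: tuple[int, int, int], head: tuple[int, int, int]) -> Decimal:
    """Change in coverage, computed from the unrounded values, then truncated."""
    diff = raw_coverage(*head) - raw_coverage(*base)
    return diff.quantize(TWO_PLACES, rounding=ROUND_DOWN)


base = (1205, 2696, 79)   # main: hits, misses, partials (3980 lines)
head = (2144, 1648, 196)  # #1028: hits, misses, partials (3988 lines)
print(coverage_pct(*base), coverage_pct(*head), coverage_delta(base, head))
# prints: 30.27 53.76 23.48
```

Computing the delta from the unrounded values is what yields +23.48%; subtracting the already-rounded figures (53.76 - 30.27) would give 23.49 instead.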
| Flag | Coverage Δ |
|---|---|
| unittests | 53.61% <54.54%> (+23.38%) ⬆️ |

Flags with carried forward coverage won't be shown.


@ShobhitVishnoi30
Collaborator Author

ShobhitVishnoi30 commented Dec 20, 2023

@sonichi @victordibia @elecnix @halr9000 @joshuavial please review it

qingyun-wu added this pull request to the merge queue Dec 25, 2023
Merged via the queue into microsoft:main with commit ebd5de9 Dec 25, 2023
81 of 84 checks passed
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024
* Update conversable_agent.py

* Add files via upload

* Delete notebook/Async_human_input.ipynb

* Add files via upload

* refactor:formatter

* feat:updated position

* Update dbutils.py

* added feature to add docs in retrieve

* Update dbutils.py

* Update retrieve_user_proxy_agent.py

* Update retrieve_utils.py

* Update qdrant_retrieve_user_proxy_agent.py

* Update qdrant_retrieve_user_proxy_agent.py

* feat:fixed pre commit issue

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: svrapidinnovation <sv@rapidinnovation.dev>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>

6 participants