Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support no chromadb, sentence_transformers or pypdf #556

Closed
wants to merge 4 commits into from

Conversation

thinkall
Copy link
Collaborator

@thinkall thinkall commented Nov 5, 2023

Why are these changes needed?

Address #531

Support no chromadb, sentence_transformers or pypdf when using customized vector database, embedding function or w/o pdf files, respectively.

Related issue number

Closes #531

Checks

@thinkall thinkall changed the title Support no chromadb Support no chromadb, sentence_transformers or pypdf Nov 5, 2023
@codecov-commenter
Copy link

codecov-commenter commented Nov 5, 2023

Codecov Report

Merging #556 (3f044de) into main (0dd0fc5) will increase coverage by 5.58%.
The diff coverage is 40.74%.

@@            Coverage Diff             @@
##             main     #556      +/-   ##
==========================================
+ Coverage   32.39%   37.97%   +5.58%     
==========================================
  Files          22       22              
  Lines        2772     2786      +14     
  Branches      630      632       +2     
==========================================
+ Hits          898     1058     +160     
+ Misses       1807     1640     -167     
- Partials       67       88      +21     
Flag Coverage Δ
unittests 37.90% <40.74%> (+5.58%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...gen/agentchat/contrib/retrieve_user_proxy_agent.py 14.21% <50.00%> (+14.21%) ⬆️
autogen/retrieve_utils.py 63.63% <36.84%> (+63.63%) ⬆️

... and 1 file with indirect coverage changes

@thinkall
Copy link
Collaborator Author

thinkall commented Nov 5, 2023

@kdcokenny could you check if this PR works for you? Thanks.

@thinkall thinkall requested a review from a team November 5, 2023 10:21
@0xmatt1
Copy link

0xmatt1 commented Nov 5, 2023

These things can be tricky, best to procede with caution

Copy link
Contributor

@sonichi sonichi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we decouple [retrievechat] and [chromadb]? Is there any common dependency between different retrieve backends? If not, we may create [retrievechat_chromadb] etc.
It will be good to clarify first, about what is required for all retrieve chat, what is required for each specific backend, etc.

@@ -240,6 +254,8 @@ def create_vector_db_from_dir(
Returns:
API: the chromadb client.
"""
if not HAS_CHROMADB:
raise ImportError("Please install dependencies first. `pip install pyautogen[retrievechat]`")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the error a constant in the header.

@thinkall
Copy link
Collaborator Author

thinkall commented Nov 6, 2023

shall we decouple [retrievechat] and [chromadb]? Is there any common dependency between different retrieve backends? If not, we may create [retrievechat_chromadb] etc. It will be good to clarify first, about what is required for all retrieve chat, what is required for each specific backend, etc.

RetrieveChat has no must-install dependencies. The [retrievechat] extras could be treated as a default solution for running it, but chromadb, sentence_transformers and pypdf all have alternatives, ipython is not needed for pure QA. Advanced users can install different dependencies as they want.

What if we keep the current [retrievechat] extras, but improve the docs in the Installation section to guide users for different choices like qdrant for vectordb, huggingface or openai for embedding, unstructured for pdf/word... extraction? @sonichi @rickyloynd-microsoft

@kdcokenny
Copy link

kdcokenny commented Nov 6, 2023 via email

@kdcokenny
Copy link

@kdcokenny could you check if this PR works for you? Thanks.

When I was testing it I was getting weird errors like: TypeError: Completions.create() got an unexpected keyword argument 'request_timeout'

This may be due to the new openai v1 upgrade that I havent taken account for. Here's my exported py notebook:

# %%
# %pip install -q pyautogen qdrant-client fastembed
# %pip install -q chromadb pypdf # required by autogen for some reason

# %%
%pip install -q git+https://github.com/microsoft/autogen.git@support_no_chromadb
%pip install -q qdrant-client fastembed

# %%
from qdrant_client import QdrantClient

host = "qdran...ws.com"

client = QdrantClient(host=host, api_key="...")

# %%
config_list = [
    {
        'model': 'gpt-3.5-turbo',
        'api_key': '...',
    },
]

# %%
# import autogen

# autogen.ChatCompletion.start_logging()

# %%
client.set_model("BAAI/bge-base-en-v1.5")

# %%
from autogen.agentchat.contrib.qdrant_retrieve_user_proxy_agent import QdrantRetrieveUserProxyAgent

ragproxyagent = QdrantRetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    retrieve_config={
        "task": "qa",
        "chunk_token_size": 2000,
        "model": config_list[0]["model"],
        "client": client,
        "embedding_model": "BAAI/bge-base-en-v1.5",
        "collection_name": "wized_test",
    },
)

# %%
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent

assistant = RetrieveAssistantAgent(
    name="assistant", 
    system_message="You are a helpful assistant.",
    llm_config={
        "request_timeout": 600,
        "seed": 42,
        "config_list": config_list,
    },
)

# %%
assistant.reset()

qa_problem = "What is Wized?"
ragproxyagent.initiate_chat(assistant, problem=qa_problem)

@thinkall thinkall mentioned this pull request Nov 20, 2023
3 tasks
@thinkall thinkall closed this Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ChromaDB required even if you use QdrantRetrieveUserProxyAgent and not Chroma
5 participants