-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add weaviate support #218
Add weaviate support #218
Conversation
@pseudotensor could you please review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thanks!
Just three things:
- Please provide an output of:
pytest -s -v tests/test_langchain_units.py::test_qa_wiki_db_chunk_hf_weaviate
To verify new test works. That should be minimal required thing.
I'll test the rest.
- I notice when running the test there is verbose output from weaviate.
Binary /home/jon/.cache/weaviate-embedded did not exist. Downloading binary from https://github.com/weaviate/weaviate/releases/download/v1.19.3/weaviate-v1.19.3-linux-amd64.tar.gz
Started /home/jon/.cache/weaviate-embedded: process ID 67378
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2023-06-01T14:02:40-07:00"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2023-06-01T14:02:40-07:00"}
{"level":"warning","msg":"Multiple vector spaces are present, GraphQL Explore and REST API list objects endpoint module include params has been disabled as a result.","time":"2023-06-01T14:02:40-07:00"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2023-06-01T14:02:40-07:00"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:6666","time":"2023-06-01T14:02:40-07:00"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"langchain_92432e47dd174559a7ffaa9bba83fbaa_zo3rVHt6AaEy","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-06-01T14:02:42-07:00","took":41977}
('\n\nLinux is an open-source operating system that is free to use and modify, while Windows is a proprietary operating system that is owned by Microsoft Corporation.\n\nLinux is based on the Unix operating system, which is known for its stability and security, while Windows is based on the MS-DOS operating system, which is known for its ease of use.\n\nLinux is typically used for servers and other high-performance computing environments, while Windows is more commonly used for personal computers and laptops.\n\nLinux is also known for its flexibility and customizability, as users can modify the operating system to suit their needs, while Windows is more locked down and requires more user input to customize.\n\nOverall, the main differences between Linux and Windows are their licensing models, design principles, and target audience.\n\nSources [Score | Link]:<p><ul><li>0.61 | <a href="https://en.wikipedia.org/wiki/Linux" target="_blank" rel="noopener noreferrer">https://en.wikipedia.org/wiki/Linux</a></li></ul></p>End Sources<p>', '\nSources [Score | Link]:<p><ul><li>0.61 | <a href="https://en.wikipedia.org/wiki/Linux" target="_blank" rel="noopener noreferrer">https://en.wikipedia.org/wiki/Linux</a></li></ul></p>End Sources<p>')
Let's pass a verbose
through to the weaviate client or other places. I don't see that currently.
- Please update make_db to handle either Chroma or weaviate more naturally so not hardcoded chroma stuff in there. E.g. maybe add_to_db should handle case when db=None and load the db from persistent_directory. Or something like that. For your comment: Add support for weaviate #145 (comment)
For why db_type='chroma' was hardcoded itself in make_db, same reason, only persistent case was chroma before. You are extending that, so that has to be adjusted.
@achraf-mer we are getting contributors, would be great to have jenkins smoke going! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few changes, that the Author of PR also brought up.
For point 1:
What do you mean by "provide the output of the test"? Do you mean you want me to paste the output of the test in this PR's description? For point 2:
Are you okay with me introducing a new param called I also plan to add an option to connect to weaviate on a local docker instance too. For this method, the user needs to provide url, username and password (only the url is compulsory). I'm leaning towards configuring the connection methods using environment variables instead of adding additional params to the functions. What do you think? |
Hi, I just mean please run the test and show it pass etc. Yes, please add a new verbose parameter. For make_db, yes let's do minimal required work for now and un-hardcode things. Yes, for docker etc. can then add the remote client etc. In a different PR. |
@pseudotensor I've addressed your feedback. For point 2, it is not possible to silent the weaviate client. I've raised a feature request for this (see weaviate/weaviate-python-client#346 ) |
I'll fix that test, unrelated to this PR, and add make_db_main() tests in another PR. |
@hsm207 Thanks so much for your contribution!! |
Even pycharm showed problems, and |
Fixes #145
Output of running
pytest -s -v tests/test_langchain_units.py::test_qa_wiki_db_chunk_hf_weaviate
: