Add self query translator for weaviate vectorstore (#4804)
Adds support for the EQ comparator and the AND/OR operators.

Co-authored-by: Dominic Chan <dchan@cppib.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 parent 9928fb2, commit 6c60251
Showing 3 changed files with 340 additions and 1 deletion.
277 changes: 277 additions & 0 deletions
docs/modules/indexes/retrievers/examples/weaviate_self_query.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,277 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "13afcae7", | ||
"metadata": {}, | ||
"source": [ | ||
"# Self-querying with Weaviate" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "68e75fb9", | ||
"metadata": {}, | ||
"source": [ | ||
"## Creating a Weaviate vectorstore\n", | ||
"First we'll want to create a Weaviate VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n", | ||
"\n", | ||
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `weaviate-client` package." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "63a8af5b", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"#!pip install lark weaviate-client" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"id": "cb4a5787", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.schema import Document\n", | ||
"from langchain.embeddings.openai import OpenAIEmbeddings\n", | ||
"from langchain.vectorstores import Weaviate\n", | ||
"import os\n", | ||
"\n", | ||
"embeddings = OpenAIEmbeddings()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 22, | ||
"id": "bcbe04d9", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"docs = [\n", | ||
" Document(page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\", metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"}),\n", | ||
" Document(page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\", metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2}),\n", | ||
" Document(page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\", metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6}),\n", | ||
" Document(page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\", metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3}),\n", | ||
" Document(page_content=\"Toys come alive and have a blast doing so\", metadata={\"year\": 1995, \"genre\": \"animated\"}),\n", | ||
" Document(page_content=\"Three men walk into the Zone, three men walk out of the Zone\", metadata={\"year\": 1979, \"rating\": 9.9, \"director\": \"Andrei Tarkovsky\", \"genre\": \"science fiction\"})\n", | ||
"]\n", | ||
"vectorstore = Weaviate.from_documents(\n", | ||
" docs, embeddings, weaviate_url=\"http://127.0.0.1:8080\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "5ecaab6d", | ||
"metadata": {}, | ||
"source": [ | ||
"## Creating our self-querying retriever\n", | ||
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 23, | ||
"id": "86e34dbf", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.llms import OpenAI\n", | ||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n", | ||
"from langchain.chains.query_constructor.base import AttributeInfo\n", | ||
"\n", | ||
"metadata_field_info=[\n", | ||
" AttributeInfo(\n", | ||
" name=\"genre\",\n", | ||
" description=\"The genre of the movie\", \n", | ||
" type=\"string or list[string]\", \n", | ||
" ),\n", | ||
" AttributeInfo(\n", | ||
" name=\"year\",\n", | ||
" description=\"The year the movie was released\", \n", | ||
" type=\"integer\", \n", | ||
" ),\n", | ||
" AttributeInfo(\n", | ||
" name=\"director\",\n", | ||
" description=\"The name of the movie director\", \n", | ||
" type=\"string\", \n", | ||
" ),\n", | ||
" AttributeInfo(\n", | ||
" name=\"rating\",\n", | ||
" description=\"A 1-10 rating for the movie\",\n", | ||
" type=\"float\"\n", | ||
" ),\n", | ||
"]\n", | ||
"document_content_description = \"Brief summary of a movie\"\n", | ||
"llm = OpenAI(temperature=0)\n", | ||
"retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ea9df8d4", | ||
"metadata": {}, | ||
"source": [ | ||
"## Testing it out\n", | ||
"And now we can try actually using our retriever!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 24, | ||
"id": "38a126e9", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"query='dinosaur' filter=None limit=None\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'genre': 'science fiction', 'rating': 7.7, 'year': 1993}),\n", | ||
" Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'rating': None, 'year': 1995}),\n", | ||
" Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'genre': 'science fiction', 'rating': 9.9, 'year': 1979}),\n", | ||
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'genre': None, 'rating': 8.6, 'year': 2006})]" | ||
] | ||
}, | ||
"execution_count": 24, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# This example only specifies a relevant query\n", | ||
"retriever.get_relevant_documents(\"What are some movies about dinosaurs\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 26, | ||
"id": "b19d4da0", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"query='women' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='director', value='Greta Gerwig') limit=None\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'genre': None, 'rating': 8.3, 'year': 2019})]" | ||
] | ||
}, | ||
"execution_count": 26, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# This example specifies a query and a filter\n", | ||
"retriever.get_relevant_documents(\"Has Greta Gerwig directed any movies about women\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "39bd1de1-b9fe-4a98-89da-58d8a7a6ae51", | ||
"metadata": {}, | ||
"source": [ | ||
"## Filter k\n", | ||
"\n", | ||
"We can also use the self query retriever to specify `k`: the number of documents to fetch.\n", | ||
"\n", | ||
"We can do this by passing `enable_limit=True` to `SelfQueryRetriever.from_llm`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 27, | ||
"id": "bff36b88-b506-4877-9c63-e5a1a8d78e64", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"retriever = SelfQueryRetriever.from_llm(\n", | ||
" llm, \n", | ||
" vectorstore, \n", | ||
" document_content_description, \n", | ||
" metadata_field_info, \n", | ||
" enable_limit=True,\n", | ||
" verbose=True\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 28, | ||
"id": "2758d229-4f97-499c-819f-888acaf8ee10", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"query='dinosaur' filter=None limit=2\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'genre': 'science fiction', 'rating': 7.7, 'year': 1993}),\n", | ||
" Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'rating': None, 'year': 1995})]" | ||
] | ||
}, | ||
"execution_count": 28, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# This example specifies a query and a limit\n", | ||
"retriever.get_relevant_documents(\"what are two movies about dinosaurs\")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
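The notebook above declares `year` as an integer and `rating` as a float, while the translator added in this commit writes every comparison value under `valueText`, which may not match properties stored as numbers. The following is a minimal sketch, not part of the commit, of how the value key could be chosen from the Python type of the value. The helper names `weaviate_value_key` and `equal_filter` are hypothetical; the key names (`valueText`, `valueInt`, `valueNumber`, `valueBoolean`) are the ones Weaviate's `where` filter uses.

```python
from typing import Any, Dict


def weaviate_value_key(value: Any) -> str:
    """Pick the Weaviate `where` value key for a Python value (hypothetical helper)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "valueBoolean"
    if isinstance(value, int):
        return "valueInt"
    if isinstance(value, float):
        return "valueNumber"
    if isinstance(value, str):
        return "valueText"
    raise TypeError(f"Unsupported filter value type: {type(value).__name__}")


def equal_filter(attribute: str, value: Any) -> Dict[str, Any]:
    """Build an `Equal` comparison operand keyed by the value's type."""
    return {
        "path": [attribute],
        "operator": "Equal",
        weaviate_value_key(value): value,
    }


# Filters for two of the metadata fields used in the notebook.
year_filter = equal_filter("year", 1993)
director_filter = equal_filter("director", "Greta Gerwig")
```

With this sketch, a string-valued field such as `director` still produces the same `valueText` operand as the translator in the commit, and only numeric or boolean values are keyed differently.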
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
"""Logic for converting internal query language to a valid Weaviate query.""" | ||
from typing import Dict, Tuple, Union | ||
|
||
from langchain.chains.query_constructor.ir import ( | ||
Comparator, | ||
Comparison, | ||
Operation, | ||
Operator, | ||
StructuredQuery, | ||
Visitor, | ||
) | ||
|
||
|
||
class WeaviateTranslator(Visitor): | ||
"""Logic for converting internal query language elements to valid filters.""" | ||
|
||
allowed_operators = [Operator.AND, Operator.OR] | ||
"""Subset of allowed logical operators.""" | ||
|
||
allowed_comparators = [Comparator.EQ] | ||
|
||
def _map_func(self, func: Union[Operator, Comparator]) -> str: | ||
# https://weaviate.io/developers/weaviate/api/graphql/filters | ||
map_dict = {Operator.AND: "And", Operator.OR: "Or", Comparator.EQ: "Equal"} | ||
return map_dict[func] | ||
|
||
def _format_func(self, func: Union[Operator, Comparator]) -> str: | ||
if isinstance(func, Operator) and self.allowed_operators is not None: | ||
if func not in self.allowed_operators: | ||
raise ValueError( | ||
f"Received disallowed operator {func}. Allowed " | ||
f"operators are {self.allowed_operators}" | ||
) | ||
if isinstance(func, Comparator) and self.allowed_comparators is not None: | ||
if func not in self.allowed_comparators: | ||
raise ValueError( | ||
f"Received disallowed comparator {func}. Allowed " | ||
f"comparators are {self.allowed_comparators}" | ||
) | ||
return self._map_func(func) | ||
|
||
def visit_operation(self, operation: Operation) -> Dict: | ||
args = [arg.accept(self) for arg in operation.arguments] | ||
return {"operator": self._format_func(operation.operator), "operands": args} | ||
|
||
def visit_comparison(self, comparison: Comparison) -> Dict: | ||
return { | ||
"path": [comparison.attribute], | ||
"operator": self._format_func(comparison.comparator), | ||
"valueText": comparison.value, | ||
} | ||
|
||
def visit_structured_query( | ||
self, structured_query: StructuredQuery | ||
) -> Tuple[str, dict]: | ||
if structured_query.filter is None: | ||
kwargs = {} | ||
else: | ||
kwargs = {"where_filter": structured_query.filter.accept(self)} | ||
return structured_query.query, kwargs |
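To make the traversal in `visit_operation` and `visit_comparison` easier to follow, here is a self-contained sketch of the same idea that runs without LangChain installed. The `Operator`, `Comparator`, `Comparison` and `Operation` classes below are simplified stand-ins for the ones imported from `langchain.chains.query_constructor.ir`, and `to_where_filter` is a hypothetical function name; only the name mapping and the output dictionary shape are taken from the diff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Union


class Operator(Enum):
    AND = "and"
    OR = "or"
    NOT = "not"


class Comparator(Enum):
    EQ = "eq"
    GT = "gt"


@dataclass
class Comparison:
    comparator: Comparator
    attribute: str
    value: str


@dataclass
class Operation:
    operator: Operator
    arguments: List[Union["Operation", Comparison]]


# Same mapping as `_map_func` in the diff; anything absent is disallowed.
WEAVIATE_NAMES = {Operator.AND: "And", Operator.OR: "Or", Comparator.EQ: "Equal"}


def to_where_filter(node: Union[Operation, Comparison]) -> Dict:
    """Recursively convert a filter tree into a Weaviate-style `where` dict."""
    if isinstance(node, Comparison):
        if node.comparator not in WEAVIATE_NAMES:
            raise ValueError(f"Received disallowed comparator {node.comparator}")
        return {
            "path": [node.attribute],
            "operator": WEAVIATE_NAMES[node.comparator],
            "valueText": node.value,
        }
    if node.operator not in WEAVIATE_NAMES:
        raise ValueError(f"Received disallowed operator {node.operator}")
    return {
        "operator": WEAVIATE_NAMES[node.operator],
        "operands": [to_where_filter(arg) for arg in node.arguments],
    }


# "science fiction AND directed by Andrei Tarkovsky" as a filter tree.
tree = Operation(
    Operator.AND,
    [
        Comparison(Comparator.EQ, "genre", "science fiction"),
        Comparison(Comparator.EQ, "director", "Andrei Tarkovsky"),
    ],
)
where_filter = to_where_filter(tree)
```

Because operations translate their arguments recursively, an `Or` can be nested inside an `And` (or the reverse) to any depth, while an unsupported comparator such as `GT` is rejected with a `ValueError` before any query is sent.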