Skip to content

Commit

Permalink
[docs]: standardize vectorstores (#24797)
Browse files Browse the repository at this point in the history
  • Loading branch information
isahers1 authored Jul 30, 2024
1 parent ac64980 commit 5112422
Show file tree
Hide file tree
Showing 3 changed files with 388 additions and 20 deletions.
298 changes: 284 additions & 14 deletions libs/cli/langchain_cli/integration_template/docs/vectorstores.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -17,35 +17,88 @@
"source": [
"# __ModuleName__VectorStore\n",
"\n",
"This notebook covers how to get started with the __ModuleName__ vector store.\n",
"This notebook covers how to get started with the __ModuleName__ vector store."
]
},
{
"cell_type": "markdown",
"id": "36fdc060",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"- TODO: Update with relevant info.\n",
"- TODO: Update minimum version to be correct.\n",
"\n",
"To access __ModuleName__ vector stores you'll need to create a/an __ModuleName__ account, get an API key, and install the `__package_name__` integration package."
]
},
{
"cell_type": "raw",
"id": "64e28aa6",
"metadata": {
"vscode": {
"languageId": "raw"
}
},
"source": [
"%pip install -qU \"__package_name__>=MINIMUM_VERSION\""
]
},
{
"cell_type": "markdown",
"id": "9695dee7",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"- TODO: Update with relevant info.\n",
"\n",
"## Installation"
"Head to (TODO: link) to sign up to __ModuleName__ and generate an API key. Once you've done this set the __MODULE_NAME___API_KEY environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d97b55c2",
"id": "894c30e4",
"metadata": {},
"outputs": [],
"source": [
"# install package\n",
"!pip install -U __package_name__"
"import getpass\n",
"import os\n",
"\n",
"import os\n",
"\n",
"if not os.getenv(\"__MODULE_NAME___API_KEY\"):\n",
" import getpass\n",
" os.environ[\"__MODULE_NAME___API_KEY\"] = getpass.getpass(\"Enter your __ModuleName__ API key: \")"
]
},
{
"cell_type": "markdown",
"id": "36fdc060",
"metadata": {},
"source": [
"## Environment Setup\n",
"\n",
"Make sure to set the following environment variables:\n",
"\n",
"- TODO: fill out relevant environment variables or secrets\n",
"- Op\n",
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "93df377e",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"## Usage"
"- TODO: Fill out with relevant init params"
]
},
{
Expand All @@ -59,7 +112,224 @@
"source": [
"from __module_name__.vectorstores import __ModuleName__VectorStore\n",
"\n",
"# TODO: switch for preferred way to init and use your vector store\n"
"vector_store = __ModuleName__VectorStore()"
]
},
{
"cell_type": "markdown",
"id": "ac6071d4",
"metadata": {},
"source": [
"## Manage vector store\n",
"\n",
"### Add items to vector store\n",
"\n",
"- TODO: Edit and then run code cell to generate output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17f5efc0",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"document_1 = Document(\n",
" page_content=\"foo\",\n",
" metadata={\"source\": \"https://example.com\"}\n",
")\n",
"\n",
"document_2 = Document(\n",
" page_content=\"bar\",\n",
" metadata={\"source\": \"https://example.com\"}\n",
")\n",
"\n",
"document_2 = Document(\n",
" page_content=\"baz\",\n",
" metadata={\"source\": \"https://example.com\"}\n",
")\n",
"\n",
"documents = [document_1, document_2]\n",
"\n",
"vector_store.add_documents(documents=documents,ids=[\"1\",\"2\"])"
]
},
{
"cell_type": "markdown",
"id": "c738c3e0",
"metadata": {},
"source": [
"### Update items in vector store\n",
"\n",
"- TODO: Edit and then run code cell to generate output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0aa8b71",
"metadata": {},
"outputs": [],
"source": [
"updated_document = Document(\n",
" page_content=\"qux\",\n",
" metadata={\"source\": \"https://another-example.com\"}\n",
")\n",
"\n",
"vector_store.update_documents(document_id=\"1\",document=updated_document)"
]
},
{
"cell_type": "markdown",
"id": "dcf1b905",
"metadata": {},
"source": [
"### Delete items from vector store\n",
"\n",
"- TODO: Edit and then run code cell to generate output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef61e188",
"metadata": {},
"outputs": [],
"source": [
"vector_store.delete(ids=[\"3\"])"
]
},
{
"cell_type": "markdown",
"id": "c3620501",
"metadata": {},
"source": [
"## Query vector store\n",
"\n",
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
"\n",
"### Query directly\n",
"\n",
"Performing a simple similarity search can be done as follows:\n",
"\n",
"- TODO: Edit and then run code cell to generate output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa0a16fa",
"metadata": {},
"outputs": [],
"source": [
"results = vector_store.similarity_search(query=\"thud\",k=1,filter={\"source\":\"https://example.com\"})\n",
"for doc in results:\n",
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "3ed9d733",
"metadata": {},
"source": [
"If you want to execute a similarity search and receive the corresponding scores you can run:\n",
"\n",
"- TODO: Edit and then run code cell to generate output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5efd2eaa",
"metadata": {},
"outputs": [],
"source": [
"results = vector_store.similarity_search_with_score(query=\"thud\",k=1,filter={\"source\":\"https://example.com\"})\n",
"for doc, score in results:\n",
" print(f\"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "0c235cdc",
"metadata": {},
"source": [
"### Query by turning into retriever\n",
"\n",
"You can also transform the vector store into a retriever for easier usage in your chains. \n",
"\n",
"- TODO: Edit and then run code cell to generate output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3460093",
"metadata": {},
"outputs": [],
"source": [
"retriever = vector_store.as_retriever()\n",
"retriever.invoke(\"thud\")"
]
},
{
"cell_type": "markdown",
"id": "901c75dc",
"metadata": {},
"source": [
"Using retriever in a simple RAG chain:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "619b5ef6",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI\n",
"from langchain import hub\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\")\n",
"\n",
"prompt = hub.pull(\"rlm/rag-prompt\")\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"rag_chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")\n",
"\n",
"rag_chain.invoke(\"thud\")"
]
},
{
"cell_type": "markdown",
"id": "069f1b5f",
"metadata": {},
"source": [
"## TODO: Any functionality specific to this vector store\n",
"\n",
"E.g. creating a persisten database to save to your disk, etc."
]
},
{
"cell_type": "markdown",
"id": "8a27244f",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all __ModuleName__VectorStore features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/__module_name__.vectorstores.__ModuleName__VectorStore.html"
]
}
],
Expand Down
Loading

0 comments on commit 5112422

Please sign in to comment.