Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: rag fusion notebook #3051

Merged
merged 8 commits into from
Oct 26, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion cookbook/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,5 @@ Note that you will also need to install the Python `jupyter` package, and that t

Notebook | Description
:- | :-
[rewrite.ipynb](https://github.com/langchain-ai/langchainjs/tree/master/cookbook/rewrite.ipynb) | Handle real-world questions that contain extraneous, distracting information in your RAG chains by first rewriting them before performing retrieval.
[rewrite.ipynb](https://github.com/langchain-ai/langchainjs/tree/master/cookbook/rewrite.ipynb) | Handle real-world questions that contain extraneous, distracting information in your RAG chains by first rewriting them before performing retrieval.
[rag_fusion.ipynb](https://github.com/langchain-ai/langchainjs/tree/master/cookbook/rag_fusion.ipynb) | Turn user queries into more search friendly queries, then query a vector store and use reciprocal rank fusion to rank the results.
347 changes: 347 additions & 0 deletions cookbook/rag_fusion.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,347 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RAG Fusion\n",
"You can also run this notebook online [at Noteable.io](https://app.noteable.io/published/d9902d51-c5e9-4d89-bcb1-f82521ab4497/rag_fusion)\n",
"A LangChain JS port of [this Github repo](https://github.com/Raudaschl/rag-fusion), all credit to the original author.\n",
"> RAG-Fusion, a search methodology that aims to bridge the gap between traditional search paradigms and the multifaceted dimensions of human queries. Inspired by the capabilities of Retrieval Augmented Generation (RAG), this project goes a step further by employing multiple query generation and Reciprocal Rank Fusion to re-rank search results."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"For this example we'll use an in memory store as our vector store/retriever, and some fake data.\n"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"// Deno.env.set(\"OPENAI_API_KEY\", \"\");\n",
"\n",
"import { OpenAIEmbeddings } from \"npm:langchain@0.0.172/embeddings/openai\";\n",
"import { MemoryVectorStore } from \"npm:langchain@0.0.172/vectorstores/memory\";"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"/** Define our fake data */\n",
"const allDocuments = [\n",
" { id: \"doc1\", text: \"Climate change and economic impact.\" },\n",
" { id: \"doc2\", text: \"Public health concerns due to climate change.\" },\n",
" { id: \"doc3\", text: \"Climate change: A social perspective.\" },\n",
" { id: \"doc4\", text: \"Technological solutions to climate change.\" },\n",
" { id: \"doc5\", text: \"Policy changes needed to combat climate change.\" },\n",
" { id: \"doc6\", text: \"Climate change and its impact on biodiversity.\" },\n",
" { id: \"doc7\", text: \"Climate change: The science and models.\" },\n",
" { id: \"doc8\", text: \"Global warming: A subset of climate change.\" },\n",
" { id: \"doc9\", text: \"How climate change affects daily weather.\" },\n",
" { id: \"doc10\", text: \"The history of climate change activism.\" },\n",
"];"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"/** Initialize our vector store with the fake data and OpenAI embeddings. */\n",
"const vectorStore = await MemoryVectorStore.fromTexts(\n",
" allDocuments.map(({ text }) => text),\n",
" allDocuments.map(({ id }) => ({ id })),\n",
" new OpenAIEmbeddings()\n",
");\n",
"/** Create the retriever */\n",
"const retriever = vectorStore.asRetriever();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the Query Generator\n",
"\n",
"We will now define a chain to do the query generation\n",
"This chain [pulls a prompt](https://smith.langchain.com/hub/langchain-ai/rag-fusion-query-generation) from the [LangChain Hub](https://smith.langchain.com/hub) that when provided a query, it tasks the model to generate multiple search queries related to the original. In our case, we're asking for 4 additional queries."
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"import { ChatOpenAI } from \"npm:langchain@0.0.172/chat_models/openai\";\n",
"import { pull } from \"npm:langchain@0.0.172/hub\";\n",
"import { StringOutputParser } from \"npm:langchain@0.0.172/schema/output_parser\";\n",
"import { RunnableLambda, RunnableSequence } from \"npm:langchain@0.0.172/schema/runnable\";"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"/** Define the chat model */\n",
"const model = new ChatOpenAI({\n",
" temperature: 0,\n",
" openAIApiKey: Deno.env.get(\"OPENAI_API_KEY\"),\n",
"});"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"/** Pull a prompt from the hub */\n",
"const prompt = await pull(\"langchain-ai/rag-fusion-query-generation\");\n",
"// const prompt = ChatPromptTemplate.fromMessages([\n",
"// [\"system\", \"You are a helpful assistant that generates multiple search queries based on a single input query.\"],\n",
"// [\"user\", \"Generate multiple search queries related to: {original_query}\"],\n",
"// [\"user\", \"OUTPUT (4 queries):\"],\n",
"// ]);"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"/** Define our chain for generating queries */\n",
"const generateQueries = RunnableSequence.from([\n",
" prompt,\n",
" model,\n",
" new StringOutputParser(),\n",
" RunnableLambda.from((output) => output.split(\"\\n\")),\n",
"]);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Construct the Reciprocal Rank Fusion function\n",
"This function is used for combining the results of multiple search queries to produce a single ranked list of results. This is a common technique in information retrieval known as data fusion or result merging."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"import { Document } from \"npm:langchain@0.0.172/document\";"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"const reciprocalRankFusion = (results: Document[][], k = 60) => {\n",
" const fusedScores: Record<string, number> = {};\n",
" for (const result of results) {\n",
" // Assumes the docs are returned in sorted order of relevance\n",
" result.forEach((item, index) => {\n",
" const docString = item.pageContent;\n",
" if (!(docString in fusedScores)) {\n",
" fusedScores[docString] = 0;\n",
" }\n",
" fusedScores[docString] += 1 / (index + k);\n",
" });\n",
" }\n",
"\n",
" const rerankedResults = Object.entries(fusedScores)\n",
" .sort((a, b) => b[1] - a[1])\n",
" .map(\n",
" ([doc, score]) => new Document({ pageContent: doc, metadata: { score } })\n",
" );\n",
" return rerankedResults;\n",
"};"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the full chain\n",
"Now we can put all our pieces together in one chain.\n",
"The chain preforms the following steps:\n",
"1. Generate 4 search queries based on the original query\n",
"2. Perform lookups with the retriever for each generated query\n",
"3. Pass the results of the vector store lookup to the `reciprocalRankFusion` function"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"const chain = RunnableSequence.from([\n",
" generateQueries,\n",
" retriever.map(),\n",
bracesproul marked this conversation as resolved.
Show resolved Hide resolved
" reciprocalRankFusion,\n",
"]);"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\n",
" Document {\n",
" pageContent: \"Climate change and economic impact.\",\n",
" metadata: { score: 0.06558258417063283 }\n",
" },\n",
" Document {\n",
" pageContent: \"Climate change: A social perspective.\",\n",
" metadata: { score: 0.06400409626216078 }\n",
" },\n",
" Document {\n",
" pageContent: \"How climate change affects daily weather.\",\n",
" metadata: { score: 0.04787506400409626 }\n",
" },\n",
" Document {\n",
" pageContent: \"Climate change and its impact on biodiversity.\",\n",
" metadata: { score: 0.03306010928961749 }\n",
" },\n",
" Document {\n",
" pageContent: \"Public health concerns due to climate change.\",\n",
" metadata: { score: 0.016666666666666666 }\n",
" },\n",
" Document {\n",
" pageContent: \"Technological solutions to climate change.\",\n",
" metadata: { score: 0.016666666666666666 }\n",
" },\n",
" Document {\n",
" pageContent: \"Policy changes needed to combat climate change.\",\n",
" metadata: { score: 0.01639344262295082 }\n",
" }\n",
"]\n"
]
}
],
"source": [
"const originalQuery = \"impact of climate change\";\n",
"\n",
"const result = await chain.invoke({\n",
" original_query: originalQuery,\n",
"});\n",
"\n",
"console.log(result);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of the chain is the following:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[\n",
" \n",
" Document {\n",
"\n",
" pageContent: \"Climate change and economic impact.\",\n",
"\n",
" metadata: { score: 0.06558258417063283 }\n",
"\n",
" },\n",
"\n",
" Document {\n",
"\n",
" pageContent: \"Climate change: A social perspective.\",\n",
"\n",
" metadata: { score: 0.06400409626216078 }\n",
"\n",
" },\n",
" \n",
" Document {\n",
"\n",
" pageContent: \"How climate change affects daily weather.\",\n",
"\n",
" metadata: { score: 0.04787506400409626 }\n",
"\n",
" },\n",
"\n",
" Document {\n",
"\n",
" pageContent: \"Climate change and its impact on biodiversity.\",\n",
"\n",
" metadata: { score: 0.03306010928961749 }\n",
"\n",
" },\n",
" \n",
" Document {\n",
"\n",
" pageContent: \"Public health concerns due to climate change.\",\n",
"\n",
" metadata: { score: 0.016666666666666666 }\n",
"\n",
" },\n",
"\n",
" Document {\n",
"\n",
" pageContent: \"Technological solutions to climate change.\",\n",
"\n",
" metadata: { score: 0.016666666666666666 }\n",
"\n",
" },\n",
"\n",
" Document {\n",
"\n",
" pageContent: \"Policy changes needed to combat climate change.\",\n",
"\n",
" metadata: { score: 0.01639344262295082 }\n",
"\n",
" }\n",
" \n",
"]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Deno",
"language": "typescript",
"name": "deno"
},
"language_info": {
"file_extension": ".ts",
"mimetype": "text/x.typescript",
"name": "typescript",
"nb_converter": "script",
"pygments_lexer": "typescript",
"version": "5.2.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}