Commit

feat: rag fusion notebook (#3051)
* feat: rag fusion notebook

* fix: updated notebook to run

* nit

* fix: proper md links

* chore: lint files

* nit

* nit

* fix: noteable link
bracesproul committed Oct 26, 2023
1 parent 4def675 commit ee9a02d
Showing 2 changed files with 349 additions and 1 deletion.
3 changes: 2 additions & 1 deletion cookbook/README.md
@@ -12,4 +12,5 @@ Note that you will also need to install the Python `jupyter` package, and that t

Notebook | Description
:- | :-
[rewrite.ipynb](https://github.com/langchain-ai/langchainjs/tree/master/cookbook/rewrite.ipynb) | Handle real-world questions that contain extraneous, distracting information in your RAG chains by first rewriting them before performing retrieval.
[rag_fusion.ipynb](https://github.com/langchain-ai/langchainjs/tree/master/cookbook/rag_fusion.ipynb) | Turn user queries into more search-friendly queries, then query a vector store and use reciprocal rank fusion to rank the results.
347 changes: 347 additions & 0 deletions cookbook/rag_fusion.ipynb
@@ -0,0 +1,347 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RAG Fusion\n",
"You can also run this notebook online [at Noteable.io](https://app.noteable.io/published/d9902d51-c5e9-4d89-bcb1-f82521ab4497/rag_fusion).\n",
"This notebook is a LangChain JS port of [this GitHub repo](https://github.com/Raudaschl/rag-fusion); all credit goes to the original author.\n",
"> RAG-Fusion, a search methodology that aims to bridge the gap between traditional search paradigms and the multifaceted dimensions of human queries. Inspired by the capabilities of Retrieval Augmented Generation (RAG), this project goes a step further by employing multiple query generation and Reciprocal Rank Fusion to re-rank search results."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"For this example, we'll use an in-memory vector store as our retriever, along with some fake data.\n"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"// Deno.env.set(\"OPENAI_API_KEY\", \"\");\n",
"\n",
"import { OpenAIEmbeddings } from \"npm:langchain@0.0.172/embeddings/openai\";\n",
"import { MemoryVectorStore } from \"npm:langchain@0.0.172/vectorstores/memory\";"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"/** Define our fake data */\n",
"const allDocuments = [\n",
" { id: \"doc1\", text: \"Climate change and economic impact.\" },\n",
" { id: \"doc2\", text: \"Public health concerns due to climate change.\" },\n",
" { id: \"doc3\", text: \"Climate change: A social perspective.\" },\n",
" { id: \"doc4\", text: \"Technological solutions to climate change.\" },\n",
" { id: \"doc5\", text: \"Policy changes needed to combat climate change.\" },\n",
" { id: \"doc6\", text: \"Climate change and its impact on biodiversity.\" },\n",
" { id: \"doc7\", text: \"Climate change: The science and models.\" },\n",
" { id: \"doc8\", text: \"Global warming: A subset of climate change.\" },\n",
" { id: \"doc9\", text: \"How climate change affects daily weather.\" },\n",
" { id: \"doc10\", text: \"The history of climate change activism.\" },\n",
"];"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"/** Initialize our vector store with the fake data and OpenAI embeddings. */\n",
"const vectorStore = await MemoryVectorStore.fromTexts(\n",
" allDocuments.map(({ text }) => text),\n",
" allDocuments.map(({ id }) => ({ id })),\n",
" new OpenAIEmbeddings()\n",
");\n",
"/** Create the retriever */\n",
"const retriever = vectorStore.asRetriever();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the Query Generator\n",
"\n",
"We will now define a chain to generate the queries.\n",
"This chain [pulls a prompt](https://smith.langchain.com/hub/langchain-ai/rag-fusion-query-generation) from the [LangChain Hub](https://smith.langchain.com/hub). Given a query, the prompt instructs the model to generate multiple search queries related to the original. In our case, we're asking for 4 additional queries."
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"import { ChatOpenAI } from \"npm:langchain@0.0.172/chat_models/openai\";\n",
"import { pull } from \"npm:langchain@0.0.172/hub\";\n",
"import { StringOutputParser } from \"npm:langchain@0.0.172/schema/output_parser\";\n",
"import { RunnableLambda, RunnableSequence } from \"npm:langchain@0.0.172/schema/runnable\";"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"/** Define the chat model */\n",
"const model = new ChatOpenAI({\n",
" temperature: 0,\n",
" openAIApiKey: Deno.env.get(\"OPENAI_API_KEY\"),\n",
"});"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"/** Pull a prompt from the hub */\n",
"const prompt = await pull(\"langchain-ai/rag-fusion-query-generation\");\n",
"// const prompt = ChatPromptTemplate.fromMessages([\n",
"// [\"system\", \"You are a helpful assistant that generates multiple search queries based on a single input query.\"],\n",
"// [\"user\", \"Generate multiple search queries related to: {original_query}\"],\n",
"// [\"user\", \"OUTPUT (4 queries):\"],\n",
"// ]);"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"/** Define our chain for generating queries */\n",
"const generateQueries = RunnableSequence.from([\n",
" prompt,\n",
" model,\n",
" new StringOutputParser(),\n",
" RunnableLambda.from((output) => output.split(\"\\n\")),\n",
"]);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Construct the Reciprocal Rank Fusion function\n",
"This function combines the results of multiple search queries into a single ranked list. Each document's fused score is the sum of `1 / (index + k)` over every result list it appears in, where `index` is its zero-based position in that list and `k` defaults to 60. This is a common technique in information retrieval, known as data fusion or result merging."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"import { Document } from \"npm:langchain@0.0.172/document\";"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"const reciprocalRankFusion = (results: Document[][], k = 60) => {\n",
" const fusedScores: Record<string, number> = {};\n",
" for (const result of results) {\n",
" // Assumes the docs are returned in sorted order of relevance\n",
" result.forEach((item, index) => {\n",
" const docString = item.pageContent;\n",
" if (!(docString in fusedScores)) {\n",
" fusedScores[docString] = 0;\n",
" }\n",
" fusedScores[docString] += 1 / (index + k);\n",
" });\n",
" }\n",
"\n",
" const rerankedResults = Object.entries(fusedScores)\n",
" .sort((a, b) => b[1] - a[1])\n",
" .map(\n",
" ([doc, score]) => new Document({ pageContent: doc, metadata: { score } })\n",
" );\n",
" return rerankedResults;\n",
"};"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the full chain\n",
"Now we can put all our pieces together in one chain.\n",
"The chain performs the following steps:\n",
"1. Generate 4 search queries based on the original query\n",
"2. Perform lookups with the retriever for each generated query\n",
"3. Pass the results of the vector store lookup to the `reciprocalRankFusion` function"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"const chain = RunnableSequence.from([\n",
" generateQueries,\n",
" retriever.map(),\n",
" reciprocalRankFusion,\n",
"]);"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\n",
" Document {\n",
" pageContent: \"Climate change and economic impact.\",\n",
" metadata: { score: 0.06558258417063283 }\n",
" },\n",
" Document {\n",
" pageContent: \"Climate change: A social perspective.\",\n",
" metadata: { score: 0.06400409626216078 }\n",
" },\n",
" Document {\n",
" pageContent: \"How climate change affects daily weather.\",\n",
" metadata: { score: 0.04787506400409626 }\n",
" },\n",
" Document {\n",
" pageContent: \"Climate change and its impact on biodiversity.\",\n",
" metadata: { score: 0.03306010928961749 }\n",
" },\n",
" Document {\n",
" pageContent: \"Public health concerns due to climate change.\",\n",
" metadata: { score: 0.016666666666666666 }\n",
" },\n",
" Document {\n",
" pageContent: \"Technological solutions to climate change.\",\n",
" metadata: { score: 0.016666666666666666 }\n",
" },\n",
" Document {\n",
" pageContent: \"Policy changes needed to combat climate change.\",\n",
" metadata: { score: 0.01639344262295082 }\n",
" }\n",
"]\n"
]
}
],
"source": [
"const originalQuery = \"impact of climate change\";\n",
"\n",
"const result = await chain.invoke({\n",
" original_query: originalQuery,\n",
"});\n",
"\n",
"console.log(result);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of the chain is the following:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[\n",
" \n",
" Document {\n",
"\n",
" pageContent: \"Climate change and economic impact.\",\n",
"\n",
" metadata: { score: 0.06558258417063283 }\n",
"\n",
" },\n",
"\n",
" Document {\n",
"\n",
" pageContent: \"Climate change: A social perspective.\",\n",
"\n",
" metadata: { score: 0.06400409626216078 }\n",
"\n",
" },\n",
" \n",
" Document {\n",
"\n",
" pageContent: \"How climate change affects daily weather.\",\n",
"\n",
" metadata: { score: 0.04787506400409626 }\n",
"\n",
" },\n",
"\n",
" Document {\n",
"\n",
" pageContent: \"Climate change and its impact on biodiversity.\",\n",
"\n",
" metadata: { score: 0.03306010928961749 }\n",
"\n",
" },\n",
" \n",
" Document {\n",
"\n",
" pageContent: \"Public health concerns due to climate change.\",\n",
"\n",
" metadata: { score: 0.016666666666666666 }\n",
"\n",
" },\n",
"\n",
" Document {\n",
"\n",
" pageContent: \"Technological solutions to climate change.\",\n",
"\n",
" metadata: { score: 0.016666666666666666 }\n",
"\n",
" },\n",
"\n",
" Document {\n",
"\n",
" pageContent: \"Policy changes needed to combat climate change.\",\n",
"\n",
" metadata: { score: 0.01639344262295082 }\n",
"\n",
" }\n",
" \n",
"]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Deno",
"language": "typescript",
"name": "deno"
},
"language_info": {
"file_extension": ".ts",
"mimetype": "text/x.typescript",
"name": "typescript",
"nb_converter": "script",
"pygments_lexer": "typescript",
"version": "5.2.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
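The scoring step in the notebook's `reciprocalRankFusion` cell can be tried without an OpenAI key or a vector store. The block below is a minimal standalone sketch, not the notebook's code: the function name `reciprocalRankFusionSketch`, the `FusedResult` type, and the sample ranked lists are all hypothetical, and plain strings stand in for `Document.pageContent`. It keeps the notebook's zero-based `1 / (index + k)` formula with `k = 60`.

```typescript
// Hypothetical standalone sketch of the fusion step; names and data are illustrative only.
type FusedResult = { doc: string; score: number };

function reciprocalRankFusionSketch(
  rankedLists: string[][],
  k = 60
): FusedResult[] {
  const fusedScores = new Map<string, number>();
  for (const list of rankedLists) {
    // Each list is assumed to be sorted from most to least relevant.
    list.forEach((doc, index) => {
      // Zero-based position, as in the notebook: the top hit contributes 1 / k.
      const previous = fusedScores.get(doc) ?? 0;
      fusedScores.set(doc, previous + 1 / (index + k));
    });
  }
  // Highest fused score first.
  return Array.from(fusedScores, ([doc, score]) => ({ doc, score })).sort(
    (a, b) => b.score - a.score
  );
}

// Made-up results for three generated queries.
const fused = reciprocalRankFusionSketch([
  ["economic impact", "social perspective", "daily weather"],
  ["social perspective", "economic impact"],
  ["economic impact", "biodiversity"],
]);

console.log(fused.map(({ doc }) => doc));
```

With these made-up lists, "social perspective" is ranked second once and first once, so its fused score is 1/61 + 1/60 ≈ 0.0331. That is the same value the notebook output shows for the biodiversity document, which suggests it appeared first in one result list and second in another.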
