feat: Add RAG pipeline #6461

ZanSara · 2023-11-30T14:55:33Z

Why

This pull request adds a utility to Haystack for building "Retrieval-Augmented Generation" (RAG) pipelines. It simplifies creating pipelines that use retrieval to ground text generation, for example answering questions from a set of documents.

What

The PR adds new Python files for the utility and the pipeline implementation, plus a release note and tests for the new feature. The key changes are:

A utility function named build_rag_pipeline has been added to construct a RAG pipeline using either BM25 or embedding-based retrieval.
A class called _RAGPipeline is defined to encapsulate the pipeline creation logic.
A release note is created to document the feature addition.
Test cases are added to validate the new RAG pipeline creation functionality.

How can it be used

Users can now call the build_rag_pipeline function, passing in an instance of InMemoryDocumentStore, a generation model, an optional prompt template, and an optional embedding model. If an embedding model is specified, embedding-based retrieval is used; otherwise, BM25-based retrieval is used by default. The pipeline can then be used to generate answers to queries based on the content of the documents in the provided document store.

Example usage:

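from haystack.pipeline_utils.rag import build_rag_pipeline

# your_document_store_instance must be an InMemoryDocumentStore that already holds the documents to search
pipeline = build_rag_pipeline(document_store=your_document_store_instance)
answer = pipeline.run(query="What's the capital of France?")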
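The retriever selection described above can be sketched in plain Python. This is only an illustration with stand-in classes, not the code added in this PR: `plan_rag_components`, `FakeInMemoryDocumentStore`, and every component name below are hypothetical and merely mirror the behaviour described in the text.

```python
from typing import List, Optional


class FakeInMemoryDocumentStore:
    """Stand-in for InMemoryDocumentStore (hypothetical, for illustration only)."""


def plan_rag_components(document_store: object, embedding_model: Optional[str] = None) -> List[str]:
    """Return the ordered component names a RAG pipeline would need.

    Mirrors the behaviour described in the PR text: only the in-memory store
    is accepted, and a text embedder is added only when an embedding model
    is supplied. Component names are assumptions, not Haystack identifiers.
    """
    if not isinstance(document_store, FakeInMemoryDocumentStore):
        raise ValueError("Only the in-memory document store is supported.")

    if embedding_model:
        # Embedding-based retrieval: the query has to be embedded first.
        retrieval = ["text_embedder", "embedding_retriever"]
    else:
        # Default path: BM25 keyword retrieval needs no embedder.
        retrieval = ["bm25_retriever"]

    return retrieval + ["prompt_builder", "generator", "answer_builder"]


bm25_plan = plan_rag_components(FakeInMemoryDocumentStore())
embedding_plan = plan_rag_components(FakeInMemoryDocumentStore(), embedding_model="some-embedding-model")
```

Here `bm25_plan` contains no embedder while `embedding_plan` starts with one, which is the difference the tests in this PR are described as checking.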

How did you test it

Testing uses unit tests with mock objects and assertions. The key assertions check that:

An Answer object is returned when running the pipeline.
A ValueError is raised when a document store other than InMemoryDocumentStore is used.
The text embedder component is excluded from the pipeline if no embedding model is specified.
The text embedder component is included if an embedding model is provided.
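A rough sketch of the mock-based testing style mentioned above, using only the standard library. The `Answer` dataclass and `answer_query` function are hypothetical stand-ins written for this example; they are not Haystack's classes or the tests shipped in this PR.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class Answer:
    """Stand-in answer type (hypothetical; not Haystack's Answer class)."""
    query: str
    data: str


def answer_query(query, retriever, generator) -> Answer:
    """Tiny stand-in for a RAG run: retrieve documents, then generate from them."""
    documents = retriever(query)
    prompt = "\n".join(documents) + "\nQuestion: " + query
    return Answer(query=query, data=generator(prompt))


def test_returns_answer_without_calling_a_real_model():
    # Mocks replace the retriever and the generator, so no network call is made.
    retriever = Mock(return_value=["Paris is the capital of France."])
    generator = Mock(return_value="Paris")

    result = answer_query("What's the capital of France?", retriever, generator)

    assert isinstance(result, Answer)
    assert result.data == "Paris"
    retriever.assert_called_once_with("What's the capital of France?")
    # The prompt handed to the generator should contain the retrieved context.
    assert "Paris is the capital of France." in generator.call_args.args[0]


test_returns_answer_without_calling_a_real_model()
```

Mocking the generator keeps the test fast and deterministic, since no model or API key is needed to exercise the pipeline wiring.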

Notes for the reviewer

Carefully review the changes to ensure compatibility with existing pipeline utilities and document stores. Verify that the test cases cover critical use cases and edge scenarios. It is also important to confirm that the error handling is robust, especially when checking document store types and the presence of optional model parameters.

anakin87

I noticed that build_rag_pipeline:

only supports InMemory Document Store
only supports OpenAI GPTGenerator
only supports Sentence Transformers Embedders, while build_indexing_pipeline also supports OpenAI Embedders (see feat: Add Indexing Pipeline #6424)

However, if it's needed for the release, I can approve this PR. Please let me know...

masci · 2023-12-04T14:02:48Z

examples/getting_started/rag.py

+from haystack.document_stores import InMemoryDocumentStore
+from haystack.pipeline_utils import build_rag_pipeline
+
+API_KEY = None  # SET YOUR OPENAI API KEY HERE


Suggested change

-API_KEY = None  # SET YOUR OPENAI API KEY HERE
+API_KEY = "SET YOUR OPENAI API KEY HERE"

OK, I was thinking that if they happen to have OPENAI_API_KEY set, this would work out of the box. But yes, I'll go with whatever you think is better.
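For illustration, the two options discussed in this thread (an explicit placeholder versus falling back to the `OPENAI_API_KEY` environment variable) could be combined as below. `resolve_api_key` is a hypothetical helper written for this sketch; it is not part of the example file or of Haystack.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "SET YOUR OPENAI API KEY HERE"


def resolve_api_key(explicit: Optional[str] = None, env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer an explicitly set key, then OPENAI_API_KEY, otherwise fail loudly."""
    env = os.environ if env is None else env
    # An untouched placeholder counts as "not set".
    if explicit and explicit != PLACEHOLDER:
        return explicit
    from_env = env.get("OPENAI_API_KEY")
    if from_env:
        return from_env
    raise ValueError("No API key found: edit API_KEY or export OPENAI_API_KEY.")
```

Raising early gives a clearer message than letting a missing key surface later as an authentication error from the remote service.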

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

ZanSara added 2 commits November 30, 2023 15:53

add rag pipeline

fdad616

pipeline_utils

a4dec8b

github-actions bot added the topic:tests and type:documentation labels Nov 30, 2023

ZanSara and others added 4 commits November 30, 2023 16:11

typo

ef01154

reno

c69c7b6

Merge branch 'main' into rag_pipeline

5985f93

Merge branch 'main' into rag_pipeline

a1e5db7

vblagoje marked this pull request as ready for review December 4, 2023 09:21

vblagoje requested review from a team as code owners December 4, 2023 09:21

vblagoje requested review from dfokina and anakin87 and removed request for a team December 4, 2023 09:21

vblagoje added 5 commits December 4, 2023 10:43

Pydoc spelling

6ce1f2d

Add getting started example, use given document_store reference

77298ea

Small fixes

ff182de

Make setting OPENAI_API_KEY obvious

18cbccf

Simpler api key setup

a244094

anakin87 reviewed Dec 4, 2023

masci reviewed Dec 4, 2023

Update examples/getting_started/rag.py

2fa421e

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

anakin87 self-requested a review December 4, 2023 14:21

anakin87 approved these changes Dec 4, 2023

vblagoje merged commit a38f871 into main Dec 4, 2023
19 checks passed

vblagoje deleted the rag_pipeline branch December 4, 2023 14:25
