feat: Add RAG pipeline #6461

Merged: 12 commits on Dec 4, 2023

ZanSara (Contributor) commented on Nov 30, 2023

Why

This PR adds a utility to Haystack for building Retrieval-Augmented Generation (RAG) pipelines. It simplifies creating pipelines that use retrieval to support text generation tasks, such as answering questions based on a set of documents.

What

The PR adds new Python files for the utility and the pipeline implementation, plus a release note and tests for the new feature. The key changes are:

  • A utility function named build_rag_pipeline has been added to construct a RAG pipeline using either BM25 or embedding-based retrieval.
  • A class called _RAGPipeline is defined to encapsulate the pipeline creation logic.
  • A release note is created to document the feature addition.
  • Test cases are added to validate the new RAG pipeline creation functionality.

How can it be used

Users can now call the build_rag_pipeline function, passing in an instance of InMemoryDocumentStore, a generation model, an optional prompt template, and an optional embedding model. If an embedding model is specified, embedding-based retrieval is used; otherwise, BM25-based retrieval is used by default. The pipeline can then be used to generate answers to queries based on the content of the documents in the provided document store.

Example usage:

from haystack.pipeline_utils.rag import build_rag_pipeline
pipeline = build_rag_pipeline(document_store=your_document_store_instance)
answer = pipeline.run(query="What's the capital of France?")
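
The retrieval choice described above (embedding-based when an embedding model is given, BM25 otherwise) can be sketched without Haystack. This is only an illustration of the rule, not the code in this PR: RetrievalPlan, choose_retrieval, and the component names other than the text embedder are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RetrievalPlan:
    """Hypothetical summary of a RAG pipeline layout (not a Haystack class)."""

    retriever: str  # "embedding" or "bm25"
    components: List[str]  # illustrative component names, in execution order


def choose_retrieval(embedding_model: Optional[str] = None) -> RetrievalPlan:
    """Use embedding retrieval if a model name is given, BM25 otherwise."""
    if embedding_model:
        # The query must be embedded before an embedding retriever can use it.
        return RetrievalPlan("embedding", ["text_embedder", "retriever", "generator"])
    # BM25 works on the raw query text, so no text embedder is needed.
    return RetrievalPlan("bm25", ["retriever", "generator"])
```

Calling choose_retrieval() with no argument yields the BM25 layout, which mirrors the default behaviour described in this section.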

How did you test it

Testing uses unit tests with mock objects. The key assertions check that:

  • An Answer object is returned when running the pipeline.
  • A ValueError is raised when a document store other than InMemoryDocumentStore is used.
  • The text embedder component is excluded from the pipeline if no embedding model is specified.
  • The text embedder component is included if an embedding model is provided.
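
The document store check in the list above could be tested roughly as follows. This is a sketch under stated assumptions, not the tests from this PR: InMemoryStoreStandIn, OtherStoreStandIn, and require_in_memory_store are hypothetical stand-ins so that the example runs without Haystack installed.

```python
from unittest.mock import MagicMock


class InMemoryStoreStandIn:
    """Stand-in for InMemoryDocumentStore so the sketch runs without Haystack."""


class OtherStoreStandIn:
    """Stand-in for any unsupported document store."""


def require_in_memory_store(store: object) -> object:
    """Reject any store that is not the supported in-memory type."""
    if not isinstance(store, InMemoryStoreStandIn):
        raise ValueError(f"Unsupported document store: {type(store).__name__}")
    return store


def test_rejects_unsupported_store() -> None:
    try:
        require_in_memory_store(MagicMock(spec=OtherStoreStandIn))
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_accepts_in_memory_store() -> None:
    # A mock created with spec= passes isinstance checks for that class.
    store = MagicMock(spec=InMemoryStoreStandIn)
    assert require_in_memory_store(store) is store
```

Using spec= on the mock keeps the type check meaningful: an unconstrained MagicMock would fail the isinstance test regardless of which store it was meant to represent.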

Notes for the reviewer

Carefully review the changes to ensure compatibility with existing pipeline utilities and document stores. Verify that the test cases cover critical use cases and edge scenarios. It is also important to confirm that the error handling is robust, especially when checking document store types and the presence of optional model parameters.

github-actions bot added the topic:tests and type:documentation labels on Nov 30, 2023
vblagoje marked this pull request as ready for review on Dec 4, 2023, 09:21
vblagoje requested review from a team as code owners on Dec 4, 2023, 09:21
vblagoje requested review from dfokina and anakin87 and removed the request for a team on Dec 4, 2023, 09:21

anakin87 (Member) left a comment

I noticed that build_rag_pipeline:

  • only supports InMemoryDocumentStore
  • only supports the OpenAI GPTGenerator
  • only supports Sentence Transformers Embedders, while build_indexing_pipeline also supports OpenAI Embedders (see feat: Add Indexing Pipeline #6424)

However, if it's needed for the release, I can approve this PR. Please let me know...

from haystack.document_stores import InMemoryDocumentStore
from haystack.pipeline_utils import build_rag_pipeline

API_KEY = None # SET YOUR OPENAI API KEY HERE

Review comment (Contributor):
Suggested change
- API_KEY = None # SET YOUR OPENAI API KEY HERE
+ API_KEY = "SET YOUR OPENAI API KEY HERE"

Reply (Member):

OK, my thinking was that if they happen to have OPENAI_API_KEY set, this will work out of the box. But yes, I'll go with whatever you think is better.
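
The trade-off in this thread (an explicit key versus relying on OPENAI_API_KEY in the environment) can be illustrated with a small helper. This is a sketch only: resolve_api_key is a hypothetical function, and how the utility in this PR actually resolves the key is not shown in this conversation.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer an explicitly passed key, then fall back to OPENAI_API_KEY."""
    # Allow a custom mapping to be injected so the lookup is easy to test.
    env = os.environ if env is None else env
    return explicit_key or env.get("OPENAI_API_KEY")
```

With API_KEY = None the fallback to the environment variable can take effect, whereas a placeholder string such as "SET YOUR OPENAI API KEY HERE" is truthy and would be used as-is until the user replaces it.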

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
anakin87 self-requested a review on Dec 4, 2023, 14:21
vblagoje merged commit a38f871 into main on Dec 4, 2023 (19 checks passed)
vblagoje deleted the rag_pipeline branch on Dec 4, 2023, 14:25