This repository develops three distinct Retrieval-Augmented Generation (RAG) systems that explore different ways of combining retrieval with generation. Each system enhances a language model's capabilities by integrating an efficient retrieval mechanism with a generative model.
- Multimodal RAG: ✅ Completed
  - Combines textual and visual data for retrieval and generation.
  - Example Use Case: Extracting information from documents with mixed content such as text, images, and tables.
- Agentic RAG: 🚧 Work in Progress
  - Introduces agent-like behavior for decision-making and task execution.
  - Planned Use Case: Autonomous exploratory data analysis (EDA) and predictive model generation for datasets.
- Graph-Based RAG: 🚧 Work in Progress
  - Incorporates graph-based data structures for entity-centric retrieval.
  - Planned Use Case: Knowledge-graph-based question answering and entity-relationship search.
The Multimodal Retrieval-Augmented Generation (RAG) system combines retrieval and generation capabilities to process and generate insights from complex, multi-format data such as PDFs containing text, images, and tables. It is built on the following stack:
- Unstructured: For PDF parsing and data extraction.
- LangChain: To build the retrieval and generation pipeline.
- Google Gemini API: For summarisation and response generation for user queries.
- HuggingFace: Embedding model for semantic search.
- ChromaDB: Vector database for similarity-based retrieval.
- Gradio: Interactive user interface for uploading PDFs and viewing results.
- LangSmith: For tracing and observability.
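
LangSmith tracing is typically switched on through environment variables rather than code; a minimal sketch, assuming the standard LangChain tracing variables and a hypothetical project name:

```python
# Minimal LangSmith setup; tracing is toggled via environment variables.
# The project name and key placeholder are illustrative.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "multimodal-rag"  # hypothetical project name
```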
The end-to-end workflow is as follows (illustrative code sketches for the main steps appear after the list):
- PDF Upload: Users upload a PDF through the Gradio interface.
- Parsing: The Unstructured library extracts text, images, and tables from the document.
- Summarising: Text and tables are summarised, and images are described, using the LLM.
- Embedding Generation: Text embeddings for the summaries and the descriptions are generated using the embedding model.
- Storing: The original chunks are stored in the Document Store and the summarised chunks in the Vector DB, with a shared UUID linking each pair.
- Vector Search: For each user query, semantically similar summaries are retrieved from the Vector DB and resolved to their original chunks.
- Retrieval & Generation: The relevant sections of the PDF are passed through the LangChain pipeline, which uses the Google Gemini API to generate a contextual response.
- Output: Results are displayed interactively through the Gradio UI.
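
As a sketch of the parsing step, the snippet below uses Unstructured's `partition_pdf`; the file name and parameter choices are illustrative, not this repository's exact configuration:

```python
# Sketch of the parsing step with Unstructured; parameter values are
# illustrative and should be tuned for your documents.
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(
    filename="example.pdf",                # hypothetical input file
    strategy="hi_res",                     # layout-aware parsing for images and tables
    infer_table_structure=True,            # keep table structure as HTML in metadata
    extract_image_block_types=["Image"],   # pull image blocks out of the page
    extract_image_block_to_payload=True,   # store images as base64 in element metadata
)

# Split the parsed elements by type for the summarising step.
texts = [el for el in elements if el.category == "NarrativeText"]
tables = [el for el in elements if el.category == "Table"]
images = [el for el in elements if el.category == "Image"]
```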
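The summarising step can be sketched with LangChain and Gemini; the model name, prompt wording, and metadata fields are assumptions, and `texts`, `tables`, and `images` come from the parsing sketch above:

```python
# Sketch of the summarising step: summarise text/tables and describe images
# with the LLM so that everything can be embedded as plain text.
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # needs GOOGLE_API_KEY set

prompt = ChatPromptTemplate.from_template(
    "Summarise the following content concisely for retrieval:\n\n{element}"
)
summarise = prompt | llm | StrOutputParser()

text_summaries = summarise.batch([{"element": el.text} for el in texts])
table_summaries = summarise.batch(
    [{"element": el.metadata.text_as_html} for el in tables]
)

def describe_image(b64: str) -> str:
    """Ask the LLM for a retrieval-friendly description of one image."""
    message = HumanMessage(content=[
        {"type": "text", "text": "Describe this image for retrieval."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ])
    return llm.invoke([message]).content

image_descriptions = [describe_image(el.metadata.image_base64) for el in images]
```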
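The shared-UUID storage pattern described in the Storing step matches LangChain's multi-vector retriever; a minimal sketch, assuming a HuggingFace sentence-transformer model and an in-memory document store:

```python
# Sketch of the indexing step: summaries are embedded into Chroma, original
# chunks go to a document store, and one UUID links each pair.
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma(collection_name="multimodal_rag", embedding_function=embeddings)
id_key = "doc_id"

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=InMemoryStore(),
    id_key=id_key,
)

# Embed the summaries for search; return the originals at query time.
doc_ids = [str(uuid.uuid4()) for _ in texts]
retriever.vectorstore.add_documents([
    Document(page_content=summary, metadata={id_key: doc_ids[i]})
    for i, summary in enumerate(text_summaries)
])
retriever.docstore.mset(list(zip(doc_ids, [el.text for el in texts])))
# Tables and images are indexed the same way, pairing each summary or
# description with its original element under a fresh UUID.
```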
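Retrieval and generation can then be wired into a single chain; `retriever` and `llm` come from the sketches above, and the prompt wording and example query are illustrative:

```python
# Sketch of the retrieval-and-generation step as a LangChain runnable chain.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

rag_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # The retriever resolves matched summaries to original chunks,
    # which are plain strings in this sketch.
    return "\n\n".join(str(doc) for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What are the key findings in the document?")
```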
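Finally, a minimal Gradio sketch of the upload-and-ask flow; `index_pdf` is a hypothetical helper standing in for the parsing, summarising, and indexing steps sketched above:

```python
# Sketch of the Gradio interface: one event to ingest a PDF, one to answer
# questions against the indexed content.
import gradio as gr

def ingest(pdf_file):
    index_pdf(pdf_file.name)  # hypothetical: parse, summarise, embed, store
    return "PDF indexed and ready for questions."

def ask(question):
    return rag_chain.invoke(question)

with gr.Blocks() as demo:
    pdf = gr.File(label="Upload a PDF", file_types=[".pdf"])
    status = gr.Textbox(label="Status")
    pdf.upload(ingest, inputs=pdf, outputs=status)

    question = gr.Textbox(label="Ask a question about the PDF")
    answer = gr.Textbox(label="Answer")
    question.submit(ask, inputs=question, outputs=answer)

demo.launch()
```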