A LangChain-based RAG application that uses AnglE embeddings and the Cohere Reranker to fetch relevant emails based on a query.

Run `docker compose up` to start the stack. The app is served at http://localhost:5173/.
The Gmail API is used to download emails into a specified MongoDB database.
Once the Gmail API is set up, place the credential files in a folder titled `res` inside `backend`.
The default behavior is to collect all emails from the Primary folder over the last 60 days. This can be customized by modifying lines 49-52 of the `get_emails` script; a sketch of the kind of Gmail API call involved follows.
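For reference, a download step like this typically looks roughly as follows with the official Google client. This is a hedged sketch, not the project's actual code: the token file name, scopes, and query string are assumptions for illustration.

```python
# Hedged sketch: fetching Primary-category messages from the last 60 days.
# File paths and the query string are illustrative assumptions.
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

creds = Credentials.from_authorized_user_file(
    "backend/res/token.json",  # hypothetical token file in the res folder
    scopes=["https://www.googleapis.com/auth/gmail.readonly"],
)
service = build("gmail", "v1", credentials=creds)

# Gmail search operators: category:primary restricts to the Primary tab,
# newer_than:60d restricts to the last 60 days.
results = service.users().messages().list(
    userId="me", q="category:primary newer_than:60d"
).execute()
message_ids = [m["id"] for m in results.get("messages", [])]
```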
The custom retriever is composed of an AnglE embedding model and a Pinecone vector database. Pinecone offers a free index tier, which was used to build this project.
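A minimal sketch of how such a retriever can be wired up in LangChain, assuming the AnglE UAE-Large-V1 checkpoint loaded through the generic Hugging Face wrapper and an existing Pinecone index; the index name and `k` value are illustrative assumptions.

```python
# Hedged sketch of the retriever wiring; not the project's exact code.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Pinecone as PineconeVectorStore

# AnglE's UAE-Large-V1 checkpoint, loaded via the generic HF wrapper.
embeddings = HuggingFaceEmbeddings(model_name="WhereIsAI/UAE-Large-V1")

# Assumes the Pinecone client is configured (API key in the environment)
# and the index already holds the embedded emails.
vectorstore = PineconeVectorStore.from_existing_index(
    index_name="emails",  # hypothetical index name
    embedding=embeddings,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 20})  # top-20 for reranking
```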
The Cohere reranker re-ranks the 20 retrieved documents down to the 10 most relevant, which are stored in the MongoDB database and displayed in the web application.
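In LangChain, this reranking step is commonly expressed as a contextual compression retriever wrapping the base retriever. A hedged sketch, reusing the `retriever` from the previous snippet and assuming a Cohere API key in the environment; the example query is illustrative.

```python
# Hedged sketch: Cohere reranking of the top-20 hits down to 10.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

compressor = CohereRerank(top_n=10)  # keep the 10 most relevant documents
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,  # the AnglE + Pinecone retriever above
)
docs = compression_retriever.get_relevant_documents("upcoming campus events")
```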
The frontend container serves a React app built with ViteJS, with custom CSS for styling. The app makes fetch requests based on the query context and displays the result in a read-only text box.
The Flask API serves two distinct endpoints (a minimal sketch follows):

- `/get_emails`: invokes the Gmail API to collect emails from the inbox.
- `/rag`: performs the retrieval pipeline.
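A minimal Flask sketch of these two routes. The helper functions `download_emails()` and `run_rag_pipeline()` are hypothetical stand-ins, and the payload shapes are illustrative, not the project's actual code.

```python
# Hedged sketch of the two backend routes; helpers are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/get_emails", methods=["POST"])
def get_emails():
    count = download_emails()  # hypothetical: Gmail download into MongoDB
    return jsonify({"downloaded": count})

@app.route("/rag", methods=["POST"])
def rag():
    query = request.get_json()["query"]
    docs = run_rag_pipeline(query)  # hypothetical: retrieve + rerank + store
    return jsonify({"results": docs})
```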
The Docker containers need increased memory to perform the retrieval tasks (for example via `mem_limit` or `deploy.resources.limits.memory` in docker-compose.yml); otherwise the backend container may crash.
The MongoDB v7.0 database interacts solely with the backend container, storing all emails in a collection titled `emails` and the retrieved emails in a collection titled `event-emails`.
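For illustration, the backend's interaction with these collections might look like the following PyMongo sketch; the connection URI, database name, and document fields are assumptions, with only the two collection titles taken from the project.

```python
# Hedged sketch of the MongoDB wiring; names other than the two
# collection titles are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://mongo:27017/")  # "mongo" = compose service name
db = client["golden_retriever"]                 # hypothetical database name

db["emails"].insert_one({"subject": "...", "body": "..."})        # all downloaded emails
db["event-emails"].insert_one({"subject": "...", "body": "..."})  # top-10 reranked results
```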
Golden Retriever was a personal project I undertook after missing an event at Northeastern on a Wednesday in February 2024. The project involves designing and developing a RAG pipeline based on personal research, containerized into three containers bridged by an underlying Docker network and orchestrated with docker-compose. The query-based email retrieval step is currently slow. There are security considerations, such as API keys being passed via REST API calls, which are vulnerable to interception; since this deployment is entirely local, that risk was deemed acceptable. An LLM can be retrofitted with an additional line of Python using LangChain's ConversationalRetrievalChain (a hedged sketch follows), but this has not been implemented as part of the project due to the compute restrictions of running it locally. LLMs such as Llama 2 7B have been tested and work as expected.
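A sketch of what that retrofit could look like, assuming a locally served Llama 2 7B via Ollama and the `compression_retriever` from the reranking snippet above; the model name and example question are illustrative assumptions.

```python
# Hedged sketch: retrofitting a local LLM onto the existing retriever.
from langchain.chains import ConversationalRetrievalChain
from langchain_community.llms import Ollama

llm = Ollama(model="llama2")  # assumes a local Ollama instance serving Llama 2 7B
chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=compression_retriever)

result = chain.invoke({"question": "What events are happening this week?", "chat_history": []})
print(result["answer"])
```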