Curated list of open source tools and projects to help with retrieval augmented generation
RAG stands for Retrieval Augmented Generation, a technique that augments a large language model (LLM) by retrieving information from other systems and passing it to the model as context in the prompt. This gives the LLM access to information beyond its training data, which is critical for applications that need to provide personalized responses. Example use cases include scraping data from current web pages, parsing data from PDFs and other documents, and answering questions about data in Confluence, Salesforce, or other SaaS apps.
For giving a model access to new knowledge, RAG is often a better fit than fine-tuning: it is typically cheaper and faster to set up, and its answers are easier to verify because metadata about the sources of information can be attached to each response.
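The retrieve-then-prompt flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any tool in this list implements it: the corpus, the function names (`retrieve`, `build_prompt`), and the prompt wording are all hypothetical, and a bag-of-words cosine similarity stands in for the embedding search a real vector database would provide. The call to the LLM itself is left out.

```python
import math
from collections import Counter

# Hypothetical in-memory corpus. Each entry keeps its source as metadata
# so that the final answer can point back to where the text came from.
DOCUMENTS = [
    {"source": "handbook.pdf", "text": "Employees accrue 20 vacation days per year."},
    {"source": "wiki/expenses", "text": "Expense reports are due by the fifth of each month."},
    {"source": "wiki/onboarding", "text": "New hires receive a laptop on their first day."},
]


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [word.strip(".,?!").lower() for word in text.split()]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query, Counter(tokenize(doc["text"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Place the retrieved passages, tagged with their sources, in the prompt."""
    context = "\n".join(f"[{doc['source']}] {doc['text']}" for doc in retrieved)
    return (
        "Answer using only the context below and cite sources in brackets.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    question = "How many vacation days do employees get?"
    print(build_prompt(question, retrieve(question, DOCUMENTS, k=1)))
```

The resulting string is what would be sent to the LLM. In a real setup the `retrieve` step would typically query one of the storage tools listed below rather than scan a Python list.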
We also have an open source project that makes setting up RAG on your own infrastructure super easy. Check it out here
Contributions welcome. Add links through pull requests or create an issue to start a discussion. Please read the contribution guidelines before contributing.
- 🧺 Awesome RAG
- Table of Contents
- Data Connectors
- Storage
- Retrieval
- LLMs
- Deployment
- Articles
Tools for connecting to data sources
- Psychic: Data integrations platform for LLMs with turnkey auth, syncs, and a universal API.
Tools and databases to store knowledge for retrieval.
- Chroma: The AI-native open-source embedding database.
- Qdrant: Vector Database for the next generation of AI applications. Also available in the cloud.
- Weaviate: An open source vector database that stores both objects and vectors, combining vector search and structured filtering with the fault tolerance and scalability of a cloud-native database, all accessible through GraphQL, REST, and various language clients.
- MongoDB: The MongoDB Database.
- PostgreSQL: Mirror of the official PostgreSQL Git repository. Note that this is just a mirror; the project does not accept pull requests on GitHub. To contribute, see https://wiki.postgresql.org/wiki/Submitting_a_Patch
- Supabase: The open source Firebase alternative. Follow to stay updated about our public Beta.
- Neo4j: Graphs for everyone
- LlamaIndex: LlamaIndex (GPT Index) is a data framework for your LLM applications
- Llama 2: The next generation of Meta's open source large language model.
- GPT4All: An ecosystem of open-source chatbots trained on a massive collection of clean assistant data including code, stories, and dialogue.
- RAGStack: Chat with your data privately using a self-hosted retrieval augmented generation (RAG) stack built on top of open-source LLMs like Falcon, Llama, and GPT4All.