Repository to test Retrieval Augmented Generation (RAG) on-device using Ollama.
Retrieval Augmented Generation (RAG) is a technique that optimizes the output of a Large Language Model (LLM) by letting it reference sources outside of its training data before generating a response [1]. It is computationally cheaper than fine-tuning, in which selected weights of the model are actually updated to account for new information.
In this small project I want to set up my computer to interact with LLMs locally, on device, without using an API call. This enables a workflow for reading and querying documents (mainly PDFs) without privacy concerns or limits on the number of tokens used by the LLM.
Additionally, these scripts provide a building block for Natural Language Processing tasks, since they require converting text into embeddings, something LLMs already do by default.
Useful Ollama commands:

```bash
ollama pull <model_name>   # download a model
ollama rm <model_name>     # remove a downloaded model
ollama list                # list the models available locally
ollama serve               # start a local Ollama server
```
- Clone this repository.
- Install the requirements with
  ```bash
  pip install -r requirements.txt
  ```
- Make two new directories so that the structure is as follows:
  ```
  .
  ├── data/      --> contains your PDF files
  ├── database/  --> initially empty; will contain the vector database
  └── src/       --> contains all Python scripts
  ```
- Review the available models at the [Ollama model library](https://ollama.com/library). Note that it's recommended that:
  > "You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models."
- Select an LLM and download it with
  ```bash
  ollama pull <model_name>
  ```
  Note that I've selected `llama3` for all the scripts.
- Start a local Ollama server:
  ```bash
  ollama serve
  ```
- Move your PDF documents to the `data/` folder, then run
  ```bash
  python create_or_add_to_database.py
  python query_database.py "your_query_for_LLM"
  ```
  (A sketch of what these two scripts do is given below.)
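As an illustration of the ingestion step, here is a minimal sketch of what a script like `create_or_add_to_database.py` could look like. This is not the repository's actual code: it assumes the LangChain community packages are in `requirements.txt`, and the paths, chunk sizes, and loader choice are all assumptions.

```python
# Hypothetical sketch of a PDF-ingestion pipeline; chunk_size,
# chunk_overlap, and the paths are illustrative assumptions.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every PDF found in data/ into LangChain Document objects
documents = PyPDFDirectoryLoader("data/").load()

# Split the documents into overlapping chunks so each one fits
# comfortably in the model's context window
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Embed each chunk with the local llama3 model and persist the
# vectors to the database/ folder
Chroma.from_documents(
    chunks,
    OllamaEmbeddings(model="llama3"),
    persist_directory="database/",
)
```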
Choice of database: I use ChromaDB, but there are other options that could provide higher speed.
Choice of embeddings: I use `OllamaEmbeddings()` with the chosen model, but there are other options. Whatever is chosen, the query side must reopen the database with the same embedding function used to build it; otherwise the similarity search compares vectors from incompatible spaces.
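To make this concrete, here is a minimal sketch of the query side. Again this is an illustration rather than the repository's actual code; the prompt wording and the number of retrieved chunks `k` are assumptions.

```python
# Hypothetical sketch of the query side; the prompt template and
# k are illustrative assumptions.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

# Reopen the persisted database with the SAME embedding function
# that was used to build it
db = Chroma(
    persist_directory="database/",
    embedding_function=OllamaEmbeddings(model="llama3"),
)

# Retrieve the chunks most similar to the question
question = "your_query_for_LLM"
hits = db.similarity_search(question, k=4)
context = "\n\n".join(doc.page_content for doc in hits)

# Ask the local model to answer using only the retrieved context
llm = Ollama(model="llama3")
print(llm.invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
))
```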
Consistency and repeatability of responses: This is related to the embeddings, the chunk sizes, and the metadata available in the database.

- Need to look into hierarchies and knowledge graphs (see [3] for details).
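Beyond the factors above, sampling randomness also affects repeatability. As a hedged aside (the parameter below is an assumption about configuration, not something the scripts necessarily set), LangChain's `Ollama` wrapper exposes a `temperature` parameter, and setting it to 0 makes generation close to deterministic for a fixed prompt and context:

```python
from langchain_community.llms import Ollama

# temperature=0 makes generation (near-)deterministic for a fixed
# prompt, which helps when comparing runs against the same database
llm = Ollama(model="llama3", temperature=0)
```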
Evaluation of results: This will be related to the contents of the documents we add to the database.

- Need a dedicated test suite to perform consistent evaluation of the LLM.
Deployment: This would require an API key, and it's outside the scope of this project.