RAG DRIAS

Our goal is to build a Retrieval-Augmented Generation (RAG) system on top of the DRIAS portal.

LLMs used in specialized fields may hallucinate because they lack domain knowledge. RAG mitigates this by retrieving relevant documents from an external knowledge base and passing them to the model along with the question.
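As a rough illustration of the retrieve-then-prompt idea, here is a minimal, self-contained sketch. It is not the code of this repository: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the function names and example chunks are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks in front of the question before calling the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Made-up chunks, for illustration only.
chunks = [
    "the data catalogue lists regional climate projections",
    "the portal has a help page for new users",
    "maps can be exported as images",
]
query = "where are the climate projections listed"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

The prompt that reaches the LLM now contains the most relevant chunk, so the model can ground its answer in retrieved text instead of relying only on what it memorized during training.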


Repository Structure

rag_drias
├─── docs
├─── rag_drias
│   ├─── data.py          # text data management
│   ├─── embedding.py     # wrapper for embedding models
│   ├─── crawl.py         # website crawling tools
│   └─── settings.py      # settings (paths, model names, ...)
└─── main.py              # main Python script

Documentation

Full code documentation of Rag_drias can be found here.

Install

  1. git clone https://github.com/meteofrance/rag_drias.git

  2. Build conda environment:

    conda env create --file environment.yaml
    conda activate ragdrias
  3. Set BASE_PATH in rag_drias/settings.py to the directory where all your data and models will be saved.

  4. Manually download the models:

If needed, see install instructions for git-lfs.

If needed, set up your Hugging Face access token (required for Llama-3.2-3B-Instruct).

    cd <BASE_PATH>
    git lfs install   # (should return `Git LFS initialized.`)
    git clone https://huggingface.co/dangvantuan/sentence-camembert-large
    git clone https://huggingface.co/jpacifico/Chocolatine-14B-Instruct-4k-DPO
    git clone https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct  # optional
    git clone https://huggingface.co/BAAI/bge-reranker-v2-m3  # optional
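Before moving on, it can help to check that the clones landed where expected. The sketch below is not part of the repository. It assumes each `git clone` created a directory named after the model directly under BASE_PATH, and that the two models not marked as optional are required. The `missing_models` helper and the placeholder path are hypothetical.

```python
from pathlib import Path

# Directory names created by the `git clone` commands above.
REQUIRED_MODELS = ["sentence-camembert-large", "Chocolatine-14B-Instruct-4k-DPO"]
OPTIONAL_MODELS = ["Llama-3.2-3B-Instruct", "bge-reranker-v2-m3"]


def missing_models(base_path: Path, names: list[str]) -> list[str]:
    """Return the model names that have no directory under base_path."""
    return [name for name in names if not (base_path / name).is_dir()]


if __name__ == "__main__":
    base_path = Path("/path/to/base")  # placeholder: replace with your own BASE_PATH
    for name in missing_models(base_path, REQUIRED_MODELS):
        print(f"missing required model: {name}")
    for name in missing_models(base_path, OPTIONAL_MODELS):
        print(f"missing optional model: {name}")
```

Note that a directory can exist while its weights are still Git LFS pointer files, so this check only catches clones that are missing entirely.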

Usage

  1. Crawl the website:

    python main.py crawl

  2. Prepare the vector database:

    python main.py prepare-database

  3. Make a query and retrieve the most relevant samples:

    python main.py query "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?"

  4. Make a query and retrieve the answer:

    python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?"

To add a reranker model:

    python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?" --reranker bge-reranker-v2-m3

  5. To see what the LLM would answer without the retrieved chunks:

    python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?" --no-use-rag

Use --help to see all available options in the main.py script.
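To give an intuition for what a reranker adds to the `answer` command, here is a minimal sketch of two-stage retrieval: a cheap first pass selects candidates, then a second scorer reorders them. This is an assumption about the general technique, not the repository's implementation. Both scoring functions are toy stand-ins (a real reranker such as bge-reranker-v2-m3 would score each query/chunk pair with a model), and all names and example chunks are hypothetical.

```python
def first_stage_score(query: str, chunk: str) -> int:
    """Cheap score: number of distinct words shared by query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def rerank_score(query: str, chunk: str) -> float:
    """Toy stand-in for a reranker: shared words divided by chunk length,
    so short, focused chunks beat long, diluted ones."""
    words = chunk.lower().split()
    return first_stage_score(query, chunk) / len(words) if words else 0.0


def retrieve_then_rerank(
    query: str, chunks: list[str], k_retrieve: int = 3, k_final: int = 1
) -> list[str]:
    """Keep the k_retrieve best first-stage candidates, then reorder them."""
    candidates = sorted(
        chunks, key=lambda c: first_stage_score(query, c), reverse=True
    )[:k_retrieve]
    reranked = sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)
    return reranked[:k_final]


# Made-up chunks, for illustration only.
chunks = [
    "available data and many other unrelated things are listed on this long page about formats",
    "available data formats are listed here",
    "contact the support team",
    "data portal news",
]
query = "available data formats"
best = retrieve_then_rerank(query, chunks)
print(best)
```

In this toy example the first stage ties the long and the short chunk, and the second stage promotes the short, focused one. The first pass stays cheap over the whole database, while the more expensive scorer only sees a handful of candidates.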