First, create a raw knowledge base:
union run --remote union_rag/simple_rag.py get_documents \
--include_union \
--exclude_patterns '["/api/", "/_tags/"]'
Then synthesize a question and answer dataset:
union run --remote union_rag/synthesize_data.py data_synthesis_workflow \
--n_questions 1 \
--n_answers 5
Register the data annotation workflow:
union register union_rag/annotate_data.py
Run a single annotation session to test it out:
union run --remote union_rag/annotate_data.py create_annotation_set --random_seed 42 --n_samples 10
Create a secrets.txt file to store your credentials. This file is ignored by git and should look something like this:
UNIONAI_SERVERLESS_API_KEY=<UNIONAI_SERVERLESS_API_KEY>
Export the secrets to your environment:
export $(cat secrets.txt | xargs)
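The one-liner above splits on whitespace, so it can fail if a value contains spaces or if secrets.txt contains comment lines. A minimal alternative sketch, assuming every line of secrets.txt is a valid shell assignment of the form KEY=VALUE, is to source the file with auto-export turned on. The `load_secrets` function name is hypothetical and not part of this project. Sourcing executes the file as shell code, so only use it on a file you trust.

```shell
# Hypothetical helper (not part of the project): export every KEY=VALUE
# assignment found in the given file, defaulting to secrets.txt.
load_secrets() {
  file="${1:-secrets.txt}"
  # A bare file name would be looked up on PATH by `.`, so anchor it
  # to the current directory.
  case "$file" in
    */*) ;;
    *) file="./$file" ;;
  esac
  if [ ! -f "$file" ]; then
    echo "secrets file not found: $file" >&2
    return 1
  fi
  set -a        # auto-export every variable assigned from here on
  . "$file"     # run the assignments in the current shell
  set +a        # turn auto-export back off
}
```

Calling `load_secrets` from the repository root before starting the app has the same effect as the export command above, while keeping quoted values intact.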
Run the annotation app:
streamlit run streamlit/annotation_app.py
Create the eval dataset:
union run --remote union_rag/eval_dataset.py create_eval_dataset --min_annotations_per_question 1
Evaluate a RAG experiment:
union run --remote union_rag/eval_rag.py evaluate_simple_rag --eval_configs config/eval_inputs_prompt.yaml
Experiment with different chunk sizes:
union run --remote union_rag/eval_rag.py evaluate_simple_rag --eval_configs config/eval_inputs_chunksize.yaml
Experiment with different splitters:
union run --remote union_rag/eval_rag.py evaluate_simple_rag --eval_configs config/eval_inputs_splitter.yaml
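The three evaluation runs above differ only in the config file, so they can be generated from one loop. The following is a sketch that reuses only the command and config paths already shown; the `eval_commands` function name is hypothetical. It prints the commands instead of running them, so the list can be reviewed first.

```shell
# Hypothetical helper: print one evaluation command per config file
# (prompt, chunk size, splitter), in that order.
eval_commands() {
  for name in prompt chunksize splitter; do
    printf 'union run --remote union_rag/eval_rag.py evaluate_simple_rag --eval_configs config/eval_inputs_%s.yaml\n' "$name"
  done
}
```

Running `eval_commands` lists the three commands, and `eval_commands | sh` executes them one after another.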