Squire uses llama.cpp with LangChain tools to answer a query. It is based on the Custom Agent with Tool Retrieval example from LangChain, with prompt engineering and output parsing modifications to better accommodate Llama models. It runs as a command line script.
- Clone this repository and navigate into its directory
- Install dependencies, e.g. using `pip install -r requirements.txt`
- Download a CPU inference checkpoint compatible with llama.cpp
- Edit question.txt with your query
- Run the script with `python squire.py`, followed by any command line options
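Put together, a typical setup might look like the following (the repository URL, directory name, and example question are placeholders):

```sh
git clone <repository-url>
cd squire
pip install -r requirements.txt
# place a ggml checkpoint, e.g. wizardLM-7B.GGML.q4_2.bin, in this directory
echo "Who wrote The Name of the Rose?" > question.txt
python squire.py
```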
No command line options are required if the default model (wizardLM-7B.GGML.q4_2.bin), question file, and template files are present in the script directory. Command line options for external files:
Option | Description | Default |
---|---|---|
`-q --question` | path to a `.txt` file containing your question | `question.txt` |
`-m --template` | path to template `.txt` file | `template.txt` |
`-l --llama_path` | path to ggml model weights `.bin` file | `ggml-model-q4_0.bin` |
`-o --output` | path to output file for the final answer | `out.txt` |
`-w --keyword-template` | path to keyword extraction template `.txt` file | `keyword-template.txt` |
There are also options to control the parameters that the LangChain `LlamaCpp()` wrapper forwards to llama.cpp:
Option | Default |
---|---|
`-p --top_p` | 0.95 |
`-r --repeat_penalty` | 1.1 |
`-k --top_k` | 30 |
`-T --temperature` | 0.2 |
`-b --n_batch` | 512 |
`-g --n_gpu_layers` | 0 |
`-t --n_threads` | 6 |
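For reference, these flags map onto keyword arguments of LangChain's `LlamaCpp` wrapper roughly as follows. This is a minimal sketch assuming the 2023-era `langchain` package layout; Squire's actual wiring may differ:

```python
from langchain.llms import LlamaCpp

# Hypothetical construction mirroring the command line options above.
llm = LlamaCpp(
    model_path="wizardLM-7B.GGML.q4_2.bin",  # -l --llama_path
    top_p=0.95,           # -p --top_p
    repeat_penalty=1.1,   # -r --repeat_penalty
    top_k=30,             # -k --top_k
    temperature=0.2,      # -T --temperature
    n_batch=512,          # -b --n_batch
    n_gpu_layers=0,       # -g --n_gpu_layers
    n_threads=6,          # -t --n_threads
)
```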
Miscellaneous:
Option | Description |
---|---|
`-v --verbose` | verbose output (partially working) |
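For example, a run that overrides several of the defaults might look like this (the file names are illustrative):

```sh
python squire.py -q myquestion.txt -l llama-30b-supercot-ggml.q4_0.bin -T 0.5 -g 20 -v
```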
Squire first ingests the default or user-provided parameters. Assuming this succeeds, it uses LangChain to run the chosen model, with llama.cpp handling encoding and inference. The model then chooses a tool with which to search for an answer. The available tools were selected because they are free to use and do not require an API key. Currently available tools:
- DuckDuckGo Search
- Arxiv
- Wikipedia
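All three are available as standard LangChain tools. A minimal sketch of how they can be instantiated, again assuming the 2023-era `langchain` package layout (this mirrors, but is not necessarily identical to, Squire's internal setup):

```python
from langchain.tools import ArxivQueryRun, DuckDuckGoSearchRun, WikipediaQueryRun
from langchain.utilities import ArxivAPIWrapper, WikipediaAPIWrapper

# None of these wrappers requires an API key.
tools = [
    DuckDuckGoSearchRun(),
    ArxivQueryRun(api_wrapper=ArxivAPIWrapper()),
    WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()),
]
```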
It may take the model several cycles of tool queries before it obtains satisfactory data for a final answer. Once it does, Squire writes the answer to the output file and prints it to the command line. Squire has been tested with wizardLM-7B and llama-30b-supercot-ggml, each answering to the best of its ability. Note that the quality of the output and the efficiency of operation depend heavily on your choice of local language model.
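This iterative behaviour can be pictured as a standard LangChain agent loop. The sketch below uses the generic `initialize_agent` helper rather than Squire's custom agent, purely to illustrate how the cycle of tool calls is bounded; the `max_iterations` value is an assumption, not Squire's actual setting:

```python
from langchain.agents import AgentType, initialize_agent

# llm and tools as constructed in the sketches above.
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    max_iterations=8,  # cap the think/act/observe cycles (assumed value)
    verbose=True,
)

with open("question.txt") as f:
    question = f.read().strip()

answer = agent.run(question)

with open("out.txt", "w") as f:
    f.write(answer)
```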
I hope you find this tool both useful and simple to use. Please let me know if you encounter any issues or have any suggestions.