Running a Llama server as a Neuro-Symbolic backend. This backend is a wrapper around the LlamaCpp (llama.cpp) project.
Install dependencies:
pip install -r requirements.txt
Initialize submodules:
git submodule update --init --recursive
cd llama.cpp
make
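After the build finishes, you may want to check that the binaries used later in this guide were actually produced; a quick sketch (binary names and locations can differ between llama.cpp versions):
# check that the llama.cpp build produced the binaries used in the steps below
ls -l ./quantize ./server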
For more information, see the LlamaCpp project documentation.
If you don't have any model weights yet, you can download them from Meta:
https://ai.meta.com/llama/
If you only have the raw weights downloaded from Meta (not the gguf weights), you will need to convert them to the LlamaCpp (gguf) format. To do this, first install the following dependencies:
pip install llama-recipes transformers datasets accelerate sentencepiece protobuf==3.20 py7zr scipy peft bitsandbytes fire torch_tb_profiler ipywidgets
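Optionally, you can verify that the pinned packages resolved as expected before starting the conversion; this is just a sanity check, not a required step:
# optional: confirm the installed transformers and protobuf versions
python -c "import transformers, google.protobuf; print(transformers.__version__, google.protobuf.__version__)"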
# create a folder for all models
mkdir models
Now move all model weights obtained from Meta into the models
directory and rename their sub-folders to follow this naming convention: '7B', '7Bf', '13B', '13Bf', '30B', '34B', '65B', '70B', '70Bf'.
Here B stands for billion parameters and f stands for float16.
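For example, assuming you downloaded the 13B chat weights in float16, the layout might look roughly like this (the exact file names depend on Meta's download; the tokenizer is placed in the models root):
# hypothetical example layout inside the models directory
models/tokenizer.model
models/13Bf/consolidated.00.pth
models/13Bf/consolidated.01.pth
models/13Bf/params.json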
Then run the following commands:
# start ipython
ipython
Now we convert the weights to the HuggingFace format as follows:
# set the conversion script path for HuggingFace
TRANSFORM="""`python -c "import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')"`"""
# set the model path
model_dir = './models'
# set the model size and name
model_size = '13Bf'
# set the output path
hf_model_dir = './hf_models/llama-2-13b-chat'
# run the HuggingFace conversion
!python $TRANSFORM --input_dir $model_dir --model_size $model_size --output_dir $hf_model_dir
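Equivalently, the same conversion can be run directly from a regular shell instead of IPython; a rough sketch, assuming the transformers install path and the same directories as above (all paths are placeholders):
# hypothetical direct shell invocation of the HuggingFace conversion script
python /path/to/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir ./models --model_size 13Bf --output_dir ./hf_models/llama-2-13b-chat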
Now run the `gguf` conversion script:
# navigate to the llama.cpp directory
cd /path/to/llama.cpp
# run the gguf conversion
python convert.py /path/to/hf_models/llama-2-13b-chat
# the resulting gguf weights (`ggml-model-f16.gguf`) are written to the same hf_models directory
# you can now rename and move the `ggml-model-f16.gguf` weights to the models directory of your llama.cpp installation
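For example, to match the file name used in the quantization step below, the move might look like this (adjust the paths to your setup):
# hypothetical example: rename and move the converted weights into llama.cpp/models
mv /path/to/hf_models/llama-2-13b-chat/ggml-model-f16.gguf models/ggml-model-llama-2-13B-f16.gguf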
If you want to quantize the weights to reduce the memory requirements, you will need to run the following command from the llama.cpp directory:
./quantize models/ggml-model-llama-2-13B-f16.gguf models/ggml-model-llama-2-13B-f16-q5_k_m.gguf Q5_K_M
In this example, we quantize the weights to 5-bit precision using the K-quant "medium" variant (Q5_K_M) and save the quantized weights to the models/ggml-model-llama-2-13B-f16-q5_k_m.gguf file.
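If 5-bit quantization is still too large for your hardware, llama.cpp offers other quantization types; for example, a 4-bit K-quant variant (this is only an illustration, the right speed/quality trade-off depends on your use case):
# hypothetical alternative: quantize to 4-bit (Q4_K_M) for a smaller memory footprint
./quantize models/ggml-model-llama-2-13B-f16.gguf models/ggml-model-llama-2-13B-q4_k_m.gguf Q4_K_M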
Finally, start the llama.cpp server with the quantized weights (the -c flag sets the context size):
./server -m models/ggml-model-llama-2-13B-f16-q5_k_m.gguf -c 2048
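Once the server is running, you can send it a quick completion request to verify the setup; this assumes the server's default address, 127.0.0.1:8080 (configurable via --host and --port):
# send a test completion request to the llama.cpp server (default host and port assumed)
curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Hello, my name is", "n_predict": 32}'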