nanochatllms.cpp

nanochatllms.cpp is a repository containing pure C++ implementations of Chat-LLMs with less than 3 billion parameters. The goal is to provide implementations of quantised small Chat-LLMs that run efficiently on lower-end devices. The models are implemented in fp16, 8-bit, and 4-bit formats. This project was inspired by llama.cpp and llama.c.
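For intuition, here is a minimal sketch of absmax 8-bit quantisation, the general technique behind Q8-style formats: each block of fp32 weights is stored as int8 values plus one fp32 scale. This is an illustrative assumption, not necessarily the exact scheme this repository uses (block size, rounding, and layout may differ):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One quantised block: int8 weights plus a single fp32 scale.
// Dequantisation is approximately w = q * scale.
struct Q8Block {
    float scale;             // absmax / 127
    std::vector<int8_t> qs;  // quantised weights
};

// Absmax quantisation: map the block's largest magnitude to +/-127.
Q8Block quantize_q8(const std::vector<float>& w) {
    float absmax = 0.0f;
    for (float x : w) absmax = std::max(absmax, std::fabs(x));
    const float scale = absmax / 127.0f;
    Q8Block out{scale, std::vector<int8_t>(w.size())};
    for (std::size_t i = 0; i < w.size(); ++i) {
        out.qs[i] = static_cast<int8_t>(
            std::roundf(scale > 0.0f ? w[i] / scale : 0.0f));
    }
    return out;
}

// Recover an approximation of the original fp32 weight.
float dequantize_at(const Q8Block& b, std::size_t i) {
    return b.qs[i] * b.scale;
}
```

Q4-style formats follow the same idea but pack two 4-bit values per byte, roughly halving the size again at some cost in accuracy.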

Implemented models

  1. MiniCPM-2B DPO (source, license)
  2. StableLM-2-Zephyr-1.6B (source)
  3. TinyLlama-1.1B-Chat-v0.4 (source)
  4. OpenELM (source)

Build and run

git clone https://github.com/iangitonga/nanochatllms.cpp.git
cd nanochatllms.cpp/
cmake -S . -B build/ -DCMAKE_BUILD_TYPE=Release
cmake --build build/
build/bin/nanochat -m tinyllama -p "Give three tips on staying healthy."

To see all the available options, run

build/bin/nanochat --help

Sample Metrics

Note: Performance was recorded on an Intel Xeon CPU @ 2.20GHz with two cores and AVX enabled.

| Model          | Format | Model size (GB) | Performance (tokens/sec) |
|----------------|--------|-----------------|--------------------------|
| MiniCPM-2B     | FP16   | 5.5             | 2.4                      |
| MiniCPM-2B     | Q8     | 2.9             | 3.2                      |
| MiniCPM-2B     | Q4     | 1.5             | 3.4                      |
| Zephyr-1.6B    | FP16   | 3.3             | 4.2                      |
| Zephyr-1.6B    | Q8     | 1.8             | 5.2                      |
| Zephyr-1.6B    | Q4     | 0.9             | 5.6                      |
| TinyLlama-1.1B | FP16   | 2.2             | 6.3                      |
| TinyLlama-1.1B | Q8     | 1.2             | 8.7                      |
| TinyLlama-1.1B | Q4     | 0.6             | 9.1                      |
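
Tokens/sec numbers like these are typically obtained by timing the token-generation loop end to end. A minimal sketch follows; `generate_next_token` is a hypothetical stand-in for a model's decode step, not part of this repository's API:

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for one decode step of a model; a real benchmark
// would call the model's forward pass here. The name is an assumption,
// not part of this repository's API.
static int generate_next_token() { return 0; }

// Time n_tokens decode steps and return end-to-end throughput.
double measure_tokens_per_sec(int n_tokens) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n_tokens; ++i) {
        (void)generate_next_token();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return n_tokens / elapsed.count();
}

int main() {
    std::printf("%.1f tokens/sec\n", measure_tokens_per_sec(100));
}
```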
