nanochatllms.cpp is a repository containing pure C++ implementations of Chat-LLMs with fewer than 3 billion parameters. The goal is to provide implementations of quantised small Chat-LLMs that can run efficiently on lower-end devices. The models are implemented in FP16, 8-bit and 4-bit formats. This project was inspired by llama.cpp and llama.c.
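To give a rough idea of what the 8-bit format involves, the sketch below shows symmetric block-wise quantisation in plain C++. The block size, struct layout and function names are illustrative assumptions for this sketch, not the exact format used by nanochatllms.cpp.

```cpp
// Sketch of symmetric block-wise 8-bit quantisation (the general idea behind
// Q8-style formats). Block size and struct layout here are assumptions.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlockSize = 32;      // weights are quantised in blocks of 32

struct Q8Block {
    float scale;                    // per-block dequantisation scale
    int8_t q[kBlockSize];           // quantised weights
};

std::vector<Q8Block> quantize_q8(const float* w, std::size_t n) {
    std::vector<Q8Block> out(n / kBlockSize);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* src = w + b * kBlockSize;
        // Find the largest magnitude in the block and map it to +/-127.
        float amax = 0.0f;
        for (int i = 0; i < kBlockSize; ++i)
            amax = std::max(amax, std::fabs(src[i]));
        const float scale = amax / 127.0f;
        out[b].scale = scale;
        for (int i = 0; i < kBlockSize; ++i)
            out[b].q[i] = static_cast<int8_t>(
                std::round(scale > 0.0f ? src[i] / scale : 0.0f));
    }
    return out;
}

// Dequantise one block back to float, e.g. for use inside a matmul kernel.
void dequantize_q8(const Q8Block& blk, float* dst) {
    for (int i = 0; i < kBlockSize; ++i)
        dst[i] = blk.q[i] * blk.scale;
}
```

A 4-bit format follows the same idea but packs two weights per byte, which is where the roughly 2x and 4x file-size reductions relative to FP16 in the table below come from.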
- MiniCPM-2B DPO source License
- StableLM-2-Zephyr-1.6B source
- TinyLlama-1.1B-Chat-v0.4 source
- OpenELM source
git clone https://github.com/iangitonga/nanochatllms.cpp.git
cd nanochatllms.cpp/
cmake -S . -B build/ -DCMAKE_BUILD_TYPE=Release
cmake --build build/
build/bin/nanochat -m tinyllama -p "Give three tips on staying healthy."
To see all the available options, run
build/bin/nanochat --help
Note: Performance was recorded on an Intel Xeon CPU @ 2.20GHz with two cores and AVX enabled.
| Model | Format | Model size (GB) | Performance (tokens/sec) |
|---|---|---|---|
| MiniCPM-2B | FP16 | 5.5 | 2.4 |
| | Q8 | 2.9 | 3.2 |
| | Q4 | 1.5 | 3.4 |
| Zephyr-1.6B | FP16 | 3.3 | 4.2 |
| | Q8 | 1.8 | 5.2 |
| | Q4 | 0.9 | 5.6 |
| TinyLlama-1.1B | FP16 | 2.2 | 6.3 |
| | Q8 | 1.2 | 8.7 |
| | Q4 | 0.6 | 9.1 |
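As a rough sanity check on the sizes above, take MiniCPM-2B to have about 2.7 billion parameters (an estimate inferred from the FP16 file size, not a figure stated in the table): FP16 stores 2 bytes per weight, giving roughly 2.7B × 2 B ≈ 5.5 GB; Q8 stores about 1 byte per weight plus per-block scales, ≈ 2.9 GB; Q4 packs two weights per byte plus scales, ≈ 1.5 GB. The same ratios hold for the other two models.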