Glinthawk

An inference engine for the Llama2 model family, written in C++.

Development

Dependencies

CMake >=3.18
GCC >=12 (C++20 support required)
OpenSSL
Protobuf
Google Logging Library

For building the CUDA version, you will also need:

CUDA Toolkit

Make sure your nvcc is compatible with your GCC version.

Building

mkdir build
cd build
cmake ..
make -j`nproc`

Adjust these commands as needed for your setup. The build has been tested only on Ubuntu 22.04 and later. The program requires C++20 and makes limited use of some Linux-specific system calls, such as memfd_create.

For more information on building this project, please take a look at the Dockerfile.amd64 and Dockerfile.cuda files.

Test Models

For testing purposes, you can use tinyllamas. Please use the tools/bin2glint.py script to convert .bin files to Glinthawk's format. There are other scripts in the tools directory for converting the original Llama2 models.

Trademark Notice

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
