
Glinthawk

An inference engine for the Llama2 model family, written in C++.

Development

Dependencies

For building the CUDA version, you will also need the CUDA Toolkit (nvcc).

Make sure your nvcc is compatible with your GCC version.

Building

mkdir build
cd build
cmake ..
make -j`nproc`

Adjust these commands as needed for your setup. Tested only on Ubuntu 22.04 and later. This program requires C++20 support and makes limited use of some Linux-specific system calls (such as memfd_create).
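If your default GCC does not support C++20, or is too new for your nvcc, one option is to point CMake at a specific host compiler. The compiler version below (g++-12) is only an example, not something prescribed by this project; substitute whatever version your nvcc supports:

# Example only: g++-12 is a placeholder for a GCC version compatible with your nvcc.
cmake .. -DCMAKE_CXX_COMPILER=g++-12 -DCMAKE_CUDA_HOST_COMPILER=g++-12
make -j`nproc`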

For more information on building this project, please take a look at the Dockerfile.amd64 and Dockerfile.cuda files.
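As a sketch, you can also build the container images directly; the image tags below are arbitrary examples:

# Build the CPU (amd64) and CUDA images from the repository root; tag names are illustrative.
docker build -f Dockerfile.amd64 -t glinthawk:amd64 .
docker build -f Dockerfile.cuda -t glinthawk:cuda .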

Test Models

For testing purposes, you can use tinyllamas. Please use the tools/bin2glint.py script to convert .bin files to Glinthawk's format. There are other scripts in the tools directory for converting the original Llama2 models.
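A sketch of the conversion step for a tinyllamas checkpoint; the argument layout and output filename here are assumptions, so check the script's --help output for its actual interface:

# Hypothetical invocation: convert a .bin checkpoint to Glinthawk's format.
python tools/bin2glint.py stories15M.bin stories15M.glint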

Trademark Notice

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
