# ggml

Tensor library for machine learning
Note that this project is under active development. Some of the development is currently happening in the llama.cpp and whisper.cpp repos.
## Features

- Written in C
- 16-bit float support
- Integer quantization support (4-bit, 5-bit, 8-bit, etc.)
- Automatic differentiation
- ADAM and L-BFGS optimizers
- Optimized for Apple Silicon
- Utilizes AVX / AVX2 intrinsics on x86 architectures
- Utilizes VSX intrinsics on ppc64 architectures
- No third-party dependencies
- Zero memory allocations during runtime
## Examples

- Example of GPT-2 inference: `examples/gpt-2`
- Example of GPT-J inference: `examples/gpt-j`
- Example of Whisper inference: `examples/whisper`
- Example of LLaMA inference: `ggerganov/llama.cpp`
- Example of LLaMA training: `ggerganov/llama.cpp/examples/baby-llama`
- Example of Falcon inference: `cmp-nct/ggllm.cpp`
- Example of BLOOM inference: `NouamaneTazi/bloomz.cpp`
- Example of RWKV inference: `saharNooby/rwkv.cpp`
- Example of SAM inference: `examples/sam`
- Example of BERT inference: `skeskinen/bert.cpp`
- Example of BioGPT inference: `PABannier/biogpt.cpp`
- Example of Encodec inference: `PABannier/encodec.cpp`
- Example of CLIP inference: `monatis/clip.cpp`
- Example of MiniGPT4 inference: `Maknee/minigpt4.cpp`
- Example of ChatGLM inference: `li-plus/chatglm.cpp`
- Example of Stable Diffusion inference: `leejet/stable-diffusion.cpp`
- Example of Qwen inference: `QwenLM/qwen.cpp`
- Example of YOLO inference: `examples/yolo`
- Example of ViT inference: `staghado/vit.cpp`
- Example of multiple LLMs inference: `foldl/chatllm.cpp`
- Example of SeamlessM4T inference (in development): https://github.com/facebookresearch/seamless_communication/tree/main/ggml
## Whisper inference

With ggml you can efficiently run Whisper inference on the CPU.

Memory requirements:
| Model  | Disk   | Mem     |
| ------ | ------ | ------- |
| tiny   | 75 MB  | ~280 MB |
| base   | 142 MB | ~430 MB |
| small  | 466 MB | ~1.0 GB |
| medium | 1.5 GB | ~2.6 GB |
| large  | 2.9 GB | ~4.7 GB |
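A typical way to try the Whisper example is to build it from the repo and point it at a downloaded model and a 16 kHz WAV file. The exact target names, script paths, and model file names below are assumptions that may differ between versions of the repo, so treat this as a sketch:

```shell
# build the whisper example (target name may differ by version)
mkdir build && cd build
cmake ..
make -j4 whisper

# fetch a model and transcribe an audio sample
# (download-ggml-model.sh and the samples/ path are assumed locations)
../examples/whisper/download-ggml-model.sh base.en
./bin/whisper -m models/ggml-base.en.bin -f ../examples/whisper/samples/jfk.wav
```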
## GPT inference

With ggml you can efficiently run GPT-2 and GPT-J inference on the CPU.

Here is how to run the example programs:
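A minimal build-and-run sketch follows. The CMake targets, the model-download script, and the binary names are assumptions based on the layout of `examples/gpt-2` and `examples/gpt-j` and may have changed; adjust them to what the repo actually provides:

```shell
# build ggml and the GPT example programs
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j   # target names are assumed

# run the GPT-2 small 117M model (download script path is assumed)
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
```

The `-m` flag selects the converted ggml model file and `-p` supplies the prompt; inference then runs entirely on the CPU.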