forked from ggerganov/ggml

Adding Mamba metadata to the GGUF spec


compilade/ggml

 
 


ggml

Roadmap / Manifesto

Tensor library for machine learning

Note that this project is under active development.
Some of the development currently happens in the llama.cpp and whisper.cpp repositories.

Features

  • Written in C
  • 16-bit float support
  • Integer quantization support (4-bit, 5-bit, 8-bit, etc.)
  • Automatic differentiation
  • ADAM and L-BFGS optimizers
  • Optimized for Apple Silicon
  • Uses AVX / AVX2 intrinsics on x86 architectures
  • Uses VSX intrinsics on ppc64 architectures
  • No third-party dependencies
  • Zero memory allocations during runtime
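
As a rough illustration of the "zero memory allocations during runtime" idea, here is a minimal sketch of a bump (arena) allocator in C: the caller reserves one buffer up front, and every later request is carved out of it without calling malloc. The names arena_t, arena_init and arena_alloc are hypothetical and exist only for this example; this is not ggml's actual implementation or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena type for illustration; not part of the ggml API. */
typedef struct {
    uint8_t *base;  /* caller-provided buffer, reserved once up front */
    size_t   size;  /* total bytes in the buffer */
    size_t   used;  /* bytes handed out so far */
} arena_t;

/* Wrap a buffer that the caller already owns; no allocation happens here. */
static arena_t arena_init(void *buffer, size_t size) {
    arena_t a = { (uint8_t *) buffer, size, 0 };
    return a;
}

/* Hand out n bytes aligned to `align` (a power of two).
 * Returns NULL when the request does not fit in the remaining space. */
static void *arena_alloc(arena_t *a, size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Round the next free address up to the requested alignment. */
    uintptr_t start   = (uintptr_t) a->base + a->used;
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t) (align - 1);
    size_t    offset  = (size_t) (aligned - (uintptr_t) a->base);

    if (offset > a->size || n > a->size - offset) {
        return NULL;  /* out of space: the caller sized the buffer too small */
    }

    a->used = offset + n;
    return a->base + offset;
}
```

With this pattern, the buffer size has to be estimated before any work starts; in exchange, the hot path never touches the system allocator, and releasing everything amounts to resetting `used` to zero.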

Updates

Whisper inference (example)

With ggml you can efficiently run Whisper inference on the CPU.

Memory requirements:

Model     Disk      Mem
tiny      75 MB     ~280 MB
base      142 MB    ~430 MB
small     466 MB    ~1.0 GB
medium    1.5 GB    ~2.6 GB
large     2.9 GB    ~4.7 GB

GPT inference (example)

With ggml you can efficiently run GPT-2 and GPT-J inference on the CPU.

Here is how to run the example programs: