
llama-cpp-cffi


Python binding for llama.cpp using cffi. Supports CPU, Vulkan 1.x, and CUDA 12.6 runtimes on x86_64 and aarch64 platforms.

NOTE: Linux (manylinux_2_28 and musllinux_1_2) is currently the only supported operating system, but we are working on Windows and macOS versions.

News

  • Dec 9 2024, v0.2.0: Support for the low-level and high-level llama, llava, clip, and ggml APIs.
  • Nov 27 2024, v0.1.22: Support for multimodal models such as llava and minicpmv.

Install

Basic library install:

pip install llama-cpp-cffi

IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12.x installed. If you don't, follow the instructions here: https://developer.nvidia.com/cuda-downloads .

Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, and compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100.
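
If you are unsure whether a CUDA-capable setup is visible at all, a quick driver check can save debugging time. The snippet below is a minimal sketch using plain Python and the standard nvidia-smi tool; it is not part of llama-cpp-cffi:

import shutil
import subprocess

# check whether the NVIDIA driver is visible before expecting GPU acceleration
if shutil.which('nvidia-smi') is None:
    print('nvidia-smi not found; expect a CPU or Vulkan runtime to be used')
else:
    # lists the detected GPUs, e.g. "GPU 0: NVIDIA GeForce RTX 3090 (UUID: ...)"
    print(subprocess.run(['nvidia-smi', '-L'], capture_output=True, text=True).stdout)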

LLM Example

from llama import Model


#
# first define the model, then load/init it
#
model = Model(
    creator_hf_repo='HuggingFaceTB/SmolLM2-1.7B-Instruct',  # original (creator) model repo
    hf_repo='bartowski/SmolLM2-1.7B-Instruct-GGUF',         # repo hosting the GGUF conversion
    hf_file='SmolLM2-1.7B-Instruct-Q4_K_M.gguf',            # quantized GGUF file to use
)

# ctx_size: context window in tokens; predict: max tokens to generate;
# gpu_layers: number of layers to offload to the GPU (99 effectively offloads all)
model.init(ctx_size=8192, predict=1024, gpu_layers=99)

#
# chat-style completion from a list of messages
#
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

for chunk in model.completions(messages=messages, temp=0.7, top_p=0.8, top_k=100):
    print(chunk, flush=True, end='')

#
# completion from a plain text prompt
#
for chunk in model.completions(prompt='Evaluate 1 + 2 in Python. Result in Python is', temp=0.7, top_p=0.8, top_k=100):
    print(chunk, flush=True, end='')
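
completions yields the response as a stream of text chunks (strings, as the print loops above suggest), so a full response can be assembled by joining the stream. A minimal sketch reusing the model and messages defined above:

# collect the streamed chunks into a single response string
answer = ''.join(model.completions(messages=messages, temp=0.7, top_p=0.8, top_k=100))
print(answer)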

VLM Example

from llama import Model


#
# first define the model, then load/init it
#
model = Model(  # 1.87B-parameter model
    creator_hf_repo='vikhyatk/moondream2',
    hf_repo='vikhyatk/moondream2',
    hf_file='moondream2-text-model-f16.gguf',     # text model weights
    mmproj_hf_file='moondream2-mmproj-f16.gguf',  # multimodal projector (vision) weights
)

model.init(ctx_size=8192, predict=1024, gpu_layers=99)  # same settings as the LLM example

#
# completion from a prompt plus an image
#
for chunk in model.completions(prompt='Describe this image.', image='examples/llama-1.png'):
    print(chunk, flush=True, end='')
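
The loaded model can be reused across images by calling completions once per file. A short sketch; the second image path is hypothetical, for illustration only:

# describe several images with one loaded model
for path in ('examples/llama-1.png', 'examples/llama-2.png'):  # second path is hypothetical
    print(f'{path}:')

    for chunk in model.completions(prompt='Describe this image.', image=path):
        print(chunk, flush=True, end='')

    print()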

References

  • examples/llm.py
  • examples/vlm.py
