A .NET library to run LLaMA models using ggerganov/llama.cpp.

To build the library, you need CMake and Python installed. Then run the following commands at the root of the repository:
```bash
# Pull the submodules
git submodule update --init --recursive

# Build and prepare the C++ library
python scripts/build_llama_cpp.py
```
Then, build the .NET library using `dotnet`:
```bash
# Build the .NET library
dotnet build LLaMA.NET/LLaMA.NET.csproj
```
The built library should be located at `LLaMA.NET/bin/Debug/netXXXX/LLaMA.NET.dll`.
Currently, only Linux is supported. Work is underway to dynamically load the C++ library on other platforms.
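As an illustration of how such platform-specific loading could work, the sketch below uses .NET's `NativeLibrary.SetDllImportResolver` to choose a native library file at runtime. The library file names and the `NativeLoader` helper are assumptions made for this example, not the project's actual implementation.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

static class NativeLoader
{
    // Hypothetical helper: resolve the llama.cpp shared library per platform.
    // The file names below are assumptions; the real build may use different names.
    public static void Register(Assembly assembly)
    {
        NativeLibrary.SetDllImportResolver(assembly, (name, asm, searchPath) =>
        {
            if (name != "llama")
                return IntPtr.Zero; // fall back to the default resolver

            string file =
                RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "llama.dll" :
                RuntimeInformation.IsOSPlatform(OSPlatform.OSX)     ? "libllama.dylib" :
                                                                      "libllama.so";

            return NativeLibrary.Load(file, asm, searchPath);
        });
    }
}
```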
To use the library, you need a model converted to a binary format that the library can load. See llama.cpp/README.md for more information on how to convert a model.
The model directory should contain the following files:
- `ggml-model-q4_0.bin`: The model file.
- `params.json`: The model parameters.
- `tokenizer.model`: The tokenizer model.
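If you want to fail fast on a misconfigured directory, a small pre-flight check along these lines can help. The directory path is a placeholder and the layout is exactly the one listed above:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical pre-flight check for the model directory layout described above.
string modelDir = "/path/to/your/model";
string[] required = { "ggml-model-q4_0.bin", "params.json", "tokenizer.model" };

var missing = required.Where(f => !File.Exists(Path.Combine(modelDir, f))).ToList();
if (missing.Any())
    Console.Error.WriteLine($"Missing model files: {string.Join(", ", missing)}");
```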
To run inference, load a model and create a runner. The runner can then be used to run inference on a prompt:
```csharp
using LLaMA.NET;

// Load the model from disk.
LLaMAModel model = LLaMAModel.FromPath("/path/to/your/ggml-model-q4_0.bin");

// Create a runner and set the number of threads.
LLaMARunner runner = model.CreateRunner()
    .WithThreads(8);

// Run inference on a prompt, predicting up to 50 tokens.
var res = runner.WithPrompt(" This is the story of a man named ")
    .Infer(out _, nTokensToPredict: 50);
Console.Write(res);

model.Dispose();
```
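Since the example calls `Dispose()` explicitly, the model can presumably be wrapped in a `using` declaration instead; this is a sketch of that pattern, and it assumes nothing beyond the `IDisposable` behavior implied by the call above:

```csharp
using LLaMA.NET;

// Assumes LLaMAModel implements IDisposable, as implied by the explicit Dispose() call.
using LLaMAModel model = LLaMAModel.FromPath("/path/to/your/ggml-model-q4_0.bin");

LLaMARunner runner = model.CreateRunner().WithThreads(8);

var res = runner.WithPrompt(" This is the story of a man named ")
    .Infer(out _, nTokensToPredict: 50);
Console.Write(res);
// The model is disposed automatically at the end of the scope.
```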
This project is licensed under the MIT License - see the LICENSE file for details.
- ggerganov/llama.cpp for the LLaMA implementation in C++.
- sandrohanea/whisper.net as a reference for loading ggml models and libraries into .NET.