Binding to llama.cpp #126

shadowmint · 2023-04-09T07:34:33Z

shadowmint
Apr 9, 2023

Forking off of #124 as a discussion instead.

tldr; Is it useful to have a binding of llama.cpp?

People seem to have mixed feelings.

My personal opinion is "be pragmatic". The llama.cpp library and ggml are both messy C++ libraries which are changing relatively quickly; given that ggml changes are solved upstream in llama.cpp very quickly, what is the benefit of binding to ggml directly? Is there a future in which it is replaced by a rust implementation for a 'full stack rust' solution?

I dunno. At any rate, right now there seem to be a few things which are lagging here, in terms of implementation.

Existing work:

example binding cxx: https://github.com/iacore/llama-sys

example binding cmake: https://github.com/shadowmint/llama-sys/

example higher level api: https://github.com/shadowmint/llama-rs/

example higher level api (? WIP?): https://github.com/iacore/llama-rs

^ These are all pretty trivial < 1 day worth of effort to setup; bringing some or all of it into this crate might help?

...but, I'm not going to die on a hill about it. If people prefer not to, there's no specific reason to bind it into this crate. I just thought it would be nice to have everything in one central place.

KerfuffleV2 · 2023-04-09T08:01:03Z

KerfuffleV2
Apr 9, 2023

Maybe I misunderstood what you were saying originally.

tldr; Is it useful to have a binding of llama.cpp?

I absolutely do think a binding should exist so people cane use it if they want to. More options are always good.

The thing I didn't agree was with the "struggling to keep up, always second best, will be used only reluctantly" part of your issue. I also wouldn't like to see llama-rs change direction to only become a thin shell around llama.cpp.

what is the benefit of binding to ggml directly?

Like you said, llama.cpp and GGML are messy C++ libraries. Rust users usually have a different philosophy of trying for correct behavior, safety, etc. The more of a project that lives in that messy C code, the less the Rust philosophy can be applied.

People who are interested in ML/LLMs can currently learn something about that by looking at the current code since it's setting up the structure of the model, defining the ops that occur, etc. Even though the heavy duty math is happening in GGML, there's still a lot that can be learned looking in the llama-rs code.

If it was just a front end for llama.cpp, then all you'd find is details about how to interface with that library. That's a whole lot less interesting.

Is there a future in which it is replaced by a rust implementation for a 'full stack rust' solution?

I certainly hope so! There are already a number of Rust projects in progress attempting to do something similar to GGML.

0 replies

setzer22 · 2023-04-09T09:01:26Z

setzer22
Apr 9, 2023
Maintainer

I share everything @KerfuffleV2 said. Even is llama.cpp is just little code compared to ggml, it's where all the interesting stuff happens.

Since the start (when this was a pretty literal port) this project has been gradually moving away from llama.cpp in several ways. I think most of the changes have been steps in the right direction. The llama.cpp ecosystem moves fast, but it's also sloppy. People are forking left and right instead of trying to evolve a cohesive library and follow good practices.

You also originally mentioned binding llama.cpp.directly would free development effort for other kinds of improvements. But may I ask, what improvements exactly? Anything that is a substantial improvement would require contributing to the C++ codebase directly so it would not be a Rust project anymore. Things like #14 would not be possible in this development model.

If this is just a novelty thing and nobody cares about llama a few months from now, then we all would've had fun. I don't mind who has more or less features. But if a few months from now it turns out this "inference at the edge using ggml" trend is still going, then people are going to want to build projects on top of it, and many are going to appreciate being able to do it in Rust, where building a robust application on top of llama will be just a cargo add away.

0 replies

hlhr202 · 2023-04-09T09:01:55Z

hlhr202
Apr 9, 2023

as a nodejs developer, i d like to see more options here. i m willing to provide both solution shipped like an adapters for llama rs and llama sys.

what i noticed is that an issue said we probably get a safetensor as alternative to ggml. would that be happening in the future?

0 replies

hlhr202 · 2023-04-09T09:04:39Z

hlhr202
Apr 9, 2023

i think the problem is that we need to grab more attentions from the open source community. now people all know llama.cpp, but only part of them know llama rs.

0 replies

KerfuffleV2 · 2023-04-09T09:42:57Z

KerfuffleV2
Apr 9, 2023

what i noticed is that an issue said we probably get a safetensor as alternative to ggml.

SafeTensors is just a file format for storing tensors. Loading tensors or settings like hyperparameters, vocab, whatever is generally not very difficult. So you could pretty easily convert existing GGML models to SafeTensors format and possibly vice versa.

The complicated thing would be the format the actual tensors are actually stored in. For example, GGML has its own implementation of quantization so anything working with those tensors would have to be able to deal with that format.

So just to be clear in the summary: SafeTensors isn't anything more than a storage format for tensors, it doesn't have anything to do with actually running the model. GGML is both a storage format and a library for actually using them.

now people all know llama.cpp, but only part of them know llama rs.

There are also just generally less Rust developers. A big factor in who can participate will be whether they know the language the respective application is written in. C++ developers aren't really going to be contributing to Rust projects much, and vice versa.

0 replies

shadowmint · 2023-04-09T09:58:17Z

shadowmint
Apr 9, 2023
Author

I think my feeling is best understood by a comparison to opencv.

This is extremely excellent project: https://github.com/twistedfall/opencv-rust

I use it a lot.

It's not a pure rust project, and the various attempts at a pure rust computer vision stack, while interesting, are, basically immature.

Re-implementing all of that in rust instead of building applications using it is something I (and, it seems, most other people) are not really interested in, because the marginal benefit of the 'pure rust' stack, is out-weighed significantly by the effort required to build an equivalent library.

I hear the other arguments in this thread, and totally fair points. I'm just saying; that's my take. Solving the 'can do inference' problem and then moving on to 'and then do interesting things with it as a dependency' is more interesting to me personally.

1 reply

philpax Apr 10, 2023
Maintainer

the marginal benefit of the 'pure rust' stack, is out-weighed significantly by the effort required to build an equivalent library.

I think you overestimate the effort required for us specifically; llama-rs already exists and works, and tracking the changes from llama.cpp isn't that difficult (when the changes are documented...)

Solving the 'can do inference' problem and then moving on to 'and then do interesting things with it as a dependency' is more interesting to me personally.

I agree! That's why I was able to start using llama.rs as a dependency before llama.cpp, because librarification was much simpler 😆

In all seriousness, I'm happy with where we sit right now - I think we gain much more than we lose.

katopz · 2023-04-09T12:02:35Z

katopz
Apr 9, 2023

FYI: I just saw llm-chain use llama.cpp approach (w/ a lot of binding) 👉 https://github.com/sobelio/llm-chain/tree/main/llm-chain-llama

I use llama.cpp approach atm.
I still prefer pure Rust in the long term.
I don't know what I'm doing, mostly just have some fun with 🦀 gang.

0 replies

KerfuffleV2 · 2023-04-09T12:52:23Z

KerfuffleV2
Apr 9, 2023

Solving the 'can do inference' problem and then moving on to 'and then do interesting things with it as a dependency' is more interesting to me personally.

That's 100% valid. Different people have different goals and preferences.

For me, interacting with this project is mostly to learn about how the LLM works internally. Also, I don't really like the idea of being limited to what the interface supports. With llama-rs, I actually have the capability to dig into the code and add functionality that didn't already exists. When interfacing with a wrapper to llama.cpp itself, that ability is going to be much more limited (I have no interest in learning/writing C++).

Of course, I also want to enable interesting things just with the results as well but that might not be possible if I run into a limitation that's out of my hands to deal with.

0 replies

philpax · 2023-04-10T14:28:58Z

philpax
Apr 10, 2023
Maintainer

Apologies for the late reply to this - I've been enjoying my long weekend 😅

I wrote up the project's thoughts on this in the README, which I'll quote here:

This is a reimplementation of llama.cpp that does not share any code with it outside of ggml. This was done for a variety of reasons:

llama.cpp requires a C++ compiler, which can cause problems for cross-compilation to more esoteric platforms. An example of such a platform is WebAssembly, which can require a non-standard compiler SDK.

Rust is easier to work with from a development and open-source perspective; it offers better tooling for writing "code in the large" with many other authors. Additionally, we can benefit from the larger Rust ecosystem with ease.

We would like to make ggml an optional backend (see this issue).

In general, we hope to build a solution for model inferencing that is as easy to use and deploy as any other Rust crate.

but speaking more personally:

The upstream llama.cpp repository has a lot of churn, and it's more chaotic than I would like (i.e. the way GGJT was handled)
Avoiding a C++ compiler is very nice - it means we can avoid broken build on fedora #22 and make it easy to deploy, especially for Docker builds
We can offer a more robust and less segfaulty implementation through Rust
Having direct access to the code allows us to provide specific Rust features and guarantees, and use the ecosystem for implementation (Investigate concurrent inference across threads with one model #95, Parallel loading of the model tensors #79, Use the HuggingFace llama Tokenizer #35, etc)
In the future, we'd like to decouple from GGML entirely and make it an optional backend

I'm not opposed to binding llama.cpp, but I do think we have different goals and motivations here. We want to build a solid piece of reliable inferencing infrastructure, not just bind a moving target.

If we were to bind it, I suspect the interface would be different enough (especially around InferenceSession) that it'd require its own crate anyway.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Binding to llama.cpp #126

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 9 comments 1 reply

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Binding to llama.cpp #126

shadowmint Apr 9, 2023

Existing work:

Replies: 9 comments · 1 reply

KerfuffleV2 Apr 9, 2023

setzer22 Apr 9, 2023 Maintainer

hlhr202 Apr 9, 2023

hlhr202 Apr 9, 2023

KerfuffleV2 Apr 9, 2023

shadowmint Apr 9, 2023 Author

philpax Apr 10, 2023 Maintainer

katopz Apr 9, 2023

KerfuffleV2 Apr 9, 2023

philpax Apr 10, 2023 Maintainer

shadowmint
Apr 9, 2023

Replies: 9 comments 1 reply

KerfuffleV2
Apr 9, 2023

setzer22
Apr 9, 2023
Maintainer

hlhr202
Apr 9, 2023

hlhr202
Apr 9, 2023

KerfuffleV2
Apr 9, 2023

shadowmint
Apr 9, 2023
Author

philpax Apr 10, 2023
Maintainer

katopz
Apr 9, 2023

KerfuffleV2
Apr 9, 2023

philpax
Apr 10, 2023
Maintainer