llama: update llama.cpp to latest version #244

Merged · 11 commits into sobelio:main · Dec 15, 2023

Conversation

@danbev (Contributor) commented Nov 27, 2023

This commit updates llama.cpp to the latest version.

The motivation for this is that the current version of llama.cpp is somewhat outdated, and there have been changes to both the llama.cpp API and the model format. Currently it is not possible to use the new GGUF format, and many of the available models are now in this format, which makes the crate challenging to use at the moment.

The following changes have been made:

  • Updated llama.cpp to the latest version using git submodule update --remote --merge llama.cpp

  • Manually copied the generated bindings.rs file from the target directory to the src directory. Hope this was the correct thing to do.

  • Updated the llm-chain-llama crate to use llama_decode instead of llama_eval, which is now deprecated (see the sketch after this list).

  • A number of TODOs have been added to the code to highlight areas that I know I need to look at.
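
As a rough illustration of the API shift behind the llama_decode change: the deprecated llama_eval took a flat slice of tokens plus an n_past offset, whereas llama_decode consumes a batch that carries each token's position and sequence id explicitly. The sketch below only models that difference; the Batch struct and build_batch helper are illustrative placeholders, not the crate's generated bindings.

```rust
// Illustrative only: `Batch` and `build_batch` are hypothetical stand-ins,
// not the bindgen-generated llama.cpp bindings used by the crate.
struct Batch {
    tokens: Vec<i32>,    // token ids to evaluate
    positions: Vec<i32>, // position of each token within its sequence
    seq_ids: Vec<i32>,   // sequence each token belongs to
}

// With llama_eval the caller passed `tokens` and `n_past`; with llama_decode
// that positional information is carried explicitly inside the batch.
fn build_batch(tokens: &[i32], n_past: i32, seq_id: i32) -> Batch {
    Batch {
        tokens: tokens.to_vec(),
        positions: (0..tokens.len() as i32).map(|i| n_past + i).collect(),
        seq_ids: vec![seq_id; tokens.len()],
    }
}

fn main() {
    // Three prompt tokens appended after two tokens already in the context.
    let batch = build_batch(&[11, 22, 33], 2, 0);
    assert_eq!(batch.positions, vec![2, 3, 4]);
}
```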


This is a work in progress but I wanted to open a draft pull request sooner rather than later to get some visibility and feedback.

So far I have been able to successfully run the simple, few_shot, and stream examples. The map_reduce_llama example is not working as of this writing, which I'll look into further.

@williamhogman (Contributor) commented:

<3

@Juzov (Collaborator) commented Nov 28, 2023

There's a clause that ignores MaxTokens if MaxTokens == 0 (or rather the reverse). So adding MaxTokens to be equal to MaxContextSize in the examples is redundant. If you want, and it's possible, you could try to change the option to default to the size of the context window and remove the clause.
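
To make the clause concrete, here is a minimal sketch of the behaviour being described; the function and names are hypothetical, not the crate's actual option handling.

```rust
// Hypothetical sketch of the clause: a MaxTokens of 0 is treated as
// "no explicit limit" and falls back to the full context window, so
// explicitly setting MaxTokens == MaxContextSize changes nothing.
fn effective_max_tokens(max_tokens: usize, max_context_size: usize) -> usize {
    if max_tokens == 0 {
        max_context_size
    } else {
        max_tokens
    }
}

fn main() {
    assert_eq!(effective_max_tokens(0, 4096), 4096);    // clause kicks in
    assert_eq!(effective_max_tokens(4096, 4096), 4096); // redundant setting
}
```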

@danbev (Contributor, Author) commented Nov 29, 2023

> So adding MaxTokens to be equal to MaxContextSize in the examples is redundant.

I added a MaxBatchSize option in 452ac2c with a default value of 512, which matches the value in llama.cpp, and I have now removed the options from the examples (apart from simple_llama); see the sketch after this comment.

> If you want, and it's possible, you could try to change the option to default to the size of the context window and remove the clause.

I'm planning on taking a closer look at the model options today, and I'll also take another look at the context options and your suggestion. Thanks!
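
For reference, a minimal sketch of how a MaxBatchSize (n_batch) value bounds prompt decoding; the helper below is hypothetical and only illustrates the chunking, not the crate's actual decode loop.

```rust
// Hypothetical helper: split an n_tokens-long prompt into chunks of at
// most max_batch_size tokens, one decode call per chunk.
fn batch_ranges(n_tokens: usize, max_batch_size: usize) -> Vec<std::ops::Range<usize>> {
    (0..n_tokens)
        .step_by(max_batch_size)
        .map(|start| start..(start + max_batch_size).min(n_tokens))
        .collect()
}

fn main() {
    // With the default of 512 (matching llama.cpp), a 1300-token prompt
    // would be decoded in three chunks: 512 + 512 + 276 tokens.
    assert_eq!(batch_ranges(1300, 512).len(), 3);
}
```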

This commit updates llama.cpp to the latest version.

The motivation for this is that the current version of llama.cpp is
somewhat outdated, and there have been changes to both the llama.cpp API
and the model format. Currently it is not possible to use the new GGUF
format, and many of the available models are now in this format, which
makes the crate challenging to use at the moment.

The following changes have been made:
* update llama.cpp to latest version using
  git submodule update --remote --merge llama.cpp

* Manually copied the generated bindings.rs file from the target
  directory to the src directory. Hope this was the correct thing to do.

* Updated the llm-chain-llama crate to use llama_decode instead of
  llama_eval, which has now been deprecated.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This is an attempt to stop builds from failing with:

```
Error: The operation was canceled.
```

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This is an attempt to prevent the currently failing Windows build from
causing the other builds to be cancelled (at least that is what I think
is happening).

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit is an attempt to update the hnsw dependency to version 0.2
and to fix the Windows build, as hnsw_rs version 0.1.19 currently fails
to compile on Windows:

```console
error[E0308]: mismatched types
   --> C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\hnsw_rs-0.1.19\src\libext.rs:439:39
    |
439 |     let c_dist = DistCFFI::<f32>::new(c_func);
    |                  -------------------- ^^^^^^ expected `u32`, found `u64`
    |                  |
    |                  arguments to this function are incorrect
    |
    = note: expected fn pointer `extern "C" fn(_, _, u32) -> _`
               found fn pointer `extern "C" fn(_, _, u64) -> _`
note: associated function defined here
   --> C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\hnsw_rs-0.1.19\src\dist.rs:990:12
    |
990 |     pub fn new(f:DistCFnPtr<T>) -> Self {
    |            ^^^ ---------------
```

I was able to reproduce this issue locally by cross-compiling (which
produces the above error). Cross-compiling with version 0.2 works,
however, so I've attempted to upgrade to that version (see the note
after this commit message on the likely cause).

This is very much a suggestion, as I'm not familiar with the hnsw code,
but perhaps it will be useful to someone else and save some time
investigating the issue.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
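
A note on the likely cause, which is my reading of the error above rather than anything stated by hnsw_rs: the expected/found mismatch between u32 and u64 looks like a c_ulong versus fixed-width integer difference between targets, since c_ulong is 32 bits on Windows but 64 bits on 64-bit Linux, which would explain why the same code compiles on Linux yet fails when cross-compiled for Windows. A minimal Rust illustration of that platform difference:

```rust
use std::os::raw::c_ulong;

fn main() {
    // Prints 64 on x86_64 Linux but 32 on Windows targets, which is why a
    // fn-pointer type written against a fixed-width integer can match the
    // c_ulong-based type on one target and fail to match on the other.
    println!("c_ulong is {} bits here", std::mem::size_of::<c_ulong>() * 8);
}
```
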
This commit changes the build.rs script to write the generated
bindings to the src directory, avoiding the manual copying of the
bindings.rs file (a sketch of the idea follows this commit message).

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
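
A minimal sketch of the idea, assuming bindgen is used to generate the bindings; the header path and builder options below are assumptions for illustration, not the actual build script.

```rust
// build.rs sketch: write the generated bindings into src/bindings.rs
// instead of OUT_DIR, so no manual copy step is needed afterwards.
fn main() {
    let bindings = bindgen::Builder::default()
        // Header path is an assumption for illustration.
        .header("llama.cpp/llama.h")
        .generate()
        .expect("failed to generate llama.cpp bindings");
    bindings
        .write_to_file("src/bindings.rs")
        .expect("failed to write src/bindings.rs");
}
```

One trade-off of writing into src rather than OUT_DIR is that the generated file becomes part of the checked-in sources, which appears to be the point here since it removes the manual copy step.
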
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
@danbev changed the title from "llama: update llama.cpp to latest version (wip)" to "llama: update llama.cpp to latest version" on Dec 5, 2023
@danbev marked this pull request as ready for review on December 5, 2023 08:17
@Juzov (Collaborator) left a comment

Looks good, added some comments. After they have been explained, we can approve this. Thanks for your effort

Review comments on:
crates/llm-chain-llama/src/options.rs (outdated, resolved)
.github/workflows/cicd.yaml (resolved)
.github/workflows/cicd.yaml (resolved)
crates/llm-chain-llama/src/executor.rs (outdated, resolved)
crates/llm-chain-llama/src/batch.rs (outdated, resolved)
crates/llm-chain-llama/src/executor.rs (outdated, resolved)
This commit reverts the change to the StopSequence option in
llm-chain-llama/src/options.rs.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit removes the `From<llama_batch> for LlamaBatch` impl as it
is no longer needed.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit creates a new LlamaBatch for each new token sampled instead of
reusing the same one.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit extracts the logic for checking whether the prompt is a
question into a separate conditional check. I've tried to clarify the
comment for this check as well, so it is hopefully easier to understand
now.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
@Juzov merged commit cad5646 into sobelio:main on Dec 15, 2023
5 checks passed