llama: update llama.cpp to latest version #244
Conversation
There's a clause ignoring MaxTokens if MaxTokens == 0 (or rather the reverse). So setting MaxTokens equal to MaxContextSize in the examples is redundant. If you want, and it's possible, you can try to alter the option to be as big as the context window by default and remove the clause.
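The clause being discussed can be sketched as follows; the function and parameter names here are illustrative stand-ins, not the crate's actual option fields:

```rust
// Hypothetical sketch of the clause described above: a MaxTokens of 0 is
// treated as "unset" and the full context window is used instead, which is
// why setting MaxTokens = MaxContextSize explicitly is redundant.
fn effective_max_tokens(max_tokens: usize, max_context_size: usize) -> usize {
    if max_tokens == 0 {
        // Unset: fall back to the context window size.
        max_context_size
    } else {
        max_tokens
    }
}

fn main() {
    assert_eq!(effective_max_tokens(0, 2048), 2048); // unset: context size wins
    assert_eq!(effective_max_tokens(256, 2048), 256); // explicit value respected
}
```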
I added a MaxBatchSize option in 452ac2c, which has a default value of 512, matching the value in llama.cpp, and I have now removed the options from the examples (apart from simple_llama).
I'm planning on taking a closer look at the model options today, and I'll also take another look at the context options and your suggestion. Thanks
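A minimal sketch of what such a defaulted option could look like; the struct and field names are hypothetical, not the crate's actual types:

```rust
// Hypothetical options struct with a MaxBatchSize defaulting to 512,
// mirroring the n_batch default in llama.cpp.
#[derive(Debug, PartialEq)]
struct LlamaInvocationOptions {
    max_batch_size: usize,
}

impl Default for LlamaInvocationOptions {
    fn default() -> Self {
        // 512 matches the default batch size in llama.cpp.
        Self { max_batch_size: 512 }
    }
}

fn main() {
    assert_eq!(LlamaInvocationOptions::default().max_batch_size, 512);
}
```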
Force-pushed from ec8affa to f66e9a3 (compare)
This commit updates llama.cpp to the latest version. The motivation for this is that the current version of llama.cpp is a little outdated and there have been changes to the llama.cpp API and also the model format. Currently it is not possible to use the new GGUF format, and many of the available models are in this new format, which can make it challenging to use this crate at the moment.

The following changes have been made:
* Updated llama.cpp to the latest version using `git submodule update --remote --merge llama.cpp`.
* Manually copied the generated bindings.rs file from the target directory to the src directory. Hope this was the correct thing to do.
* Updated the llm-chain-llama crate to use `llama_decode` instead of `llama_eval`, which has now been deprecated.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This is an attempt to fix builds from returning:

```
Error: The operation was canceled.
```

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This is an attempt to prevent the currently failing windows build from causing the other builds to be cancelled (at least that is what I think is happening). Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit is an attempt to update the hnsw dependency to version 0.2 and to fix the currently failing windows build of hnsw_rs version 0.1.19, which fails to compile on windows:

```console
error[E0308]: mismatched types
   --> C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\hnsw_rs-0.1.19\src\libext.rs:439:39
    |
439 |     let c_dist = DistCFFI::<f32>::new(c_func);
    |                  -------------------- ^^^^^^ expected `u32`, found `u64`
    |                  |
    |                  arguments to this function are incorrect
    |
    = note: expected fn pointer `extern "C" fn(_, _, u32) -> _`
               found fn pointer `extern "C" fn(_, _, u64) -> _`
note: associated function defined here
   --> C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\hnsw_rs-0.1.19\src\dist.rs:990:12
    |
990 |     pub fn new(f:DistCFnPtr<T>) -> Self {
    |            ^^^ ---------------
```

I was able to reproduce this issue locally by cross compiling (which produces the above error). But cross compiling with version 0.2 works, so I've attempted to upgrade to that version. This is very much a suggestion as I'm not familiar with the hnsw code, but perhaps it will be useful to someone else and save some time investigating the issue.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
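The `expected u32, found u64` mismatch is characteristic of a callback declared with a platform-dependent C integer type on one side and a fixed-width type on the other: `c_ulong` is 64 bits on Linux/macOS (LP64) but only 32 bits on Windows (LLP64), so a distance function pointer taking `u64` no longer matches when cross-compiling for Windows. The sketch below illustrates the portable form of such a callback, using `c_ulong` throughout; the function is illustrative, not hnsw_rs's actual code:

```rust
use std::os::raw::c_ulong;

// A distance callback in the shape hnsw_rs expects: extern "C" with the
// length as c_ulong. Using c_ulong (rather than a hard-coded u64) keeps the
// fn-pointer type matching on both LP64 and LLP64 (Windows) targets.
extern "C" fn euclidean(a: *const f32, b: *const f32, len: c_ulong) -> f32 {
    let (a, b) = unsafe {
        (
            std::slice::from_raw_parts(a, len as usize),
            std::slice::from_raw_parts(b, len as usize),
        )
    };
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn main() {
    let a = [0.0f32, 0.0];
    let b = [3.0f32, 4.0];
    let d = euclidean(a.as_ptr(), b.as_ptr(), a.len() as c_ulong);
    assert!((d - 5.0).abs() < 1e-6);
}
```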
This commit changes the build.rs script to write the generated bindings to the src directory to avoid manual copying of the bindings.rs file. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
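A build.rs along these lines could produce that behavior; this is a hedged sketch assuming the `bindgen` crate as a build-dependency and the submodule's header path, not necessarily the crate's actual script:

```rust
// build.rs — sketch only: assumes `bindgen` is available as a
// build-dependency and that llama.h lives in the llama.cpp submodule.
fn main() {
    let bindings = bindgen::Builder::default()
        .header("llama.cpp/llama.h")
        .generate()
        .expect("Unable to generate bindings");

    // Write directly into src/ so no manual copy from the target
    // directory is needed after regeneration.
    bindings
        .write_to_file("src/bindings.rs")
        .expect("Couldn't write bindings to src/bindings.rs");
}
```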
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Force-pushed from 4164250 to b667780 (compare)
Looks good, added some comments. After they have been explained, we can approve this. Thanks for your effort
This commit reverts the change to the StopSequence option in llm-chain-llama/src/options.rs. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit removes the `From<llama_batch> for LlamaBatch` impl as it is no longer needed. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit creates a new LlamaBatch for each new token sampled instead of reusing the same one. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit extracts the logic for checking if the prompt is a question into a separate conditional check. I've tried to clarify the comment of this check as well so it is hopefully easier to understand now. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit updates llama.cpp to the latest version.
The motivation for this is that the current version of llama.cpp is a little outdated and there have been changes to the llama.cpp API and also the model format. Currently it is not possible to use the new GGUF format, and many of the available models are in this new format, which can make it challenging to use this crate at the moment.
The following changes have been made:
* Updated llama.cpp to the latest version using `git submodule update --remote --merge llama.cpp`.
* Manually copied the generated bindings.rs file from the target directory to the src directory. Hope this was the correct thing to do.
* Updated the llm-chain-llama crate to use `llama_decode` instead of `llama_eval`, which has now been deprecated.
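The shape of that API change can be sketched as below. The structs and the mock `llama_decode` here are stand-ins for the real FFI bindings, used only to illustrate the batch-based flow that replaces the deprecated eval call:

```rust
// Hypothetical model of the new flow: instead of the deprecated
// llama_eval(ctx, tokens, n_tokens, n_past), tokens are packed into a
// llama_batch and submitted with llama_decode. Types are illustrative.
struct LlamaBatch {
    tokens: Vec<i32>, // token ids to evaluate
    pos: i32,         // position of the first token in the sequence
}

// Mock standing in for the real FFI call; 0 means success in llama.cpp.
fn llama_decode(_batch: &LlamaBatch) -> i32 {
    0
}

fn main() {
    // Evaluate the prompt as one batch (token ids are illustrative).
    let prompt = LlamaBatch { tokens: vec![1, 15043, 3186], pos: 0 };
    assert_eq!(llama_decode(&prompt), 0);

    // Each newly sampled token then goes into a fresh single-token batch,
    // matching the "new LlamaBatch per sampled token" change above.
    let next = LlamaBatch { tokens: vec![29891], pos: 3 };
    assert_eq!(llama_decode(&next), 0);
}
```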
A number of TODOs have been added to the code to highlight areas that I know I need to look at. This is a work in progress but I wanted to open a draft pull request sooner rather than later to get some visibility and feedback.
Currently I've been able to successfully run the simple, few_shot, and stream examples. The map_reduce_llama example is not working as of this writing, which I'll look into further.