
LLamaSharp 0.11.2 Exception #660

Closed
kuan2019 opened this issue Apr 11, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@kuan2019

kuan2019 commented Apr 11, 2024

Hi, when I run LLamaSharp 0.11.2 with .NET 7, I suddenly get the exception below. How can I fix it?

Unhandled exception. LLama.Exceptions.LLamaDecodeError: llama_decode failed: 'NoKvSlot'
   at LLama.InteractiveExecutor.InferInternal(IInferenceParams inferenceParams, InferStateArgs args)
   at LLama.StatefulExecutorBase.InferAsync(String text, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.StatefulExecutorBase.InferAsync(String text, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at LLama.ChatSession.ChatAsyncInternal(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.ChatSession.ChatAsyncInternal(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.ChatSession.ChatAsyncInternal(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at LLama.ChatSession.ChatAsync(Message message, Boolean applyInputTransformPipeline, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.ChatSession.ChatAsync(Message message, Boolean applyInputTransformPipeline, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.ChatSession.ChatAsync(Message message, Boolean applyInputTransformPipeline, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Program.<Main>$(String[] args) in /Users/eric.hu/Projects/Net7.Llamasharp.Console/Net7.Llamasharp.Console/Program.cs:line 36
   at Program.<Main>$(String[] args) in /Users/eric.hu/Projects/Net7.Llamasharp.Console/Net7.Llamasharp.Console/Program.cs:line 36
   at Program.<Main>(String[] args)
@martindevans
Member

The fundamental error, llama_decode failed: 'NoKvSlot', means that the inference system has essentially run out of memory and can't store any more tokens. You'll need a bigger KV cache, or you'll need to work with fewer tokens.
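For example, loading with a larger context window looks roughly like this (a minimal sketch; the model path and the 4096 value are just placeholders, not taken from this issue):

```csharp
using LLama;
using LLama.Common;

// The KV cache holds one entry per token in the context window,
// so raising ContextSize gives llama.cpp more slots before NoKvSlot is hit.
var parameters = new ModelParams("path/to/model.gguf")
{
    ContextSize = 4096   // placeholder value, larger than the default window
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
```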

@kuan2019
Author

@martindevans Got it! thanks.

@hgpym

hgpym commented Apr 12, 2024

I'm currently having the exact same problem after upgrading from version 10 to 11.2. I did not change the model parameters. I tried reducing the number of tokens as advised, but the error remained.

@martindevans
Member

martindevans commented Apr 12, 2024

That's odd. The NoKvSlot error is pretty much passed straight through from llama.cpp, there's not a lot going on on the C# side that could be a problem there.

Can you tell us some more details - what are your loading parameters? What model are you using? How much text are you evaluating, etc.?

Edit: also, is this .NET 7.0 only, or are you using a different version?

martindevans added the bug (Something isn't working) label on Apr 12, 2024
@hgpym

hgpym commented Apr 13, 2024

Model: open-chat-3.5-0106

Model Params:
ContextSize = 1024, Seed = 1337, Threads = (uint)Math.Max(Environment.ProcessorCount / 2, 1), UseMemorymap = true, UseMemoryLock = true, BatchSize = 512, Encoding = Encoding.UTF8, EmbeddingMode = true, GpuLayerCount = 28,

Inference Params:
MaxTokens = 512, AntiPrompts = new List<string> { "User:" }, Temperature = 0.7f, RepeatPenalty = 1.0f, TopK = 50, TopP = 0.95f,

With these parameters everything worked fine in previous versions. I'm using .NET 8.0 :)
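For completeness, here is roughly how those settings are wired up (a hedged sketch only: the model path, the prompt, and the streaming loop are placeholders, not my actual program):

```csharp
using System.Text;
using LLama;
using LLama.Common;

var modelParams = new ModelParams("path/to/openchat-3.5-0106.gguf")  // placeholder path
{
    ContextSize = 1024,
    Seed = 1337,
    Threads = (uint)Math.Max(Environment.ProcessorCount / 2, 1),
    UseMemorymap = true,
    UseMemoryLock = true,
    BatchSize = 512,
    Encoding = Encoding.UTF8,
    EmbeddingMode = true,
    GpuLayerCount = 28,
};

using var weights = LLamaWeights.LoadFromFile(modelParams);
using var context = weights.CreateContext(modelParams);
var session = new ChatSession(new InteractiveExecutor(context));

var inferenceParams = new InferenceParams
{
    MaxTokens = 512,
    AntiPrompts = new List<string> { "User:" },
    Temperature = 0.7f,
    RepeatPenalty = 1.0f,
    TopK = 50,
    TopP = 0.95f,
};

// Stream one reply; NoKvSlot appears once the conversation outgrows ContextSize.
await foreach (var token in session.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "Hello"), inferenceParams))
{
    Console.Write(token);
}
```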

@martindevans
Member

Those settings look fine :/

As far as I'm aware the only way this should raise NoKvSlot is if you try to use more than 1024 tokens, or if the cache is very fragmented (unlikely, unless you're doing weird things directly with the kv cache).

I assume you're getting this well short of 1024 tokens?

@hgpym

hgpym commented Apr 14, 2024

I tried increasing the ContextSize parameter to 2048, and now there is no error during long conversations. But now, of course, it takes longer to start up than usual, hehe.

@hgpym

hgpym commented Apr 14, 2024

I tried to change MaxTokens, but the problem was not solved.

@martindevans
Member

MaxTokens is a limit on how many tokens to generate.

It might help if you're careful with your token budget.

For example:

  1. With a context size of 1024, you prompt with 512 tokens and set MaxTokens to 512.
  2. That might generate, say, 200 tokens.
  3. For the next turn you set MaxTokens to 312, and so on.

If expanding the context size fixes it, it sounds to me like you're simply using up all your available token space.
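Roughly, with the same numbers (an illustration only; real code would track counts via the tokenizer or executor state rather than hard-coded constants):

```csharp
using LLama.Common;

const int contextSize = 1024;

int tokensUsed = 512;                      // the initial prompt
var inferenceParams = new InferenceParams
{
    MaxTokens = contextSize - tokensUsed   // 512 left for generation
};

tokensUsed += 200;                                      // suppose the model produced 200 tokens
inferenceParams.MaxTokens = contextSize - tokensUsed;   // 312 left for the next turn
```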

@AsakusaRinne
Collaborator

This issue is expected to be fixed in the current master branch. Could you please try again with the master branch?

@hgpym

hgpym commented May 13, 2024

In version 0.12.0 the problem resolved itself :|

@AsakusaRinne
Collaborator

Thank you for your feedback, closing this issue as completed now. Please feel free to comment here if the problem reappears.
