how to understand kv_cache ring-buffer? #4251

Closed
yancaoweidaode opened this issue Nov 29, 2023 · 3 comments

@yancaoweidaode
Contributor

What does kv_cache's ring-buffer mean and how does it work?

@cmp-nct
Contributor

cmp-nct commented Nov 29, 2023

I may have missed something, but one thing that was converted into a ring buffer is the "generated token storage".
Previously, every generated token was appended to an integer vector, output_tokens. Once the context size was reached, part of that storage was cut away and re-evaluation was triggered, which caused quite long delays in processing.
That sequential storage was just a "hacky" way to get started, and it stayed that way for a while.

That mechanism has since been changed quite neatly. It is still a vector of integers (now in a sampling_context struct), but it is initialized to the context size, and whenever a token is generated it is appended at the end while the oldest token (the first one) is dropped.
This way the output vector always represents the actual context window.
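
As a rough illustration of the fixed-size token window described above, here is a minimal sketch. The `TokenWindow` type and its `push` method are hypothetical names made up for this example; they are not taken from llama.cpp, and the real sampling code may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the token history described above; the names
// are illustrative and are not taken from llama.cpp.
struct TokenWindow {
    std::vector<int> prev;  // always exactly n_ctx entries, oldest first

    // Start with n_ctx placeholder tokens (0) so the size never changes.
    explicit TokenWindow(std::size_t n_ctx) : prev(n_ctx, 0) {
        assert(n_ctx > 0);
    }

    // Drop the oldest token and append the newest, keeping the size fixed.
    void push(int token) {
        prev.erase(prev.begin());
        prev.push_back(token);
    }
};
```

With a window of size 3, pushing tokens 1, 2, 3, 4 in turn leaves 2, 3, 4 in the vector. Erasing from the front of a vector shifts every element, so a stricter ring buffer would instead keep a head index and overwrite in place; the observable contents of the window would be the same.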

In addition, there are routines for modifying the KV cache itself, which stores the evaluated embeddings for each token sampled.
Before that, modifying the KV cache was a bit more "raw": there was no API, so you had to dig into libllama.cpp and understand the cache tensor structure. Now there are quite a few functions for it.
Refer to the PR that added them: #3228
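
To make the idea of editing the cache in place a bit more concrete, here is a toy model. It is only a sketch under simplifying assumptions: each cell stores nothing but a token position, and `ToyKvCache`, `add`, `remove_range` and `shift_range` are hypothetical names invented for this example, not the functions added in #3228.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: one cell per cached token, holding just its position.
// A real cache also stores the key/value tensors for every layer.
// Positions are assumed to be non-negative.
struct ToyKvCache {
    std::vector<int> pos;  // -1 marks a free cell

    explicit ToyKvCache(std::size_t n_cells) : pos(n_cells, -1) {
        assert(n_cells > 0);
    }

    // Store a token at position p in the first free cell; false if full.
    bool add(int p) {
        for (int & c : pos) {
            if (c < 0) { c = p; return true; }
        }
        return false;
    }

    // Free every cell whose position lies in [p0, p1), with p0 >= 0.
    void remove_range(int p0, int p1) {
        for (int & c : pos) {
            if (c >= p0 && c < p1) c = -1;
        }
    }

    // Add delta to every position in [p0, p1), with p0 >= 0.
    void shift_range(int p0, int p1, int delta) {
        for (int & c : pos) {
            if (c >= p0 && c < p1) c += delta;
        }
    }
};
```

In this toy, a full cache holding positions 0..3 can make room by removing positions [0, 2) and shifting [2, 4) down by 2. The two freed cells are then reused for new tokens without touching the entries that were kept, which is the kind of in-place edit that avoids re-evaluating the whole context.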

@yancaoweidaode
Contributor Author

Ok, thank you very much!

@github-actions github-actions bot added the stale label Mar 19, 2024
Contributor

github-actions bot commented Apr 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 3, 2024