Maybe I missed something, but one thing that was converted into a ring buffer is the "generated token storage".
A while ago, all generated tokens were appended to an integer vector output_tokens; once the context size was reached, part of that storage was cut away and the remaining tokens had to be re-evaluated, causing quite big delays in processing.
The sequential storage was just a "hacky" start to get going and remained that way for a while.
That mechanism was changed quite neatly: it's still a vector of integers (now in a sampling_context struct), but it's initialized to the context size, and whenever a token is generated it's appended at the end while the oldest token (the first one) is dropped.
This way the output vector now represents the actual context window.
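A minimal sketch of that idea (the struct and field names here are illustrative, not the actual llama.cpp identifiers):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical fixed-size token window: the vector is created at context size
// and then behaves like a ring / sliding window -- the oldest token is dropped
// from the front whenever a new one is pushed at the back, so the vector
// always mirrors the current context window.
struct token_window {
    std::vector<int32_t> prev; // most recent n_ctx token ids, oldest first

    explicit token_window(size_t n_ctx) : prev(n_ctx, 0) {}

    void push(int32_t token) {
        prev.erase(prev.begin()); // drop the oldest token
        prev.push_back(token);    // append the newly generated one
    }
};
```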
In addition, there are routines to modify the KV cache itself, which stores the evaluated key/value states for each processed token.
Modifying the KV cache was a bit more "raw" before that (there was no API, so you had to get into libllama.cpp and understand the cache tensor structure), but now there are quite a few functions.
Refer to this PR that added them: #3228
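If I remember right, the typical "context shift" pattern built on those functions looks roughly like the sketch below. The names and signatures (llama_kv_cache_seq_rm, llama_kv_cache_seq_shift) are from my memory of llama.h around the time of that PR and may have been renamed since, so treat this as a rough sketch rather than a drop-in snippet:

```cpp
// Rough sketch of a context shift on sequence 0 using the KV-cache API.
// n_keep : number of initial tokens (e.g. the prompt) to preserve
// n_past : number of tokens currently in the cache for this sequence
static void shift_context(struct llama_context * ctx, int n_keep, int & n_past) {
    const int n_discard = (n_past - n_keep) / 2;

    // drop the oldest n_discard tokens that come after the kept prefix
    llama_kv_cache_seq_rm(ctx, 0, n_keep, n_keep + n_discard);

    // move the remaining tokens back so their positions stay contiguous
    llama_kv_cache_seq_shift(ctx, 0, n_keep + n_discard, n_past, -n_discard);

    n_past -= n_discard;
}
```

The point is that instead of throwing away the cache and re-evaluating the kept tokens (the old, slow behaviour), the cached entries are removed and re-positioned in place.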
What does kv_cache's ring-buffer mean and how does it work?