Currently the rotating KV cache can end up holding either `max_size` or `max_size - 1` entries, depending on whether it was filled by generating or by prompt token processing. This PR makes sure the max is always `max_size` and never `max_size - 1`.

The following two simple repros exemplify the problem:
1. Generate past max size
2. Prompt process past max size
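The repro scripts themselves are not reproduced here; the following is a minimal sketch of what they could look like. It assumes `RotatingKVCache` from `mlx_lm.models.base` takes `(head_dim, n_kv_heads, max_size)` and exposes `update_and_fetch(keys, values)`; the shapes and sizes are illustrative only and may differ from the actual tests.

```python
# Hypothetical repro sketch -- not the PR's actual test scripts.
# Assumes RotatingKVCache(head_dim, n_kv_heads, max_size) and an
# update_and_fetch(keys, values) method; check your mlx_lm version.
import mlx.core as mx
from mlx_lm.models.base import RotatingKVCache

B, H, D, MAX = 1, 4, 8, 16  # illustrative batch / heads / head_dim / max_size


def kv(n):
    """Random keys/values for n tokens with shape (B, H, n, D)."""
    return (mx.random.uniform(shape=(B, H, n, D)),
            mx.random.uniform(shape=(B, H, n, D)))


# Repro 1: generate past max size (one token at a time).
cache = RotatingKVCache(D, H, MAX)
for _ in range(2 * MAX):
    k, v = cache.update_and_fetch(*kv(1))
print("generate path, cached tokens:", k.shape[2])  # expected: MAX

# Repro 2: prompt-process past max size (one big chunk), then generate once.
cache = RotatingKVCache(D, H, MAX)
cache.update_and_fetch(*kv(2 * MAX))
k, v = cache.update_and_fetch(*kv(1))
print("prompt path, cached tokens:", k.shape[2])  # should also be MAX
```

On main, the two paths report different sizes, which is the inconsistency described above.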
The first breaks on main. Changing line 45 of `base.py` then breaks the second test, and also changing the trim size makes both pass. These issues occur in `mlx_lm.chat` and `mlx_lm.cache_prompt` as well; they are just more explicit with the tests above.
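To make the off-by-one concrete, here is a tiny standalone illustration of the trim arithmetic (plain Python, not the actual `base.py` change): keeping the last `max_size` entries means trimming `len - max_size`, and trimming one more than that is exactly what leaves the cache at `max_size - 1`.

```python
# Standalone illustration of the off-by-one; not the actual base.py patch.
def remaining_after_trim(current_len: int, max_size: int, extra: int = 0) -> int:
    """Entries left after trimming `current_len - max_size + extra` of them."""
    trim = max(current_len - max_size + extra, 0)
    return current_len - trim

print(remaining_after_trim(20, 16))           # 16: trim exactly len - max_size
print(remaining_after_trim(20, 16, extra=1))  # 15: one extra leaves max_size - 1
```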