
src: make tail invalid when kv cell is intersection for mamba #9249

Merged
1 commit merged into ggerganov:master on Sep 2, 2024

Conversation

kylo5aby (Contributor)

Reset the tail of the kv cell for the common prefix tokens for Mamba. This may fix #9170.

@mofosyne added the Review Complexity : Low label Aug 30, 2024
@compilade self-requested a review August 30, 2024 16:45
src/llama.cpp Outdated
Comment on lines 3689 to 3691
if ((0 < p0 && p0 <= cell.pos) || (0 < p1 && p1 <= cell.pos)) {
return false;
}
if (p0 <= cell.pos && p1 < cell.pos) {
tail_id = -1;
return false;
}
@compilade (Collaborator) Aug 30, 2024

This does not fix the problem. Building llama-server from this branch, I still get

llama_kv_cache_find_slot: wrong tail for seq_id 1, (0 instead of -1)
/.../src/llama.cpp:3507: GGML_ASSERT(cell.has_seq_id(seq_id)) failed

This happens when using the first request from #9170 with curl and then using the server Web UI for something unrelated. I'll investigate the root cause, but it does seem like the tail is not invalidated properly somewhere (which, by the way, also has to be done in sync with clearing the actual tail cell, which is why the tail was previously invalidated here only when not returning and only when the cell (which was the tail) would be cleared by the range conditions below).

Strangely enough, there is no problem when I try using the server UI directly.

Hmm, re-reading my parenthetical above, the problem in the code could actually simply be:

diff --git a/src/llama.cpp b/src/llama.cpp
index 2274296b..6c3d4691 100644
--- a/src/llama.cpp
+++ b/src/llama.cpp
@@ -3689,7 +3689,8 @@ static bool llama_kv_cache_seq_rm(
                 if ((0 < p0 && p0 <= cell.pos) || (0 < p1 && p1 <= cell.pos)) {
                     return false;
                 }
-                if (p0 <= cell.pos && p1 < cell.pos) {
+                // invalidate tails which will be cleared
+                if (p0 <= cell.pos && cell.pos < p1) {
                     tail_id = -1;
                 }
             }

(when diffed from master)

Re-running the requests from #9170 no longer seems to fail with this.
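
For illustration, here is a small standalone sketch, not the actual llama.cpp code (the struct and function names here are invented), of the half-open range semantics behind that one-character change: llama_kv_cache_seq_rm removes positions in [p0, p1), so the tail cell is cleared, and its tail id must be reset, exactly when cell.pos falls inside that interval. The corrected condition p0 <= cell.pos && cell.pos < p1 tests this; the original p1 < cell.pos does not.

// standalone sketch, not llama.cpp; the cell here only stores its position
#include <cassert>
#include <cstdint>

struct kv_cell {
    int32_t pos; // position of the token held by this cell
};

// a seq_rm over [p0, p1) clears this cell, so its tail id must be reset
static bool cell_cleared_by_range(const kv_cell & cell, int32_t p0, int32_t p1) {
    return p0 <= cell.pos && cell.pos < p1;
}

int main() {
    kv_cell tail = { /*pos =*/ 5 };

    // removing [0, 6) wipes the tail cell, so the tail must be invalidated
    assert(cell_cleared_by_range(tail, 0, 6));

    // the original condition `p1 < cell.pos` misses this case (6 < 5 is false),
    // leaving a dangling tail that later trips
    // GGML_ASSERT(cell.has_seq_id(seq_id))
    assert(!(0 <= tail.pos && 6 < tail.pos));
    return 0;
}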

kylo5aby (Contributor, Author)

Thanks for the feedback! I haven't tried this through curl, and I don't know why the behavior differs between curl and the web UI; maybe it is because of differences in response timing? I have seen the response times be inconsistent for the same request (same parameters).

@compilade (Collaborator) left a comment

@jploski Does this fix the problem you've seen in #9170?

jploski (Contributor) commented Sep 2, 2024

@jploski Does this fix the problem you've seen in #9170?

Yes, it does!

Although I see that the full prompt/history is being reprocessed on each request despite the prefix being the same as in the previous request (i.e. no hidden state/context caching). Is the cache supposed to work for Mamba at this point, or is it maybe still a work in progress?

compilade (Collaborator) commented Sep 2, 2024

Although I see that the full prompt/history is being reprocessed on each request despite the prefix being the same as in the previous request (i.e. no hidden state/context caching). Is the cache supposed to work for Mamba at this point, or is it maybe still a work in progress?

This happens when llama-server removes more than one token at the end of the generated text or when it trims whitespace. But caching should work otherwise.

It should be fixed in #7531, but it will probably be "unfixed" before merging because the method used there is over-engineered and too complicated (a tree of sequences to fairly share the available cells adds a lot of complexity).

The plan is to add explicit checkpoints requested by llama-server so that guarding against trimming requires fewer checkpoints.
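
To make the trimming point above concrete, here is a minimal sketch, not the actual llama.cpp code (the function and variable names are invented), of why a recurrent model such as Mamba has to reprocess from the start after the server trims tokens: the cache effectively holds one rolled-up state at the last processed position, so it can only be reused when the new prompt is an exact extension of the tokens it was computed from.

// sketch only, not llama.cpp: how many prompt tokens a single cached
// recurrent state lets us skip
#include <cstddef>
#include <cstdio>
#include <vector>

static size_t reusable_tokens(const std::vector<int> & cached,
                              const std::vector<int> & prompt) {
    // the new prompt must start with *all* cached tokens; a recurrent state
    // cannot be rewound, so any mismatch or truncation means starting over
    if (prompt.size() < cached.size()) {
        return 0;
    }
    for (size_t i = 0; i < cached.size(); ++i) {
        if (prompt[i] != cached[i]) {
            return 0;
        }
    }
    return cached.size();
}

int main() {
    std::vector<int> cached = {1, 2, 3, 4, 5};

    // prompt extends the cached tokens -> only the new tokens are processed
    printf("%zu\n", reusable_tokens(cached, {1, 2, 3, 4, 5, 6, 7})); // 5

    // the server trimmed the last token before the next request -> start over
    printf("%zu\n", reusable_tokens(cached, {1, 2, 3, 4}));          // 0
    return 0;
}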

@compilade added the merge ready and bugfix labels Sep 2, 2024
@compilade merged commit f148516 into ggerganov:master Sep 2, 2024
52 checks passed
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Labels
bugfix, merge ready, Review Complexity : Low
Development

Successfully merging this pull request may close these issues.

Bug: llama-server crashes while serving a Mamba model - GGML_ASSERT(cell.has_seq_id(seq_id)) failed
4 participants