Add support for cohere2 #1157
Conversation
Add rotating kvcache to save space
Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
Thanks @N8python! Verified this change saves ~2GB for 4-bit.
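For context, the saving comes from capping the KV cache at the sliding-window size instead of letting it grow with the sequence. Here is a minimal sketch of the idea (not the mlx-lm `RotatingKVCache` implementation; the class name and its eager trimming are illustrative):

```python
import mlx.core as mx

class SimpleRotatingKVCache:
    """Keep at most `max_size` key/value positions.

    Simplified sketch: it concatenates and slices instead of using a
    true ring buffer, but it shows where the memory saving comes from:
    keys/values never grow past the window.
    """

    def __init__(self, max_size: int):
        self.max_size = max_size
        self.keys = None
        self.values = None

    def update_and_fetch(self, keys, values):
        if self.keys is None:
            self.keys, self.values = keys, values
        else:
            self.keys = mx.concatenate([self.keys, keys], axis=-2)
            self.values = mx.concatenate([self.values, values], axis=-2)
        if self.keys.shape[-2] > self.max_size:
            # Drop positions that fell out of the sliding window.
            self.keys = self.keys[..., -self.max_size:, :]
            self.values = self.values[..., -self.max_size:, :]
        return self.keys, self.values
```

Unlike this eager sketch, the cache used in the PR can return more than `window_size` keys during chunked prefill (see the shapes logged in the discussion below), which is what the mask-trimming review comment is about.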
```python
if self.use_sliding_window and mask is not None:
    key_len = keys.shape[-2]
    if mask.shape[-1] != key_len:
        mask = mask[..., -key_len:]
```
This changed slightly. Otherwise you would be over-trimming the keys/values during the prefill stage.
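To illustrate the failure mode (shapes borrowed from the trace below; this is a sketch, not the PR's exact code): during chunked prefill the rotating cache can briefly hold more than `window_size` keys, so slicing the keys/values down to the window here would drop positions the current chunk still attends to. Only the mask should be sliced, and only when its last axis disagrees with the actual key length. The history length of 5120 is hypothetical, chosen just to make the slice do something:

```python
import mlx.core as mx

window_size = 4096  # sliding window
# Mid-prefill: 4096 cached positions plus a new 512-token chunk.
keys = mx.zeros((1, 8, 4608, 128))
# Hypothetical mask built over the full history seen so far.
mask = mx.zeros((512, 5120))

# Over-trimming would look like slicing keys/values to the window here,
# e.g. keys = keys[..., -window_size:, :], discarding 512 positions the
# current chunk may still attend to. The cache handles that trimming.

# The merged fix aligns only the mask with the cached key length:
key_len = keys.shape[-2]
if mask.shape[-1] != key_len:
    mask = mask[..., -key_len:]

print(mask.shape)  # (512, 4608), matching the logged "mask shape after"
```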
Oh, I see.
I thought of this, but it gave me a shape error when I tried exactly that. When I checked the shapes, `make_cache` was already handling the kv slicing.
```
...
===
window_size 4096
keys shape (1, 8, 4096, 128)
values shape (1, 8, 4096, 128)
mask shape after (512, 4096)
===
window_size 4608
keys shape (1, 8, 4608, 128)
values shape (1, 8, 4608, 128)
mask shape after (512, 4608)
===
window_size 4608
keys shape (1, 8, 4608, 128)
values shape (1, 8, 4608, 128)
mask shape after (512, 4608)
===
...
```
It seems like I should have added the mask changes (L158-163) that you added.
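For completeness, a hedged sketch of where that slice sits in the attention forward pass (the function name and surrounding code are illustrative; `mx.fast.scaled_dot_product_attention` is the standard mlx kernel):

```python
import mlx.core as mx

def sliding_window_sdpa(queries, keys, values, mask, scale):
    # keys/values come straight from the rotating cache; during chunked
    # prefill their length may not match the mask's last axis.
    key_len = keys.shape[-2]
    if mask is not None and mask.shape[-1] != key_len:
        mask = mask[..., -key_len:]
    return mx.fast.scaled_dot_product_attention(
        queries, keys, values, scale=scale, mask=mask
    )
```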
Thanks!!
Our pleasure!
Adds support for Cohere2 with sliding-window attention.
Thanks a lot to @N8python for the inspiration!
[Benchmark results for bf16 and 4bit were attached in the original PR.]