Fix for cohere plus #650

awni · 2024-04-04T17:11:49Z

Use the qk norm param to work with cohere plus.
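
For readers unfamiliar with the term, here is a minimal sketch of what a query/key norm switch typically does. It is not the code from this PR. It assumes the "qk norm param" is a boolean config flag (named `use_qk_norm` here, as in the Hugging Face Cohere config) that normalizes each head's query and key vectors before attention. The learned per-head scale is omitted, and the helper names are hypothetical.

```python
import math

def layer_norm(vec, eps=1e-5):
    # Normalize one head's vector to zero mean and unit variance.
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def apply_qk_norm(queries, keys, use_qk_norm):
    # queries / keys: one vector per attention head (hypothetical layout).
    # Assumption: when the flag is off, the projections pass through unchanged.
    if not use_qk_norm:
        return queries, keys
    return [layer_norm(q) for q in queries], [layer_norm(k) for k in keys]

# Two heads with a head dimension of 4.
q = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 0.0]]
k = [[0.5, 0.5, 1.5, 1.5], [2.0, 4.0, 6.0, 8.0]]
q_normed, k_normed = apply_qk_norm(q, k, use_qk_norm=True)
```

Normalizing per head keeps the attention logits in a bounded range regardless of the scale of the projections, which is the usual motivation for this kind of switch.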

Machine setting:

sudo sysctl iogpu.wired_lwm_mb=100000

Command for generation:

python -m mlx_lm.generate --model mlx-community/c4ai-command-r-plus-4bit --prompt "Write a quicksort in c++" --temp 0.0 --max-tokens 256 --use-default-chat-template

Command for QLoRA:

python -m mlx_lm.lora --model mlx-community/c4ai-command-r-plus-4bit --data ../lora/data --train --iters 1000 --batch-size 1 --lora-layers 16

Blaizzy · 2024-04-04T22:20:55Z

I was about to submit a PR; glad I checked first 😄.

Already uploaded the model to the hub.
https://huggingface.co/mlx-community/c4ai-command-r-plus-4bit

DenisSergeevitch · 2024-04-04T23:50:50Z

@Blaizzy Thank you! How much RAM does it require to run 4bit q?

awni · 2024-04-05T00:01:46Z

It needs about 65GB to generate with 4-bit. Generation is slow right now, though; I'm trying to debug the performance issue.
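
A rough back-of-the-envelope check of that figure, as a sketch only. It assumes roughly 104 billion parameters for Command R+ and a 4-bit quantization with groups of 64 weights that each store a 16-bit scale and a 16-bit bias. None of these numbers come from this thread, and the function name is hypothetical.

```python
def quantized_weight_gb(n_params, bits=4, group_size=64, overhead_bits=32):
    # Each group of `group_size` weights carries `overhead_bits` of extra
    # storage (assumed: one 16-bit scale plus one 16-bit bias).
    bits_per_weight = bits + overhead_bits / group_size
    # Convert bits to bytes, then to decimal gigabytes.
    return n_params * bits_per_weight / 8 / 1e9

# Assumption: about 104e9 parameters.
weights_gb = quantized_weight_gb(104e9)
print(f"Estimated weight storage: {weights_gb:.1f} GB")
```

Under these assumptions the weights alone come to roughly 58.5 GB, so a total of about 65GB is plausible once the KV cache, activations, and any unquantized parameters are added.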

Blaizzy · 2024-04-05T10:45:02Z

> @Blaizzy Thank you! How much RAM does it require to run 4bit q?

@DenisSergeevitch, as @awni said 👆🏽.

I can't run it myself; I use an M1 Air with 16GB :)

DenisSergeevitch · 2024-04-05T12:15:22Z

Thank you, I will wait for i_q1 then

awni · 2024-04-05T20:57:37Z

Btw to get this to run reasonably fast on an M2 Ultra you need to set the wired GPU memory lower limit appropriately. Something like:

sudo sysctl iogpu.wired_lwm_mb=100000
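
A small sketch for building that command with a sanity check against the machine's physical memory. The helper name and the check are illustrative assumptions, not part of MLX or macOS tooling. The only thing taken from the thread is the `iogpu.wired_lwm_mb` key, and the value is assumed to be in binary megabytes.

```python
import shlex

def wired_limit_command(limit_mb, total_ram_gb):
    # Assumption: the sysctl value is in MB, so compare against total RAM in MB.
    total_mb = total_ram_gb * 1024
    if not 0 < limit_mb < total_mb:
        raise ValueError(
            f"limit {limit_mb} MB must be positive and below {total_mb} MB of RAM"
        )
    # Return the same command shape used in this thread.
    return shlex.join(["sudo", "sysctl", f"iogpu.wired_lwm_mb={limit_mb}"])

# Example: the value suggested above on a 192GB machine.
print(wired_limit_command(100_000, 192))
```

The function only formats the command; running it still requires a shell on the Mac and administrator rights.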

angeloskath

Thanks!

jeanromainroy · 2024-04-09T14:28:49Z

I am running the 4-bit version of Command-R-Plus and I consistently see the GPU usage dropping during generation and performance becoming abysmal.

My machine is an M2 Ultra with 192GB, running:

ProductName: macOS
ProductVersion: 14.3
BuildVersion: 23D56

Blaizzy · 2024-04-09T15:11:59Z

@awni 👆🏽

awni · 2024-04-09T15:13:53Z

@jeanromainroy did you set the memory limits? You could try making it larger:

sudo sysctl iogpu.wired_lwm_mb=150000

jeanromainroy · 2024-04-09T16:36:27Z

Even after setting this,

sudo sysctl iogpu.wired_lwm_mb=150000

I still see the GPU usage dropping before the completion ends.

awni · 2024-04-09T16:38:24Z

Would you mind opening an issue and including the command, the versions of MLX / MLX LM, the OS, etc.?

fix for cohere plus

4adaaf6

awni requested a review from angeloskath April 4, 2024 18:58

N8python mentioned this pull request Apr 4, 2024

Add Command R Plus support ggerganov/llama.cpp#6491

Merged

awni mentioned this pull request Apr 4, 2024

update layer weight dims for command-r-+ ml-explore/mlx#958

Closed

4 tasks

version bump

db714e0

angeloskath approved these changes Apr 5, 2024

awni merged commit c386dd5 into main Apr 5, 2024
2 checks passed

awni deleted the cohere_plus branch April 5, 2024 21:11
