feat: DeepSeek MoE v1 #942

Merged 4 commits on Aug 17, 2024

@llllvvuu (Contributor) commented Aug 16, 2024

DeepSeek is still releasing models on their older MoE architecture.

mlx_lm.convert --hf-path deepseek-ai/DeepSeek-Prover-V1.5-RL --mlx-path DeepSeek-Prover-V1.5-RL-8bit --q-bits 8 -q
mlx_lm.generate --model DeepSeek-Prover-V1.5-RL-8bit --ignore-chat-template --max-tokens 512 --prompt 'import Mathlib
import Aesop

set_option maxHeartbeats 0

open BigOperators Real Nat Topology Rat

/-- The second and fourth terms of a geometric sequence are $2$ and $6$. Which of the following is a possible first term?
Show that it is $\frac{2\sqrt{3}}{3}$.-/
theorem amc12b_2003_p6 (a r : ℝ) (u : ℕ → ℝ) (h₀ : ∀ k, u k = a * r ^ k) (h₁ : u 1 = 2)
  (h₂ : u 3 = 6) : u 0 = 2 / Real.sqrt 3 ∨ u 0 = -(2 / Real.sqrt 3) := by'

With MoE:

mlx_lm.convert --hf-path deepseek-ai/deepseek-moe-16b-chat --mlx-path deepseek-moe-16b-4bit-chat -q
mlx_lm.generate --model deepseek-moe-16b-4bit-chat --prompt "What is mixture-of-experts?" --max-tokens 512 

@llllvvuu force-pushed the feat/deepseek_v1 branch 2 times, most recently from d83e29a to 8a6cc47, on August 16, 2024 22:34
@llllvvuu changed the title from "feat: deepseek v1" to "feat: DeepSeek MoE v1" on Aug 17, 2024
@awni (Member) commented Aug 17, 2024

This looks great! Thanks for the addition!

Also, it would be super useful to add the pre-converted models to the MLX Community if you are interested. You can pass, for example, `--upload-repo mlx-community/deepseek-moe-16b-4bit-chat` to the convert script and it will upload the model automatically.
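For reference, the suggestion above could be combined with the MoE convert command from the PR description roughly as follows. This is only a sketch: the variable names (`HF_PATH`, `MLX_PATH`, `UPLOAD_REPO`, `CMD`) are made up for illustration, the flags are the ones already quoted in this thread, and the command is printed rather than executed, since an actual upload presumably needs Hugging Face credentials with write access to the target repo.

```sh
# Hypothetical variable names; the paths and flags are those quoted in this thread.
HF_PATH="deepseek-ai/deepseek-moe-16b-chat"
MLX_PATH="deepseek-moe-16b-4bit-chat"
UPLOAD_REPO="mlx-community/${MLX_PATH}"

# Quantize (-q) and upload in a single convert call.
CMD="mlx_lm.convert --hf-path ${HF_PATH} --mlx-path ${MLX_PATH} -q --upload-repo ${UPLOAD_REPO}"

# Dry run: print the command for inspection instead of running it.
echo "${CMD}"
```

Once the printed command looks right, it can be pasted into the shell and run directly.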

@awni awni merged commit 0164d20 into ml-explore:main Aug 17, 2024
1 of 2 checks passed