
llama : fix qs.n_attention_wv for DeepSeek-V2 #9156

Merged
1 commit merged into master from compilade/fix-deepseek-n_wv on Aug 27, 2024

Conversation

compilade
Collaborator

Should fix #9155

This previously (before #8526) did not trigger the assertion because a value of 0 was accepted for recurrent models, but DeepSeek-V2(-Lite) is not a recurrent model.
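For context, a minimal sketch of how the check changed, assuming the pre-#8526 assertion simply tolerated a zero count for recurrent models as described above (the exact GGML_ASSERT expressions in llama.cpp may differ):

```cpp
#include <cassert>

// Sketch only: before #8526, a zero count was tolerated so that recurrent
// models (which have no attention tensors) passed the sanity check.
void check_before_8526(int n_attention_wv, int n_layer) {
    assert(n_attention_wv == 0 || n_attention_wv == n_layer);
}

// Sketch only: after #8526, the expected count n_attn_layer is derived from
// the layers that actually have attention, and the match must be exact, so a
// DeepSeek-V2 count of 0 now trips the assertion.
void check_after_8526(int n_attention_wv, int n_attn_layer) {
    assert((n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected");
}
```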

Counting either attn_kv_a_mqa.weight or attn_kv_b.weight should fix this; I went with the shorter of the two so that it fits vertically with the other conditions in the if statement that counts those tensors.
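A minimal sketch of the kind of condition this describes, paraphrasing the tensor-name matching in the quantization path (the helper function here is illustrative, not the actual diff):

```cpp
#include <string>

// Sketch: count a tensor toward qs.n_attention_wv when it acts as the
// attention "value" projection. DeepSeek-V2 has no attn_v.weight; its value
// projection lives in attn_kv_a_mqa.weight / attn_kv_b.weight, so one of
// those names has to be matched as well.
static bool counts_as_attention_wv(const std::string & name) {
    return name.find("attn_v.weight")    != std::string::npos ||
           name.find("attn_qkv.weight")  != std::string::npos ||
           name.find("attn_kv_b.weight") != std::string::npos; // shorter of the two candidates
}
```

Matching attn_kv_b.weight rather than attn_kv_a_mqa.weight keeps the new condition the same width as the existing ones, which is the vertical-fit point above.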

@mann1x can you confirm whether or not this fixes the problem?


@compilade added the bugfix and Review Complexity : Low labels on Aug 24, 2024
@mann1x

mann1x commented Aug 25, 2024

@compilade
Sorry, I must have seen the notification while I was already sleeping yesterday... it was unread, but I have no recollection of it.

Tested it and it works; I was able to quantize and run the model on ollama.

@mann1x

mann1x commented Aug 27, 2024

@ggerganov
Can you check the tests that are failing?
It seems there's a problem with the CI pipeline, not the PR.

@ggerganov merged commit 78eb487 into master on Aug 27, 2024
50 of 53 checks passed
@ggerganov deleted the compilade/fix-deepseek-n_wv branch on August 27, 2024 at 10:09
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Development

Successfully merging this pull request may close these issues.

Bug: GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected") failed with deepseek2
3 participants