
BUG: fix llama-cpp when some quantizations have multiple parts #2786

Merged: 3 commits into xorbitsai:main on Jan 27, 2025

Conversation

@qinxuye (Contributor) commented on Jan 26, 2025

This PR fixes two bugs:

  1. Some models ship one quantization as a multi-part GGUF alongside single-file quantizations. For qwen2.5-instruct 7B, for example, q4_k_m is split into two files while q3_k_m is a single file; before this PR, loading q3_k_m skipped downloading its GGUF file. (See the sketch after this list.)
  2. Fixes model_path for the llama.cpp engine; previously the path was treated as a directory.
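As a rough illustration of both fixes (not the actual xinference code; the helper names and the file-naming convention are assumptions), the change amounts to checking file presence per quantization and handing llama.cpp the first .gguf file rather than the directory:

```python
import os

def gguf_files(model_dir: str, quantization: str) -> list[str]:
    """Return the GGUF file(s) belonging to one quantization, sorted so a
    split set comes back as part 00001, 00002, ... (assumed naming like
    *-q4_k_m-00001-of-00002.gguf; a single-file quantization is *-q3_k_m.gguf).
    """
    quant = quantization.lower()
    return sorted(
        f for f in os.listdir(model_dir)
        if quant in f.lower() and f.endswith(".gguf")
    )

def needs_download(model_dir: str, quantization: str) -> bool:
    # Bug 1: the presence check must be made per quantization. Deciding
    # once per model meant that if the two-part q4_k_m was already on
    # disk, the single-file q3_k_m was wrongly treated as downloaded
    # and its download was skipped.
    return len(gguf_files(model_dir, quantization)) == 0

def resolve_model_path(model_path: str, quantization: str) -> str:
    # Bug 2: llama.cpp expects the path of a .gguf file, not a directory.
    # For a split GGUF, passing the first part is enough: llama.cpp loads
    # the remaining parts itself.
    if os.path.isdir(model_path):
        parts = gguf_files(model_path, quantization)
        if not parts:
            raise FileNotFoundError(
                f"no GGUF file for {quantization!r} in {model_path}"
            )
        return os.path.join(model_path, parts[0])
    return model_path
```

With both quantizations stored in one directory, `needs_download(model_dir, "q3_k_m")` now reports correctly even when the q4_k_m parts are already present.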

@XprobeBot added the bug label on Jan 26, 2025
@XprobeBot added this to the v1.x milestone on Jan 26, 2025
@qinxuye changed the title from "BUG: fix llama-cpp when mixing model with multiple parts" to "BUG: fix llama-cpp when some quantizations have multiple parts" on Jan 26, 2025
@codingl2k1 (Contributor) left a comment:

LGTM

@qinxuye merged commit 9eb0fd4 into xorbitsai:main on Jan 27, 2025
12 of 13 checks passed
@qinxuye deleted the bug/llama-cpp branch on Jan 27, 2025 02:50