
BUG: fix llama-cpp when some quantizations have multiple parts #2786

Merged: 3 commits into xorbitsai:main on Jan 27, 2025

Conversation

@qinxuye (Contributor) commented on Jan 26, 2025

This PR fixes two bugs:

  1. Some models ship one quantization as a multi-part GGUF alongside single-file quantizations. For qwen2.5-instruct 7B, for example, q4_k_m is split into two files while q3_k_m is a single file; before this PR, loading q3_k_m skipped downloading its GGUF file. (See the sketch after this list.)
  2. Fixes model_path for the llama.cpp engine; previously the path was treated as a directory.
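As a rough illustration of both fixes (not the actual xinference code; the helper names and the file-naming convention are assumptions), the change amounts to checking file presence per quantization and handing llama.cpp the first .gguf file rather than the directory:

```python
import os

def gguf_files(model_dir: str, quantization: str) -> list[str]:
    """Return the GGUF file(s) belonging to one quantization, sorted so a
    split set comes back as part 00001, 00002, ... (assumed naming like
    *-q4_k_m-00001-of-00002.gguf; a single-file quantization is *-q3_k_m.gguf).
    """
    quant = quantization.lower()
    return sorted(
        f for f in os.listdir(model_dir)
        if quant in f.lower() and f.endswith(".gguf")
    )

def needs_download(model_dir: str, quantization: str) -> bool:
    # Bug 1: the presence check must be made per quantization. Deciding
    # once per model meant that if the two-part q4_k_m was already on
    # disk, the single-file q3_k_m was wrongly treated as downloaded
    # and its download was skipped.
    return len(gguf_files(model_dir, quantization)) == 0

def resolve_model_path(model_path: str, quantization: str) -> str:
    # Bug 2: llama.cpp expects the path of a .gguf file, not a directory.
    # For a split GGUF, passing the first part is enough: llama.cpp loads
    # the remaining parts itself.
    if os.path.isdir(model_path):
        parts = gguf_files(model_path, quantization)
        if not parts:
            raise FileNotFoundError(
                f"no GGUF file for {quantization!r} in {model_path}"
            )
        return os.path.join(model_path, parts[0])
    return model_path
```

With both quantizations stored in one directory, `needs_download(model_dir, "q3_k_m")` now reports correctly even when the q4_k_m parts are already present.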

@XprobeBot added the bug label on Jan 26, 2025
@XprobeBot added this to the v1.x milestone on Jan 26, 2025
@qinxuye changed the title from "BUG: fix llama-cpp when mixing model with multiple parts" to "BUG: fix llama-cpp when some quantizations have multiple parts" on Jan 26, 2025
@codingl2k1 (Contributor) left a comment:

LGTM

@qinxuye merged commit 9eb0fd4 into xorbitsai:main on Jan 27, 2025
12 of 13 checks passed
@qinxuye deleted the bug/llama-cpp branch on Jan 27, 2025 02:50