Upgrade v1/v2 format to v3 by leveraging quantize #1504
Conversation
clang-tidy made some suggestions
I would not add it into ggml.c. It's legacy code, which we don't want to carry around.
There's no intention to carry it forever; maybe remove it after a couple of weeks. The data format (struct block_q4_0) is only defined in ggml.c, and I don't see another way to do this unless we copy the definition.
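For context, a rough sketch of what that legacy block layout looks like. This is an assumption based on the v1-era format discussed here (one fp32 scale plus 32 packed 4-bit values per block); the exact upstream definition in ggml.c may differ in detail:

```c
#include <stdint.h>

#define QK4_0 32  /* number of weights per quantization block (assumed) */

/* Sketch of the legacy v1-era Q4_0 block: one fp32 scale factor
 * followed by 32 quantized values packed two nibbles per byte. */
typedef struct {
    float   d;              /* scale factor */
    uint8_t qs[QK4_0 / 2];  /* 32 4-bit values, two per byte */
} block_q4_0;
```

Each block therefore occupies 20 bytes (4-byte scale + 16 bytes of nibbles) for every 32 weights, which is why the struct layout must be known exactly to read old files.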
Maybe it could be made into a small standalone tool, so that it doesn't become a burden. The instructions in README.md could then be updated accordingly.
The intention is to provide a more seamless experience when upgrading the model version. The goal is not to ship a separate tool or to maintain this long term.
Thank you very much for making many of my old models usable again. Unfortunately, a new merge now seems to break backward compatibility again. To deal with the same thing happening again, it would be reasonable to provide a dedicated tool.
Yes, it is fine to just keep this PR open and not merge it. I will make some code changes after the F16 change is merged.
Isn't it possible to integrate this as a separate tool? That way the legacy code could be kept out of the main program and the conversion would still be possible.
You may notice the changes are in llama.cpp and ggml.c. If we wanted a separate application, we would pretty much have to copy that code.
The quantization code is actually copied several times already: once in ggml.c, then in ggml-cuda.cu, and in ggml-opencl.c as well.
Tested with v1 & v2 files of Q4_0 only; I don't have files in other formats, so please report bugs here. @ggerganov this is an ugly patch, but it works. It is painful if we don't provide a conversion tool for the old models, but I don't have much time to build a separate tool (and I don't think it is worth the effort for an intermediate tool).
There may be a compromise: create a fixed branch that contains the format-conversion feature and does not need to track the latest code.
Leverage the quantize executable to support upgrading models from v1 (previous) to v2 (latest).
Usage:
quantize <old_quantized_model> <new_model_name> type
The type must match the previous file's type; the tool does not support re-quantizing into another type.
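For example, upgrading an old Q4_0 model might look like the following. The file paths are illustrative, and the spelling of the type argument is an assumption; the build from this PR may expect a numeric type code instead:

```shell
# Re-encode an old Q4_0 model into the current file format.
# The output type (q4_0 here) must match the input file's original type.
./quantize ./models/7B/ggml-model-q4_0.bin ./models/7B/ggml-model-q4_0-new.bin q4_0
```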