Add importance matrix support for legacy quants? #4932
Comments
Hi, ikawrakow. For your question, my opinion (as only a simple user) is that the "legacy" quants should remain the same, at least for now. The new imatrix technique you have invented seems to work very well with English initially (when using an English text, as you have showcased), but I believe that a longer period of testing by the community will let us know how well it works across different uses and languages. Until then, I believe keeping the "legacy" quants as a fallback is the most desirable option. A lot of people don't use the quantization tool themselves and rely on the quants released by users like TheBloke. If an issue has been overlooked and the new quants (created using an English text for the imatrix generation) perform worse in some areas than the previous versions, having the legacy quants as an alternative would be the best option in my opinion.
@abc-nix Does your opinion remain the same after learning that one will still be able to quantize both k-quants and legacy quants without using an importance matrix? So that, in case there are issues, one can always fall back to the existing quantization?
@ikawrakow My opinion amounts to less than a grain of sand, so don't consider it a general opinion but my own. I did take into account that the current k-quants can optionally use the new and improved imatrix method (forced only for q2-k quants, I think), which is a great benefit. With this new method we will find in the wild as many quants as there are datasets used to compute the imatrix, but this will also bring many quants that may compete to be the best of their size. It may also improve desired performance in certain areas depending on the dataset used, making the new k-quants much better than generalized quants. But a normal user, from the outside, will not be able to distinguish one from another just by looking at the final file.

I think there should still be a reproducible format, the legacy format, that can be expected to perform the same no matter whether I create it myself or download it from a huggingface repo. Keeping the "legacy" quants as they are (even if using the imatrix method is optional, and that method could improve user experience) should also make it easier for people to help resolve issues some users may experience (like people complaining something is wrong with llama.cpp, but after testing the legacy quant they realize the issue is with the specific dataset used for their k-quant, or an issue with imatrix, rather than with the program itself).

Sometimes more options can also lead to more chaos. Having a reference quant that should be the same for all users (without the risk of "mistakenly" using a bad dataset) would make it easier to troubleshoot. As I said, this is only my opinion. Discard it as you would a grain of sand.
I only really care about the legacy quants for development; it is much easier to prototype features for q4_0 or q8_0 than any of the k-quants due to the much simpler data structure. I don't particularly care whether the legacy quants have slightly better/worse perplexity because I usually only need to check whether it changes. |
I think supporting legacy quants is needed. |
Optional importance matrix support for legacy quants similar to the one in #4930 would be useful. |
Closed via #4969 |
I have the implementation ready, but I'm not sure if this is what we want. Use of an importance matrix does improve perplexity for all models I have tried. But on the other hand, the "legacy" ggml quants `Q4_0` and `Q5_0` are never very good, but they are also never really bad (`Q4_1` and `Q5_1` have more erratic behavior, being better than `Q4_0`/`Q5_0` for some models and worse for others). Hence, one may want to preserve them the way they are as a kind of reference. Opinions?