ggml : add Q5_0 and Q5_1 quantization #1187
Follow-up on the idea by @ikawrakow in #729 (comment)
Q5_0

This format is bigger than Q4_0 and Q4_2.
On M1 Pro, it evaluates at about 53 ms / token for the 7B model.
Perplexity for 7B: 6.0139
Q5_1

This format is the same size as Q4_1 and Q4_3.
On M1 Pro, it evaluates at about 55 ms / token for the 7B model.
The AVX implementation might make use of the following trick: https://stackoverflow.com/a/24242696
Perplexity for 7B: 5.9934
TODO: