ggml : add Q5_0 and Q5_1 quantization #1187
Follow-up on the idea by @ikawrakow in #729 (comment)
Q5_0

This format is bigger than Q4_0 and Q4_2.
On M1 Pro, it evaluates at about 53 ms / token for the 7B model.
Perplexity for 7B: 6.0139
Q5_1

This format is the same size as Q4_1 and Q4_3.
On M1 Pro, it evaluates at about 55 ms / token for the 7B model.
The AVX implementation might make use of the following trick: https://stackoverflow.com/a/24242696
Perplexity for 7B: 5.9934
TODO: