
Add support for StarCoder2 #5795

Merged: 10 commits, Mar 1, 2024

Conversation

pacman100 (Contributor)

What does this PR do?

  1. Adds support for StarCoder 2 models that were released recently.

Review threads (outdated, resolved): llama.cpp, gguf-py/gguf/constants.py
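
For context, the gguf-py side of a change like this amounts to registering an architecture name and its tensor list in gguf-py/gguf/constants.py. The sketch below is illustrative only, inferred from the tensor names in the conversion log further down, not copied from this PR's actual diff:

# Illustrative sketch -- names inferred from the conversion log below,
# not the exact contents of this PR.
from enum import IntEnum, auto

class MODEL_ARCH(IntEnum):
    STARCODER2 = auto()

class MODEL_TENSOR(IntEnum):
    TOKEN_EMBD  = auto()
    OUTPUT_NORM = auto()
    ATTN_NORM   = auto()
    ATTN_Q      = auto()
    ATTN_K      = auto()
    ATTN_V      = auto()
    ATTN_OUT    = auto()
    FFN_NORM    = auto()
    FFN_UP      = auto()
    FFN_DOWN    = auto()

# The architecture string written into the GGUF header
# (shows up as general.architecture = "starcoder2").
MODEL_ARCH_NAMES = {MODEL_ARCH.STARCODER2: "starcoder2"}

# Per-architecture tensor list: the converter emits exactly these,
# which appear as token_embd / blk.N.* / output_norm in the log.
MODEL_TENSORS = {
    MODEL_ARCH.STARCODER2: [
        MODEL_TENSOR.TOKEN_EMBD,
        MODEL_TENSOR.OUTPUT_NORM,
        MODEL_TENSOR.ATTN_NORM,
        MODEL_TENSOR.ATTN_Q,
        MODEL_TENSOR.ATTN_K,
        MODEL_TENSOR.ATTN_V,
        MODEL_TENSOR.ATTN_OUT,
        MODEL_TENSOR.FFN_NORM,
        MODEL_TENSOR.FFN_UP,
        MODEL_TENSOR.FFN_DOWN,
    ],
}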
pacman100 (Contributor, Author) commented Mar 1, 2024

This is working, but the generated outputs are not good. It would be great to get some pointers on where to look to fix this.
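
As a debugging aid (not part of the PR), one hedged approach is to produce a reference completion with the original HF checkpoint from step 1 below and diff it against llama.cpp's output for the same prompt; this assumes a transformers version with Starcoder2 support:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Path matches the checkpoint converted in step 1 below.
tok = AutoTokenizer.from_pretrained("../starcoder2-3b/")
model = AutoModelForCausalLM.from_pretrained("../starcoder2-3b/")

# Greedy decoding gives a deterministic reference to compare against
# the GGUF model's output for the same prompt.
inputs = tok("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

If the HF output is good and the GGUF output is not, the problem is in the conversion or inference path rather than the checkpoint itself.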

  1. Convert to gguf format:
    Command:
cd llama.cpp
python convert-hf-to-gguf.py ../starcoder2-3b/ --outfile models/starcoder2-3b.gguf --outtype "f32"
Output
Loading model: starcoder2-3b
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
gguf: Adding 48872 merge(s).
gguf: Setting special token type bos to 0
gguf: Setting special token type eos to 0
gguf: Setting special token type unk to 0
Exporting model to 'models/starcoder2-3b.gguf'
gguf: loading model part 'model.safetensors'
token_embd.weight, n_dims = 2, torch.float32 --> float32
blk.0.attn_norm.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_norm.weight, n_dims = 1, torch.float32 --> float32
blk.0.ffn_up.bias, n_dims = 1, torch.float32 --> float32
blk.0.ffn_up.weight, n_dims = 2, torch.float32 --> float32
blk.0.ffn_down.bias, n_dims = 1, torch.float32 --> float32
blk.0.ffn_down.weight, n_dims = 2, torch.float32 --> float32
blk.0.ffn_norm.bias, n_dims = 1, torch.float32 --> float32
blk.0.ffn_norm.weight, n_dims = 1, torch.float32 --> float32
blk.0.attn_k.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_k.weight, n_dims = 2, torch.float32 --> float32
blk.0.attn_output.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_output.weight, n_dims = 2, torch.float32 --> float32
blk.0.attn_q.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_q.weight, n_dims = 2, torch.float32 --> float32
blk.0.attn_v.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_v.weight, n_dims = 2, torch.float32 --> float32
[... blk.1 through blk.29 omitted: each block repeats the same 16 float32 tensor entries shown for blk.0 above ...]
output_norm.bias, n_dims = 1, torch.float32 --> float32
output_norm.weight, n_dims = 1, torch.float32 --> float32
Model successfully exported to 'models/starcoder2-3b.gguf'
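
Before quantizing, the exported file can be sanity-checked by reading its metadata back with the gguf Python package (a minimal sketch, assuming gguf-py from this repository is installed):

import gguf  # gguf-py, from this repository

reader = gguf.GGUFReader("models/starcoder2-3b.gguf")

# Dump the KV metadata written during conversion
# (starcoder2.block_count, starcoder2.context_length, ...).
for name in reader.fields:
    print(name)

# The tensor count should match the quantization log below (483 tensors).
print(len(reader.tensors), "tensors")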
  2. Quantization
    Command:
./quantize models/starcoder2-3b.gguf models/starcoder2-3b-Q4_K_M.gguf Q4_K_M
Output
main: build = 2299 (d62ce1c6)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: quantizing 'models/starcoder2-3b.gguf' to 'models/starcoder2-3b-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 17 key-value pairs and 483 tensors from models/starcoder2-3b.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = starcoder2
llama_model_loader: - kv   1:                               general.name str              = starcoder2-3b
llama_model_loader: - kv   2:                     starcoder2.block_count u32              = 30
llama_model_loader: - kv   3:                  starcoder2.context_length u32              = 16384
llama_model_loader: - kv   4:                starcoder2.embedding_length u32              = 3072
llama_model_loader: - kv   5:             starcoder2.feed_forward_length u32              = 12288
llama_model_loader: - kv   6:            starcoder2.attention.head_count u32              = 24
llama_model_loader: - kv   7:         starcoder2.attention.head_count_kv u32              = 2
llama_model_loader: - kv   8:    starcoder2.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv   9:                          general.file_type u32              = 0
llama_model_loader: - kv  10:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  11:                      tokenizer.ggml.tokens arr[str,49152]   = ["<|endoftext|>", "<fim_prefix>", "<f...
llama_model_loader: - kv  12:                  tokenizer.ggml.token_type arr[i32,49152]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  13:                      tokenizer.ggml.merges arr[str,48872]   = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv  14:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  15:                tokenizer.ggml.eos_token_id u32              = 0
llama_model_loader: - kv  16:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - type  f32:  483 tensors
llama_model_quantize_internal ============ Strange model: n_attention_wv = 30, n_ffn_down = 60, hparams.n_layer = 30
llama_model_quantize_internal: meta size = 1745952 bytes
[   1/ 483]                    token_embd.weight - [ 3072, 49152,     1,     1], type =    f32, quantizing to q4_K .. size =   576.00 MiB ->    81.00 MiB
[   2/ 483]                 blk.0.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   3/ 483]               blk.0.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   4/ 483]                    blk.0.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[   5/ 483]                  blk.0.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[   6/ 483]                  blk.0.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   7/ 483]                blk.0.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[   8/ 483]                  blk.0.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   9/ 483]                blk.0.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  10/ 483]                    blk.0.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  11/ 483]                  blk.0.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[  12/ 483]               blk.0.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  13/ 483]             blk.0.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[  14/ 483]                    blk.0.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  15/ 483]                  blk.0.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[  16/ 483]                    blk.0.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  17/ 483]                  blk.0.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[... entries 18-209 omitted: blk.1, blk.10-blk.19 and blk.2 repeat the same 16-entry per-block pattern, with attn_v.weight and ffn_down.weight alternating between q4_K and q6_K across blocks ...]
[ 210/ 483]                blk.20.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 211/ 483]              blk.20.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 212/ 483]                   blk.20.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 213/ 483]                 blk.20.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 214/ 483]                 blk.20.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 215/ 483]               blk.20.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 216/ 483]                 blk.20.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 217/ 483]               blk.20.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 218/ 483]                   blk.20.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 219/ 483]                 blk.20.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 220/ 483]              blk.20.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 221/ 483]            blk.20.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 222/ 483]                   blk.20.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 223/ 483]                 blk.20.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 224/ 483]                   blk.20.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 225/ 483]                 blk.20.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 226/ 483]                blk.21.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 227/ 483]              blk.21.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 228/ 483]                   blk.21.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 229/ 483]                 blk.21.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 230/ 483]                 blk.21.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 231/ 483]               blk.21.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 232/ 483]                 blk.21.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 233/ 483]               blk.21.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 234/ 483]                   blk.21.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 235/ 483]                 blk.21.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 236/ 483]              blk.21.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 237/ 483]            blk.21.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 238/ 483]                   blk.21.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 239/ 483]                 blk.21.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 240/ 483]                   blk.21.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 241/ 483]                 blk.21.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 242/ 483]                blk.22.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 243/ 483]              blk.22.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 244/ 483]                   blk.22.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 245/ 483]                 blk.22.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 246/ 483]                 blk.22.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 247/ 483]               blk.22.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 248/ 483]                 blk.22.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 249/ 483]               blk.22.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 250/ 483]                   blk.22.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 251/ 483]                 blk.22.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 252/ 483]              blk.22.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 253/ 483]            blk.22.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 254/ 483]                   blk.22.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 255/ 483]                 blk.22.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 256/ 483]                   blk.22.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 257/ 483]                 blk.22.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 258/ 483]                blk.23.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 259/ 483]              blk.23.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 260/ 483]                   blk.23.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 261/ 483]                 blk.23.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 262/ 483]                 blk.23.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 263/ 483]               blk.23.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 264/ 483]                 blk.23.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 265/ 483]               blk.23.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 266/ 483]                   blk.23.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 267/ 483]                 blk.23.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 268/ 483]              blk.23.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 269/ 483]            blk.23.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 270/ 483]                   blk.23.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 271/ 483]                 blk.23.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 272/ 483]                   blk.23.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 273/ 483]                 blk.23.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 274/ 483]                blk.24.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 275/ 483]              blk.24.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 276/ 483]                   blk.24.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 277/ 483]                 blk.24.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 278/ 483]                 blk.24.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 279/ 483]               blk.24.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 280/ 483]                 blk.24.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 281/ 483]               blk.24.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 282/ 483]                   blk.24.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 283/ 483]                 blk.24.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 284/ 483]              blk.24.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 285/ 483]            blk.24.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 286/ 483]                   blk.24.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 287/ 483]                 blk.24.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 288/ 483]                   blk.24.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 289/ 483]                 blk.24.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 290/ 483]                blk.25.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 291/ 483]              blk.25.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 292/ 483]                   blk.25.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 293/ 483]                 blk.25.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 294/ 483]                 blk.25.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 295/ 483]               blk.25.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 296/ 483]                 blk.25.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 297/ 483]               blk.25.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 298/ 483]                   blk.25.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 299/ 483]                 blk.25.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 300/ 483]              blk.25.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 301/ 483]            blk.25.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 302/ 483]                   blk.25.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 303/ 483]                 blk.25.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 304/ 483]                   blk.25.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 305/ 483]                 blk.25.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 306/ 483]                blk.26.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 307/ 483]              blk.26.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 308/ 483]                   blk.26.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 309/ 483]                 blk.26.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 310/ 483]                 blk.26.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 311/ 483]               blk.26.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 312/ 483]                 blk.26.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 313/ 483]               blk.26.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 314/ 483]                   blk.26.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 315/ 483]                 blk.26.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 316/ 483]              blk.26.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 317/ 483]            blk.26.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 318/ 483]                   blk.26.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 319/ 483]                 blk.26.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 320/ 483]                   blk.26.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 321/ 483]                 blk.26.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 322/ 483]                blk.27.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 323/ 483]              blk.27.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 324/ 483]                   blk.27.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 325/ 483]                 blk.27.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 326/ 483]                 blk.27.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 327/ 483]               blk.27.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 328/ 483]                 blk.27.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 329/ 483]               blk.27.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 330/ 483]                   blk.27.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 331/ 483]                 blk.27.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 332/ 483]              blk.27.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 333/ 483]            blk.27.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 334/ 483]                   blk.27.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 335/ 483]                 blk.27.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 336/ 483]                   blk.27.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 337/ 483]                 blk.27.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 338/ 483]                blk.28.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 339/ 483]              blk.28.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 340/ 483]                   blk.28.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 341/ 483]                 blk.28.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 342/ 483]                 blk.28.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 343/ 483]               blk.28.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 344/ 483]                 blk.28.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 345/ 483]               blk.28.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 346/ 483]                   blk.28.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 347/ 483]                 blk.28.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 348/ 483]              blk.28.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 349/ 483]            blk.28.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 350/ 483]                   blk.28.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 351/ 483]                 blk.28.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 352/ 483]                   blk.28.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 353/ 483]                 blk.28.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 354/ 483]                blk.29.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 355/ 483]              blk.29.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 356/ 483]                   blk.29.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 357/ 483]                 blk.29.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 358/ 483]                 blk.29.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 359/ 483]               blk.29.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 360/ 483]                 blk.29.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 361/ 483]               blk.29.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 362/ 483]                   blk.29.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 363/ 483]                 blk.29.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 364/ 483]              blk.29.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 365/ 483]            blk.29.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 366/ 483]                   blk.29.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 367/ 483]                 blk.29.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 368/ 483]                   blk.29.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 369/ 483]                 blk.29.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 370/ 483]                 blk.3.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 371/ 483]               blk.3.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 372/ 483]                    blk.3.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 373/ 483]                  blk.3.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 374/ 483]                  blk.3.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 375/ 483]                blk.3.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 376/ 483]                  blk.3.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 377/ 483]                blk.3.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 378/ 483]                    blk.3.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 379/ 483]                  blk.3.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 380/ 483]               blk.3.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 381/ 483]             blk.3.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 382/ 483]                    blk.3.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 383/ 483]                  blk.3.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 384/ 483]                    blk.3.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 385/ 483]                  blk.3.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 386/ 483]                 blk.4.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 387/ 483]               blk.4.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 388/ 483]                    blk.4.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 389/ 483]                  blk.4.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 390/ 483]                  blk.4.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 391/ 483]                blk.4.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 392/ 483]                  blk.4.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 393/ 483]                blk.4.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 394/ 483]                    blk.4.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 395/ 483]                  blk.4.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 396/ 483]               blk.4.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 397/ 483]             blk.4.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 398/ 483]                    blk.4.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 399/ 483]                  blk.4.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 400/ 483]                    blk.4.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 401/ 483]                  blk.4.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 402/ 483]                 blk.5.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 403/ 483]               blk.5.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 404/ 483]                    blk.5.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 405/ 483]                  blk.5.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 406/ 483]                  blk.5.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 407/ 483]                blk.5.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 408/ 483]                  blk.5.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 409/ 483]                blk.5.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 410/ 483]                    blk.5.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 411/ 483]                  blk.5.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 412/ 483]               blk.5.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 413/ 483]             blk.5.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 414/ 483]                    blk.5.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 415/ 483]                  blk.5.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 416/ 483]                    blk.5.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 417/ 483]                  blk.5.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 418/ 483]                 blk.6.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 419/ 483]               blk.6.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 420/ 483]                    blk.6.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 421/ 483]                  blk.6.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 422/ 483]                  blk.6.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 423/ 483]                blk.6.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 424/ 483]                  blk.6.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 425/ 483]                blk.6.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 426/ 483]                    blk.6.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 427/ 483]                  blk.6.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 428/ 483]               blk.6.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 429/ 483]             blk.6.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 430/ 483]                    blk.6.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 431/ 483]                  blk.6.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 432/ 483]                    blk.6.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 433/ 483]                  blk.6.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 434/ 483]                 blk.7.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 435/ 483]               blk.7.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 436/ 483]                    blk.7.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 437/ 483]                  blk.7.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 438/ 483]                  blk.7.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 439/ 483]                blk.7.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 440/ 483]                  blk.7.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 441/ 483]                blk.7.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 442/ 483]                    blk.7.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 443/ 483]                  blk.7.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 444/ 483]               blk.7.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 445/ 483]             blk.7.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 446/ 483]                    blk.7.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 447/ 483]                  blk.7.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 448/ 483]                    blk.7.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 449/ 483]                  blk.7.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 450/ 483]                 blk.8.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 451/ 483]               blk.8.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 452/ 483]                    blk.8.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 453/ 483]                  blk.8.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 454/ 483]                  blk.8.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 455/ 483]                blk.8.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 456/ 483]                  blk.8.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 457/ 483]                blk.8.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 458/ 483]                    blk.8.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 459/ 483]                  blk.8.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 460/ 483]               blk.8.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 461/ 483]             blk.8.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 462/ 483]                    blk.8.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 463/ 483]                  blk.8.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 464/ 483]                    blk.8.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 465/ 483]                  blk.8.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 466/ 483]                 blk.9.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 467/ 483]               blk.9.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 468/ 483]                    blk.9.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 469/ 483]                  blk.9.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 470/ 483]                  blk.9.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 471/ 483]                blk.9.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 472/ 483]                  blk.9.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 473/ 483]                blk.9.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 474/ 483]                    blk.9.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 475/ 483]                  blk.9.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 476/ 483]               blk.9.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 477/ 483]             blk.9.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 478/ 483]                    blk.9.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 479/ 483]                  blk.9.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 480/ 483]                    blk.9.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 481/ 483]                  blk.9.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 482/ 483]                     output_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 483/ 483]                   output_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
llama_model_quantize_internal: model size  = 11559.95 MB
llama_model_quantize_internal: quant size  =  1761.66 MB

main: quantize time =  6205.08 ms
main:    total time =  6205.08 ms
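
For reference, these sizes work out to roughly 4.9 bits per weight, which is in the expected range for a Q4_K_M mix (mostly q4_K tensors with some q6_K). A back-of-the-envelope check, using values copied from the logs (and treating the log's MB as MiB):

# Bits per weight implied by the quantize summary above.
params = 3.03e9              # "model params = 3,03 B" from the loader output
quant_mib = 1761.66          # "quant size" from the quantize summary
bpw = quant_mib * 1024**2 * 8 / params
print(f"{bpw:.2f} bits per weight")  # ~4.88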
  3. Test on a sample:
    Command:
./main -m models/starcoder2-3b.gguf -p "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n" -n 60 -e

Output:

Log start
main: build = 2299 (d62ce1c6)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: seed  = 1709286693
llama_model_loader: loaded meta data with 17 key-value pairs and 483 tensors from models/starcoder2-3b.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = starcoder2
llama_model_loader: - kv   1:                               general.name str              = starcoder2-3b
llama_model_loader: - kv   2:                     starcoder2.block_count u32              = 30
llama_model_loader: - kv   3:                  starcoder2.context_length u32              = 16384
llama_model_loader: - kv   4:                starcoder2.embedding_length u32              = 3072
llama_model_loader: - kv   5:             starcoder2.feed_forward_length u32              = 12288
llama_model_loader: - kv   6:            starcoder2.attention.head_count u32              = 24
llama_model_loader: - kv   7:         starcoder2.attention.head_count_kv u32              = 2
llama_model_loader: - kv   8:    starcoder2.attention.layer_norm_epsilon f32              = 0,000010
llama_model_loader: - kv   9:                          general.file_type u32              = 0
llama_model_loader: - kv  10:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  11:                      tokenizer.ggml.tokens arr[str,49152]   = ["<|endoftext|>", "<fim_prefix>", "<f...
llama_model_loader: - kv  12:                  tokenizer.ggml.token_type arr[i32,49152]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  13:                      tokenizer.ggml.merges arr[str,48872]   = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv  14:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  15:                tokenizer.ggml.eos_token_id u32              = 0
llama_model_loader: - kv  16:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - type  f32:  483 tensors
llm_load_vocab: special tokens definition check successful ( 38/49152 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = starcoder2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 49152
llm_load_print_meta: n_merges         = 48872
llm_load_print_meta: n_ctx_train      = 16384
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_head           = 24
llm_load_print_meta: n_head_kv        = 2
llm_load_print_meta: n_layer          = 30
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 12
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 1,0e-05
llm_load_print_meta: f_norm_rms_eps   = 0,0e+00
llm_load_print_meta: f_clamp_kqv      = 0,0e+00
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
llm_load_print_meta: n_ff             = 12288
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000,0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 3B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 3,03 B
llm_load_print_meta: model size       = 11,29 GiB (32,00 BPW) 
llm_load_print_meta: general.name     = starcoder2-3b
llm_load_print_meta: BOS token        = 0 '<|endoftext|>'
llm_load_print_meta: EOS token        = 0 '<|endoftext|>'
llm_load_print_meta: UNK token        = 0 '<|endoftext|>'
llm_load_print_meta: LF token         = 164 'Ä'
llm_load_tensors: ggml ctx size =    0,18 MiB
llm_load_tensors:        CPU buffer size = 11559,95 MiB
..............................................................................
.
.
.
.
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000,0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =    15,00 MiB
llama_new_context_with_model: KV self size  =   15,00 MiB, K (f16):    7,50 MiB, V (f16):    7,50 MiB
llama_new_context_with_model:        CPU input buffer size   =     8,01 MiB
llama_new_context_with_model:        CPU compute buffer size =   108,00 MiB
llama_new_context_with_model: graph splits (measure): 1

system_info: n_threads = 64 / 128 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | 
sampling: 
	repeat_last_n = 64, repeat_penalty = 1,100, frequency_penalty = 0,000, presence_penalty = 0,000
	top_k = 40, tfs_z = 1,000, top_p = 0,950, min_p = 0,050, typical_p = 1,000, temp = 0,800
	mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 512, n_predict = 60, n_keep = 0


#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):

#include <vector>
# #
def
deftwo(a):


defsum(int arr,




#include<stdio(int main()

#include<bits/
#include<stdio.h
#define
intmain()
llama_print_timings:        load time =    1000,83 ms
llama_print_timings:      sample time =      32,65 ms /    60 runs   (    0,54 ms per token,  1837,56 tokens per second)
llama_print_timings: prompt eval time =     363,40 ms /    26 tokens (   13,98 ms per token,    71,55 tokens per second)
llama_print_timings:        eval time =    7341,69 ms /    59 runs   (  124,44 ms per token,     8,04 tokens per second)
llama_print_timings:       total time =    7769,73 ms /    85 tokens
Log end
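
As an aside, the attention shapes in the logs are internally consistent: with n_head = 24, n_head_kv = 2 and n_embd = 3072, each head is 128 wide and the K/V projections are 3072 x 256, matching the attn_k/attn_v tensors in the quantize dump. A quick derivation:

# Reproduce the GQA dimensions printed by llm_load_print_meta.
n_embd, n_head, n_head_kv = 3072, 24, 2
head_dim = n_embd // n_head             # 128 -> n_embd_head_k / n_embd_head_v
n_embd_kv_gqa = n_head_kv * head_dim    # 256 -> n_embd_k_gqa / n_embd_v_gqa
n_gqa = n_head // n_head_kv             # 12  -> query heads per KV head

This points away from the tensor layout and toward how the weights are used, consistent with the rope-type fix discussed below.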

By contrast, loading the model with transformers gives correct outputs:
[Screenshot: correct generation from the transformers model, 2024-03-01]
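
Since the screenshot does not survive in text form, a minimal transformers check along these lines (the bigcode/starcoder2-3b model id and the generation settings here are assumptions, not taken from the screenshot) would be:

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b")

# Same prompt as the ./main run above (typo preserved on purpose).
prompt = "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60)
print(tok.decode(out[0]))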

@pacman100 pacman100 marked this pull request as ready for review March 1, 2024 09:57
Owner

@ggerganov ggerganov left a comment


After fixing the rope type, I think this is ready to merge, but would recommend some tests before doing so to make sure that the results make sense now

@pacman100
Contributor Author

but would recommend some tests before doing so to make sure that the results make sense now

I did some generations locally using both the normal and the FIM format, and the results are in line with expectations:

Samples from the 3B model. Commands are quoted, and outputs are shown in diff format to highlight the model's generations:

./main -m models/starcoder2-3b-Q4_K_M.gguf -p "def print_hello_world():" -n 64 -e --temp 0.2

Output

def print_hello_world():
+    print("Hello World!")

+print_hello_world()

+/python/python_basics.py
+# Python Basics

+# Variables
+x = 5
+y = "John"
+print(type(x)) # Prints "<class 'int'>"
+print(type(y

./main -m models/starcoder2-3b-Q4_K_M.gguf -p "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n" -n 128 -e --temp 0.2

#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):
+	hash = {} #dictionary to store the values of the elements in the array
+	for i in range(len(arr)):
+		if arr[i] not in hash:
+			hash[target_sum - arr[i]] = 1
+		else:
+			return True
+	return False

+#python code for efficient implemetation of three_sum
+def three_sum(arr, target_sum):
+	hash = {} #dictionary to store the values of the elements in the array
+	for i in range(len(arr)):
+		if arr[i] not
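
For what it's worth, the completed two_sum above is functionally correct (its inline comment is slightly off: the dict stores needed complements, not the elements themselves). A quick standalone check, with asserts added by hand:

# Model-generated two_sum from the diff above, with a corrected comment.
def two_sum(arr, target_sum):
    hash = {}  # maps needed complements (target_sum - element) to 1
    for i in range(len(arr)):
        if arr[i] not in hash:
            hash[target_sum - arr[i]] = 1
        else:
            return True
    return False

assert two_sum([2, 7, 11, 15], 9)   # 2 + 7 == 9
assert not two_sum([1, 2, 3], 7)    # no pair sums to 7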

./main -m models/starcoder2-3b-Q4_K_M.gguf -p '<fim_prefix>\ndef _get_model_architecture(self) -> gguf.MODEL_ARCH:\n\tarch = self.hparams["architectures"][0]\n\tif arch == "GPTNeoXForCausalLM":\n\t\treturn gguf.MODEL_ARCH.GPTNEOX\n\tif arch == "BloomForCausalLM":\n\t\treturn gguf.MODEL_ARCH.BLOOM\n\tif arch == "MPTForCausalLM":\n\t\treturn gguf.MODEL_ARCH.MPT\n\tif arch in ("BaichuanForCausalLM", "BaiChuanForCausalLM"):\n\t\treturn gguf.MODEL_ARCH.BAICHUAN\n\tif arch in ("FalconForCausalLM", "RWForCausalLM"):\n\t\treturn gguf.MODEL_ARCH.FALCON\n\tif arch == "GPTBigCodeForCausalLM":\n\t\treturn gguf.MODEL_ARCH.STARCODER\n\tif arch == "GPTRefactForCausalLM":\n\t\treturn gguf.MODEL_ARCH.REFACT\n\tif arch == "PersimmonForCausalLM":\n\t\treturn gguf.MODEL_ARCH.PERSIMMON\n\tif arch in ("StableLmForCausalLM", "StableLMEpochForCausalLM", "LlavaStableLMEpochForCausalLM"):\n\t\treturn gguf.MODEL_ARCH.STABLELM\n\tif arch == "QWenLMHeadModel":\n\t\treturn gguf.MODEL_ARCH.QWEN\n\tif arch == "Qwen2ForCausalLM":\n\t\treturn gguf.MODEL_ARCH.QWEN2\n\tif arch == "MixtralForCausalLM":\n\t\treturn gguf.MODEL_ARCH.LLAMA\n\tif arch == "GPT2LMHeadModel":\n\t\treturn gguf.MODEL_ARCH.GPT2\n\tif arch == "PhiForCausalLM":\n\t\treturn gguf.MODEL_ARCH.PHI2\n\tif arch == "PlamoForCausalLM":\n\t\treturn gguf.MODEL_ARCH.PLAMO\n\tif arch == "CodeShellForCausalLM":\n\t\treturn gguf.MODEL_ARCH.CODESHELL\n\tif arch == "OrionForCausalLM":\n\t\treturn gguf.MODEL_ARCH.ORION\n\tif arch == "InternLM2ForCausalLM":\n\t\treturn gguf.MODEL_ARCH.INTERNLM2\n\tif arch == "MiniCPMForCausalLM":\n\t\treturn gguf.MODEL_ARCH.MINICPM\n\tif arch == "BertModel":\n\t\treturn gguf.MODEL_ARCH.BERT\n\tif arch == "NomicBertModel":\n\t\treturn gguf.MODEL_ARCH.NOMIC_BERT\n\tif arch == "GemmaForCausalLM":\n\t\treturn gguf.MODEL_ARCH.GEMMA\n\tif arch == "Starcoder2ForCausalLM":\n<fim_suffix>\n\t\traise NotImplementedError(f"Architecture "{arch}" not supported!")\n<fim_middle>' -c 2048 -n 16 -e --temp 0.2

Output. Note that <fim_prefix>, <fim_suffix> and <fim_middle> were stripped by ./main; the highlighted generation comes after <fim_middle> and is therefore correct:

def _get_model_architecture(self) -> gguf.MODEL_ARCH:
	arch = self.hparams["architectures"][0]
	if arch == "GPTNeoXForCausalLM":
		return gguf.MODEL_ARCH.GPTNEOX
	if arch == "BloomForCausalLM":
		return gguf.MODEL_ARCH.BLOOM
	if arch == "MPTForCausalLM":
		return gguf.MODEL_ARCH.MPT
	if arch in ("BaichuanForCausalLM", "BaiChuanForCausalLM"):
		return gguf.MODEL_ARCH.BAICHUAN
	if arch in ("FalconForCausalLM", "RWForCausalLM"):
		return gguf.MODEL_ARCH.FALCON
	if arch == "GPTBigCodeForCausalLM":
		return gguf.MODEL_ARCH.STARCODER
	if arch == "GPTRefactForCausalLM":
		return gguf.MODEL_ARCH.REFACT
	if arch == "PersimmonForCausalLM":
		return gguf.MODEL_ARCH.PERSIMMON
	if arch in ("StableLmForCausalLM", "StableLMEpochForCausalLM", "LlavaStableLMEpochForCausalLM"):
		return gguf.MODEL_ARCH.STABLELM
	if arch == "QWenLMHeadModel":
		return gguf.MODEL_ARCH.QWEN
	if arch == "Qwen2ForCausalLM":
		return gguf.MODEL_ARCH.QWEN2
	if arch == "MixtralForCausalLM":
		return gguf.MODEL_ARCH.LLAMA
	if arch == "GPT2LMHeadModel":
		return gguf.MODEL_ARCH.GPT2
	if arch == "PhiForCausalLM":
		return gguf.MODEL_ARCH.PHI2
	if arch == "PlamoForCausalLM":
		return gguf.MODEL_ARCH.PLAMO
	if arch == "CodeShellForCausalLM":
		return gguf.MODEL_ARCH.CODESHELL
	if arch == "OrionForCausalLM":
		return gguf.MODEL_ARCH.ORION
	if arch == "InternLM2ForCausalLM":
		return gguf.MODEL_ARCH.INTERNLM2
	if arch == "MiniCPMForCausalLM":
		return gguf.MODEL_ARCH.MINICPM
	if arch == "BertModel":
		return gguf.MODEL_ARCH.BERT
	if arch == "NomicBertModel":
		return gguf.MODEL_ARCH.NOMIC_BERT
	if arch == "GemmaForCausalLM":
		return gguf.MODEL_ARCH.GEMMA
	if arch == "Starcoder2ForCausalLM":

		raise NotImplementedError(f"Architecture "{arch}" not supported!")
+		return gguf.MODEL_ARCH.STARCODER2
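
For readers unfamiliar with the format: the prompt above follows the StarCoder prefix-suffix-middle (PSM) convention, where the model is asked to generate the text that belongs between the prefix and the suffix. A sketch of how such a prompt is assembled (the helper name is hypothetical):

# Build a StarCoder-style fill-in-the-middle prompt in PSM order.
def make_fim_prompt(prefix: str, suffix: str) -> str:
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Everything the model emits after <fim_middle> is the proposed middle;
# in the run above that is: return gguf.MODEL_ARCH.STARCODER2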

@pacman100
Contributor Author

Finally, good to merge this PR @ggerganov. Thank you!

@ggerganov ggerganov merged commit c29af7e into ggerganov:master Mar 1, 2024
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
* Add support for starcoder2

* handle rope type

* skip rope freq and rotary embeddings from being serialized

* resolve comments

* Update llama.cpp

* remove redundant changes

* handle `rope-theta`

* llama : change starcoder2 rope type

* address comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024