
Add support for StarCoder2 #5795

Merged: 10 commits, Mar 1, 2024

Conversation

pacman100 (Contributor)

What does this PR do?

  1. Adds support for StarCoder 2 models that were released recently.

Review threads (outdated, resolved): llama.cpp, gguf-py/gguf/constants.py
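
For context, the gguf-py side of a change like this amounts to registering an architecture name and its tensor list in gguf-py/gguf/constants.py. The sketch below is illustrative only, inferred from the tensor names in the conversion log further down, not copied from this PR's actual diff:

# Illustrative sketch -- names inferred from the conversion log below,
# not the exact contents of this PR.
from enum import IntEnum, auto

class MODEL_ARCH(IntEnum):
    STARCODER2 = auto()

class MODEL_TENSOR(IntEnum):
    TOKEN_EMBD  = auto()
    OUTPUT_NORM = auto()
    ATTN_NORM   = auto()
    ATTN_Q      = auto()
    ATTN_K      = auto()
    ATTN_V      = auto()
    ATTN_OUT    = auto()
    FFN_NORM    = auto()
    FFN_UP      = auto()
    FFN_DOWN    = auto()

# The architecture string written into the GGUF header
# (shows up as general.architecture = "starcoder2").
MODEL_ARCH_NAMES = {MODEL_ARCH.STARCODER2: "starcoder2"}

# Per-architecture tensor list: the converter emits exactly these,
# which appear as token_embd / blk.N.* / output_norm in the log.
MODEL_TENSORS = {
    MODEL_ARCH.STARCODER2: [
        MODEL_TENSOR.TOKEN_EMBD,
        MODEL_TENSOR.OUTPUT_NORM,
        MODEL_TENSOR.ATTN_NORM,
        MODEL_TENSOR.ATTN_Q,
        MODEL_TENSOR.ATTN_K,
        MODEL_TENSOR.ATTN_V,
        MODEL_TENSOR.ATTN_OUT,
        MODEL_TENSOR.FFN_NORM,
        MODEL_TENSOR.FFN_UP,
        MODEL_TENSOR.FFN_DOWN,
    ],
}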
pacman100 (Contributor, Author) commented Mar 1, 2024

This is working, but the generated outputs are not good. It would be great to get some pointers on where to look to fix this.
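
As a debugging aid (not part of the PR), one hedged approach is to produce a reference completion with the original HF checkpoint from step 1 below and diff it against llama.cpp's output for the same prompt; this assumes a transformers version with Starcoder2 support:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Path matches the checkpoint converted in step 1 below.
tok = AutoTokenizer.from_pretrained("../starcoder2-3b/")
model = AutoModelForCausalLM.from_pretrained("../starcoder2-3b/")

# Greedy decoding gives a deterministic reference to compare against
# the GGUF model's output for the same prompt.
inputs = tok("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

If the HF output is good and the GGUF output is not, the problem is in the conversion or inference path rather than the checkpoint itself.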

  1. Convert to gguf format:
    Command:
cd llama.cpp
python convert-hf-to-gguf.py ../starcoder2-3b/ --outfile models/starcoder2-3b.gguf --outtype "f32"
Output
Loading model: starcoder2-3b
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
gguf: Adding 48872 merge(s).
gguf: Setting special token type bos to 0
gguf: Setting special token type eos to 0
gguf: Setting special token type unk to 0
Exporting model to 'models/starcoder2-3b.gguf'
gguf: loading model part 'model.safetensors'
token_embd.weight, n_dims = 2, torch.float32 --> float32
blk.0.attn_norm.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_norm.weight, n_dims = 1, torch.float32 --> float32
blk.0.ffn_up.bias, n_dims = 1, torch.float32 --> float32
blk.0.ffn_up.weight, n_dims = 2, torch.float32 --> float32
blk.0.ffn_down.bias, n_dims = 1, torch.float32 --> float32
blk.0.ffn_down.weight, n_dims = 2, torch.float32 --> float32
blk.0.ffn_norm.bias, n_dims = 1, torch.float32 --> float32
blk.0.ffn_norm.weight, n_dims = 1, torch.float32 --> float32
blk.0.attn_k.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_k.weight, n_dims = 2, torch.float32 --> float32
blk.0.attn_output.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_output.weight, n_dims = 2, torch.float32 --> float32
blk.0.attn_q.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_q.weight, n_dims = 2, torch.float32 --> float32
blk.0.attn_v.bias, n_dims = 1, torch.float32 --> float32
blk.0.attn_v.weight, n_dims = 2, torch.float32 --> float32
[... blk.1 through blk.29 omitted: each block repeats the same 16 float32 tensor entries shown for blk.0 above ...]
output_norm.bias, n_dims = 1, torch.float32 --> float32
output_norm.weight, n_dims = 1, torch.float32 --> float32
Model successfully exported to 'models/starcoder2-3b.gguf'
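
Before quantizing, the exported file can be sanity-checked by reading its metadata back with the gguf Python package (a minimal sketch, assuming gguf-py from this repository is installed):

import gguf  # gguf-py, from this repository

reader = gguf.GGUFReader("models/starcoder2-3b.gguf")

# Dump the KV metadata written during conversion
# (starcoder2.block_count, starcoder2.context_length, ...).
for name in reader.fields:
    print(name)

# The tensor count should match the quantization log below (483 tensors).
print(len(reader.tensors), "tensors")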
  2. Quantization
    Command:
./quantize models/starcoder2-3b.gguf models/starcoder2-3b-Q4_K_M.gguf Q4_K_M
Output
main: build = 2299 (d62ce1c6)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: quantizing 'models/starcoder2-3b.gguf' to 'models/starcoder2-3b-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 17 key-value pairs and 483 tensors from models/starcoder2-3b.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = starcoder2
llama_model_loader: - kv   1:                               general.name str              = starcoder2-3b
llama_model_loader: - kv   2:                     starcoder2.block_count u32              = 30
llama_model_loader: - kv   3:                  starcoder2.context_length u32              = 16384
llama_model_loader: - kv   4:                starcoder2.embedding_length u32              = 3072
llama_model_loader: - kv   5:             starcoder2.feed_forward_length u32              = 12288
llama_model_loader: - kv   6:            starcoder2.attention.head_count u32              = 24
llama_model_loader: - kv   7:         starcoder2.attention.head_count_kv u32              = 2
llama_model_loader: - kv   8:    starcoder2.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv   9:                          general.file_type u32              = 0
llama_model_loader: - kv  10:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  11:                      tokenizer.ggml.tokens arr[str,49152]   = ["<|endoftext|>", "<fim_prefix>", "<f...
llama_model_loader: - kv  12:                  tokenizer.ggml.token_type arr[i32,49152]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  13:                      tokenizer.ggml.merges arr[str,48872]   = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv  14:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  15:                tokenizer.ggml.eos_token_id u32              = 0
llama_model_loader: - kv  16:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - type  f32:  483 tensors
llama_model_quantize_internal ============ Strange model: n_attention_wv = 30, n_ffn_down = 60, hparams.n_layer = 30
llama_model_quantize_internal: meta size = 1745952 bytes
[   1/ 483]                    token_embd.weight - [ 3072, 49152,     1,     1], type =    f32, quantizing to q4_K .. size =   576.00 MiB ->    81.00 MiB
[   2/ 483]                 blk.0.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   3/ 483]               blk.0.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   4/ 483]                    blk.0.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[   5/ 483]                  blk.0.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[   6/ 483]                  blk.0.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   7/ 483]                blk.0.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[   8/ 483]                  blk.0.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   9/ 483]                blk.0.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  10/ 483]                    blk.0.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  11/ 483]                  blk.0.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[  12/ 483]               blk.0.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  13/ 483]             blk.0.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[  14/ 483]                    blk.0.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  15/ 483]                  blk.0.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[  16/ 483]                    blk.0.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  17/ 483]                  blk.0.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[... entries 18-209 omitted: blk.1, blk.10-blk.19 and blk.2 repeat the same 16-entry per-block pattern, with attn_v.weight and ffn_down.weight alternating between q4_K and q6_K across blocks ...]
[ 210/ 483]                blk.20.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 211/ 483]              blk.20.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 212/ 483]                   blk.20.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 213/ 483]                 blk.20.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 214/ 483]                 blk.20.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 215/ 483]               blk.20.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 216/ 483]                 blk.20.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 217/ 483]               blk.20.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 218/ 483]                   blk.20.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 219/ 483]                 blk.20.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 220/ 483]              blk.20.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 221/ 483]            blk.20.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 222/ 483]                   blk.20.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 223/ 483]                 blk.20.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 224/ 483]                   blk.20.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 225/ 483]                 blk.20.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 226/ 483]                blk.21.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 227/ 483]              blk.21.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 228/ 483]                   blk.21.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 229/ 483]                 blk.21.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 230/ 483]                 blk.21.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 231/ 483]               blk.21.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 232/ 483]                 blk.21.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 233/ 483]               blk.21.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 234/ 483]                   blk.21.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 235/ 483]                 blk.21.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 236/ 483]              blk.21.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 237/ 483]            blk.21.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 238/ 483]                   blk.21.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 239/ 483]                 blk.21.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 240/ 483]                   blk.21.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 241/ 483]                 blk.21.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 242/ 483]                blk.22.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 243/ 483]              blk.22.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 244/ 483]                   blk.22.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 245/ 483]                 blk.22.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 246/ 483]                 blk.22.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 247/ 483]               blk.22.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 248/ 483]                 blk.22.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 249/ 483]               blk.22.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 250/ 483]                   blk.22.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 251/ 483]                 blk.22.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 252/ 483]              blk.22.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 253/ 483]            blk.22.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 254/ 483]                   blk.22.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 255/ 483]                 blk.22.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 256/ 483]                   blk.22.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 257/ 483]                 blk.22.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 258/ 483]                blk.23.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 259/ 483]              blk.23.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 260/ 483]                   blk.23.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 261/ 483]                 blk.23.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 262/ 483]                 blk.23.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 263/ 483]               blk.23.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 264/ 483]                 blk.23.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 265/ 483]               blk.23.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 266/ 483]                   blk.23.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 267/ 483]                 blk.23.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 268/ 483]              blk.23.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 269/ 483]            blk.23.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 270/ 483]                   blk.23.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 271/ 483]                 blk.23.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 272/ 483]                   blk.23.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 273/ 483]                 blk.23.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 274/ 483]                blk.24.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 275/ 483]              blk.24.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 276/ 483]                   blk.24.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 277/ 483]                 blk.24.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 278/ 483]                 blk.24.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 279/ 483]               blk.24.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 280/ 483]                 blk.24.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 281/ 483]               blk.24.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 282/ 483]                   blk.24.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 283/ 483]                 blk.24.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 284/ 483]              blk.24.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 285/ 483]            blk.24.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 286/ 483]                   blk.24.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 287/ 483]                 blk.24.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 288/ 483]                   blk.24.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 289/ 483]                 blk.24.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 290/ 483]                blk.25.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 291/ 483]              blk.25.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 292/ 483]                   blk.25.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 293/ 483]                 blk.25.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 294/ 483]                 blk.25.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 295/ 483]               blk.25.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 296/ 483]                 blk.25.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 297/ 483]               blk.25.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 298/ 483]                   blk.25.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 299/ 483]                 blk.25.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 300/ 483]              blk.25.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 301/ 483]            blk.25.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 302/ 483]                   blk.25.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 303/ 483]                 blk.25.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 304/ 483]                   blk.25.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 305/ 483]                 blk.25.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 306/ 483]                blk.26.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 307/ 483]              blk.26.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 308/ 483]                   blk.26.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 309/ 483]                 blk.26.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 310/ 483]                 blk.26.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 311/ 483]               blk.26.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 312/ 483]                 blk.26.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 313/ 483]               blk.26.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 314/ 483]                   blk.26.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 315/ 483]                 blk.26.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 316/ 483]              blk.26.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 317/ 483]            blk.26.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 318/ 483]                   blk.26.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 319/ 483]                 blk.26.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 320/ 483]                   blk.26.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 321/ 483]                 blk.26.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 322/ 483]                blk.27.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 323/ 483]              blk.27.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 324/ 483]                   blk.27.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 325/ 483]                 blk.27.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 326/ 483]                 blk.27.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 327/ 483]               blk.27.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 328/ 483]                 blk.27.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 329/ 483]               blk.27.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 330/ 483]                   blk.27.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 331/ 483]                 blk.27.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 332/ 483]              blk.27.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 333/ 483]            blk.27.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 334/ 483]                   blk.27.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 335/ 483]                 blk.27.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 336/ 483]                   blk.27.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 337/ 483]                 blk.27.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 338/ 483]                blk.28.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 339/ 483]              blk.28.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 340/ 483]                   blk.28.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 341/ 483]                 blk.28.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 342/ 483]                 blk.28.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 343/ 483]               blk.28.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 344/ 483]                 blk.28.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 345/ 483]               blk.28.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 346/ 483]                   blk.28.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 347/ 483]                 blk.28.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 348/ 483]              blk.28.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 349/ 483]            blk.28.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 350/ 483]                   blk.28.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 351/ 483]                 blk.28.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 352/ 483]                   blk.28.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 353/ 483]                 blk.28.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 354/ 483]                blk.29.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 355/ 483]              blk.29.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 356/ 483]                   blk.29.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 357/ 483]                 blk.29.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 358/ 483]                 blk.29.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 359/ 483]               blk.29.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 360/ 483]                 blk.29.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 361/ 483]               blk.29.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 362/ 483]                   blk.29.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 363/ 483]                 blk.29.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 364/ 483]              blk.29.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 365/ 483]            blk.29.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 366/ 483]                   blk.29.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 367/ 483]                 blk.29.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 368/ 483]                   blk.29.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 369/ 483]                 blk.29.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 370/ 483]                 blk.3.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 371/ 483]               blk.3.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 372/ 483]                    blk.3.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 373/ 483]                  blk.3.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 374/ 483]                  blk.3.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 375/ 483]                blk.3.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 376/ 483]                  blk.3.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 377/ 483]                blk.3.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 378/ 483]                    blk.3.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 379/ 483]                  blk.3.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 380/ 483]               blk.3.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 381/ 483]             blk.3.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 382/ 483]                    blk.3.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 383/ 483]                  blk.3.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 384/ 483]                    blk.3.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 385/ 483]                  blk.3.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 386/ 483]                 blk.4.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 387/ 483]               blk.4.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 388/ 483]                    blk.4.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 389/ 483]                  blk.4.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 390/ 483]                  blk.4.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 391/ 483]                blk.4.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 392/ 483]                  blk.4.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 393/ 483]                blk.4.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 394/ 483]                    blk.4.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 395/ 483]                  blk.4.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 396/ 483]               blk.4.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 397/ 483]             blk.4.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 398/ 483]                    blk.4.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 399/ 483]                  blk.4.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 400/ 483]                    blk.4.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 401/ 483]                  blk.4.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 402/ 483]                 blk.5.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 403/ 483]               blk.5.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 404/ 483]                    blk.5.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 405/ 483]                  blk.5.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 406/ 483]                  blk.5.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 407/ 483]                blk.5.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 408/ 483]                  blk.5.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 409/ 483]                blk.5.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 410/ 483]                    blk.5.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 411/ 483]                  blk.5.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 412/ 483]               blk.5.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 413/ 483]             blk.5.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 414/ 483]                    blk.5.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 415/ 483]                  blk.5.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 416/ 483]                    blk.5.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 417/ 483]                  blk.5.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 418/ 483]                 blk.6.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 419/ 483]               blk.6.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 420/ 483]                    blk.6.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 421/ 483]                  blk.6.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 422/ 483]                  blk.6.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 423/ 483]                blk.6.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 424/ 483]                  blk.6.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 425/ 483]                blk.6.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 426/ 483]                    blk.6.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 427/ 483]                  blk.6.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 428/ 483]               blk.6.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 429/ 483]             blk.6.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 430/ 483]                    blk.6.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 431/ 483]                  blk.6.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 432/ 483]                    blk.6.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 433/ 483]                  blk.6.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 434/ 483]                 blk.7.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 435/ 483]               blk.7.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 436/ 483]                    blk.7.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 437/ 483]                  blk.7.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 438/ 483]                  blk.7.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 439/ 483]                blk.7.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q6_K .. size =   144.00 MiB ->    29.53 MiB
[ 440/ 483]                  blk.7.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 441/ 483]                blk.7.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 442/ 483]                    blk.7.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 443/ 483]                  blk.7.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 444/ 483]               blk.7.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 445/ 483]             blk.7.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 446/ 483]                    blk.7.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 447/ 483]                  blk.7.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 448/ 483]                    blk.7.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 449/ 483]                  blk.7.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 450/ 483]                 blk.8.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 451/ 483]               blk.8.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 452/ 483]                    blk.8.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 453/ 483]                  blk.8.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 454/ 483]                  blk.8.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 455/ 483]                blk.8.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 456/ 483]                  blk.8.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 457/ 483]                blk.8.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 458/ 483]                    blk.8.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 459/ 483]                  blk.8.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 460/ 483]               blk.8.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 461/ 483]             blk.8.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 462/ 483]                    blk.8.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 463/ 483]                  blk.8.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 464/ 483]                    blk.8.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 465/ 483]                  blk.8.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 466/ 483]                 blk.9.attn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 467/ 483]               blk.9.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 468/ 483]                    blk.9.ffn_up.bias - [12288,     1,     1,     1], type =    f32, size =    0.047 MB
[ 469/ 483]                  blk.9.ffn_up.weight - [ 3072, 12288,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 470/ 483]                  blk.9.ffn_down.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 471/ 483]                blk.9.ffn_down.weight - [12288,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =   144.00 MiB ->    20.25 MiB
[ 472/ 483]                  blk.9.ffn_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 473/ 483]                blk.9.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 474/ 483]                    blk.9.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 475/ 483]                  blk.9.attn_k.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q4_K .. size =     3.00 MiB ->     0.42 MiB
[ 476/ 483]               blk.9.attn_output.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 477/ 483]             blk.9.attn_output.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 478/ 483]                    blk.9.attn_q.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 479/ 483]                  blk.9.attn_q.weight - [ 3072,  3072,     1,     1], type =    f32, quantizing to q4_K .. size =    36.00 MiB ->     5.06 MiB
[ 480/ 483]                    blk.9.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 481/ 483]                  blk.9.attn_v.weight - [ 3072,   256,     1,     1], type =    f32, quantizing to q6_K .. size =     3.00 MiB ->     0.62 MiB
[ 482/ 483]                     output_norm.bias - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 483/ 483]                   output_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
llama_model_quantize_internal: model size  = 11559.95 MB
llama_model_quantize_internal: quant size  =  1761.66 MB

main: quantize time =  6205.08 ms
main:    total time =  6205.08 ms
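
For reference, these sizes work out to roughly 4.9 bits per weight, which is in the expected range for a Q4_K_M mix (mostly q4_K tensors with some q6_K). A back-of-the-envelope check, using values copied from the logs (and treating the log's MB as MiB):

# Bits per weight implied by the quantize summary above.
params = 3.03e9              # "model params = 3,03 B" from the loader output
quant_mib = 1761.66          # "quant size" from the quantize summary
bpw = quant_mib * 1024**2 * 8 / params
print(f"{bpw:.2f} bits per weight")  # ~4.88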
  3. Test on a sample:
    Command:
./main -m models/starcoder2-3b.gguf -p "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n" -n 60 -e

Output:

Log start
main: build = 2299 (d62ce1c6)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: seed  = 1709286693
llama_model_loader: loaded meta data with 17 key-value pairs and 483 tensors from models/starcoder2-3b.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = starcoder2
llama_model_loader: - kv   1:                               general.name str              = starcoder2-3b
llama_model_loader: - kv   2:                     starcoder2.block_count u32              = 30
llama_model_loader: - kv   3:                  starcoder2.context_length u32              = 16384
llama_model_loader: - kv   4:                starcoder2.embedding_length u32              = 3072
llama_model_loader: - kv   5:             starcoder2.feed_forward_length u32              = 12288
llama_model_loader: - kv   6:            starcoder2.attention.head_count u32              = 24
llama_model_loader: - kv   7:         starcoder2.attention.head_count_kv u32              = 2
llama_model_loader: - kv   8:    starcoder2.attention.layer_norm_epsilon f32              = 0,000010
llama_model_loader: - kv   9:                          general.file_type u32              = 0
llama_model_loader: - kv  10:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  11:                      tokenizer.ggml.tokens arr[str,49152]   = ["<|endoftext|>", "<fim_prefix>", "<f...
llama_model_loader: - kv  12:                  tokenizer.ggml.token_type arr[i32,49152]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  13:                      tokenizer.ggml.merges arr[str,48872]   = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv  14:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  15:                tokenizer.ggml.eos_token_id u32              = 0
llama_model_loader: - kv  16:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - type  f32:  483 tensors
llm_load_vocab: special tokens definition check successful ( 38/49152 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = starcoder2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 49152
llm_load_print_meta: n_merges         = 48872
llm_load_print_meta: n_ctx_train      = 16384
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_head           = 24
llm_load_print_meta: n_head_kv        = 2
llm_load_print_meta: n_layer          = 30
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 12
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 1,0e-05
llm_load_print_meta: f_norm_rms_eps   = 0,0e+00
llm_load_print_meta: f_clamp_kqv      = 0,0e+00
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
llm_load_print_meta: n_ff             = 12288
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000,0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 3B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 3,03 B
llm_load_print_meta: model size       = 11,29 GiB (32,00 BPW) 
llm_load_print_meta: general.name     = starcoder2-3b
llm_load_print_meta: BOS token        = 0 '<|endoftext|>'
llm_load_print_meta: EOS token        = 0 '<|endoftext|>'
llm_load_print_meta: UNK token        = 0 '<|endoftext|>'
llm_load_print_meta: LF token         = 164 'Ä'
llm_load_tensors: ggml ctx size =    0,18 MiB
llm_load_tensors:        CPU buffer size = 11559,95 MiB
..............................................................................
.
.
.
.
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000,0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =    15,00 MiB
llama_new_context_with_model: KV self size  =   15,00 MiB, K (f16):    7,50 MiB, V (f16):    7,50 MiB
llama_new_context_with_model:        CPU input buffer size   =     8,01 MiB
llama_new_context_with_model:        CPU compute buffer size =   108,00 MiB
llama_new_context_with_model: graph splits (measure): 1

system_info: n_threads = 64 / 128 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | 
sampling: 
	repeat_last_n = 64, repeat_penalty = 1,100, frequency_penalty = 0,000, presence_penalty = 0,000
	top_k = 40, tfs_z = 1,000, top_p = 0,950, min_p = 0,050, typical_p = 1,000, temp = 0,800
	mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 512, n_predict = 60, n_keep = 0


#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):

#include <vector>
# #
def
deftwo(a):


defsum(int arr,




#include<stdio(int main()

#include<bits/
#include<stdio.h
#define
intmain()
llama_print_timings:        load time =    1000,83 ms
llama_print_timings:      sample time =      32,65 ms /    60 runs   (    0,54 ms per token,  1837,56 tokens per second)
llama_print_timings: prompt eval time =     363,40 ms /    26 tokens (   13,98 ms per token,    71,55 tokens per second)
llama_print_timings:        eval time =    7341,69 ms /    59 runs   (  124,44 ms per token,     8,04 tokens per second)
llama_print_timings:       total time =    7769,73 ms /    85 tokens
Log end
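
As an aside, the attention shapes in the logs are internally consistent: with n_head = 24, n_head_kv = 2 and n_embd = 3072, each head is 128 wide and the K/V projections are 3072 x 256, matching the attn_k/attn_v tensors in the quantize dump. A quick derivation:

# Reproduce the GQA dimensions printed by llm_load_print_meta.
n_embd, n_head, n_head_kv = 3072, 24, 2
head_dim = n_embd // n_head             # 128 -> n_embd_head_k / n_embd_head_v
n_embd_kv_gqa = n_head_kv * head_dim    # 256 -> n_embd_k_gqa / n_embd_v_gqa
n_gqa = n_head // n_head_kv             # 12  -> query heads per KV head

This points away from the tensor layout and toward how the weights are used, consistent with the rope-type fix discussed below.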

By contrast, loading the model with transformers gives correct outputs:
[Screenshot: correct generation from the transformers model, 2024-03-01]
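
Since the screenshot does not survive in text form, a minimal transformers check along these lines (the bigcode/starcoder2-3b model id and the generation settings here are assumptions, not taken from the screenshot) would be:

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b")

# Same prompt as the ./main run above (typo preserved on purpose).
prompt = "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60)
print(tok.decode(out[0]))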

@pacman100 pacman100 marked this pull request as ready for review March 1, 2024 09:57
Owner

@ggerganov ggerganov left a comment


After fixing the rope type, I think this is ready to merge, but would recommend some tests before doing so to make sure that the results make sense now

@pacman100
Contributor Author

but would recommend some tests before doing so to make sure that the results make sense now

I did some generations locally using both the normal and the FIM format, and the results are in line with expectations:

Samples from the 3B model. Commands are quoted, and outputs are shown in diff format to highlight the model's generations:

./main -m models/starcoder2-3b-Q4_K_M.gguf -p "def print_hello_world():" -n 64 -e --temp 0.2

Output

def print_hello_world():
+    print("Hello World!")

+print_hello_world()

+/python/python_basics.py
+# Python Basics

+# Variables
+x = 5
+y = "John"
+print(type(x)) # Prints "<class 'int'>"
+print(type(y

./main -m models/starcoder2-3b-Q4_K_M.gguf -p "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n" -n 128 -e --temp 0.2

#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):
+	hash = {} #dictionary to store the values of the elements in the array
+	for i in range(len(arr)):
+		if arr[i] not in hash:
+			hash[target_sum - arr[i]] = 1
+		else:
+			return True
+	return False

+#python code for efficient implemetation of three_sum
+def three_sum(arr, target_sum):
+	hash = {} #dictionary to store the values of the elements in the array
+	for i in range(len(arr)):
+		if arr[i] not
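
For what it's worth, the completed two_sum above is functionally correct (its inline comment is slightly off: the dict stores needed complements, not the elements themselves). A quick standalone check, with asserts added by hand:

# Model-generated two_sum from the diff above, with a corrected comment.
def two_sum(arr, target_sum):
    hash = {}  # maps needed complements (target_sum - element) to 1
    for i in range(len(arr)):
        if arr[i] not in hash:
            hash[target_sum - arr[i]] = 1
        else:
            return True
    return False

assert two_sum([2, 7, 11, 15], 9)   # 2 + 7 == 9
assert not two_sum([1, 2, 3], 7)    # no pair sums to 7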

./main -m models/starcoder2-3b-Q4_K_M.gguf -p '<fim_prefix>\ndef _get_model_architecture(self) -> gguf.MODEL_ARCH:\n\tarch = self.hparams["architectures"][0]\n\tif arch == "GPTNeoXForCausalLM":\n\t\treturn gguf.MODEL_ARCH.GPTNEOX\n\tif arch == "BloomForCausalLM":\n\t\treturn gguf.MODEL_ARCH.BLOOM\n\tif arch == "MPTForCausalLM":\n\t\treturn gguf.MODEL_ARCH.MPT\n\tif arch in ("BaichuanForCausalLM", "BaiChuanForCausalLM"):\n\t\treturn gguf.MODEL_ARCH.BAICHUAN\n\tif arch in ("FalconForCausalLM", "RWForCausalLM"):\n\t\treturn gguf.MODEL_ARCH.FALCON\n\tif arch == "GPTBigCodeForCausalLM":\n\t\treturn gguf.MODEL_ARCH.STARCODER\n\tif arch == "GPTRefactForCausalLM":\n\t\treturn gguf.MODEL_ARCH.REFACT\n\tif arch == "PersimmonForCausalLM":\n\t\treturn gguf.MODEL_ARCH.PERSIMMON\n\tif arch in ("StableLmForCausalLM", "StableLMEpochForCausalLM", "LlavaStableLMEpochForCausalLM"):\n\t\treturn gguf.MODEL_ARCH.STABLELM\n\tif arch == "QWenLMHeadModel":\n\t\treturn gguf.MODEL_ARCH.QWEN\n\tif arch == "Qwen2ForCausalLM":\n\t\treturn gguf.MODEL_ARCH.QWEN2\n\tif arch == "MixtralForCausalLM":\n\t\treturn gguf.MODEL_ARCH.LLAMA\n\tif arch == "GPT2LMHeadModel":\n\t\treturn gguf.MODEL_ARCH.GPT2\n\tif arch == "PhiForCausalLM":\n\t\treturn gguf.MODEL_ARCH.PHI2\n\tif arch == "PlamoForCausalLM":\n\t\treturn gguf.MODEL_ARCH.PLAMO\n\tif arch == "CodeShellForCausalLM":\n\t\treturn gguf.MODEL_ARCH.CODESHELL\n\tif arch == "OrionForCausalLM":\n\t\treturn gguf.MODEL_ARCH.ORION\n\tif arch == "InternLM2ForCausalLM":\n\t\treturn gguf.MODEL_ARCH.INTERNLM2\n\tif arch == "MiniCPMForCausalLM":\n\t\treturn gguf.MODEL_ARCH.MINICPM\n\tif arch == "BertModel":\n\t\treturn gguf.MODEL_ARCH.BERT\n\tif arch == "NomicBertModel":\n\t\treturn gguf.MODEL_ARCH.NOMIC_BERT\n\tif arch == "GemmaForCausalLM":\n\t\treturn gguf.MODEL_ARCH.GEMMA\n\tif arch == "Starcoder2ForCausalLM":\n<fim_suffix>\n\t\traise NotImplementedError(f"Architecture "{arch}" not supported!")\n<fim_middle>' -c 2048 -n 16 -e --temp 0.2

Output. Note that <fim_prefix>, <fim_suffix> and <fim_middle> were stripped by ./main; the highlighted generation comes after <fim_middle> and is therefore correct:

def _get_model_architecture(self) -> gguf.MODEL_ARCH:
	arch = self.hparams["architectures"][0]
	if arch == "GPTNeoXForCausalLM":
		return gguf.MODEL_ARCH.GPTNEOX
	if arch == "BloomForCausalLM":
		return gguf.MODEL_ARCH.BLOOM
	if arch == "MPTForCausalLM":
		return gguf.MODEL_ARCH.MPT
	if arch in ("BaichuanForCausalLM", "BaiChuanForCausalLM"):
		return gguf.MODEL_ARCH.BAICHUAN
	if arch in ("FalconForCausalLM", "RWForCausalLM"):
		return gguf.MODEL_ARCH.FALCON
	if arch == "GPTBigCodeForCausalLM":
		return gguf.MODEL_ARCH.STARCODER
	if arch == "GPTRefactForCausalLM":
		return gguf.MODEL_ARCH.REFACT
	if arch == "PersimmonForCausalLM":
		return gguf.MODEL_ARCH.PERSIMMON
	if arch in ("StableLmForCausalLM", "StableLMEpochForCausalLM", "LlavaStableLMEpochForCausalLM"):
		return gguf.MODEL_ARCH.STABLELM
	if arch == "QWenLMHeadModel":
		return gguf.MODEL_ARCH.QWEN
	if arch == "Qwen2ForCausalLM":
		return gguf.MODEL_ARCH.QWEN2
	if arch == "MixtralForCausalLM":
		return gguf.MODEL_ARCH.LLAMA
	if arch == "GPT2LMHeadModel":
		return gguf.MODEL_ARCH.GPT2
	if arch == "PhiForCausalLM":
		return gguf.MODEL_ARCH.PHI2
	if arch == "PlamoForCausalLM":
		return gguf.MODEL_ARCH.PLAMO
	if arch == "CodeShellForCausalLM":
		return gguf.MODEL_ARCH.CODESHELL
	if arch == "OrionForCausalLM":
		return gguf.MODEL_ARCH.ORION
	if arch == "InternLM2ForCausalLM":
		return gguf.MODEL_ARCH.INTERNLM2
	if arch == "MiniCPMForCausalLM":
		return gguf.MODEL_ARCH.MINICPM
	if arch == "BertModel":
		return gguf.MODEL_ARCH.BERT
	if arch == "NomicBertModel":
		return gguf.MODEL_ARCH.NOMIC_BERT
	if arch == "GemmaForCausalLM":
		return gguf.MODEL_ARCH.GEMMA
	if arch == "Starcoder2ForCausalLM":

		raise NotImplementedError(f"Architecture "{arch}" not supported!")
+		return gguf.MODEL_ARCH.STARCODER2
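
For readers unfamiliar with the format: the prompt above follows the StarCoder prefix-suffix-middle (PSM) convention, where the model is asked to generate the text that belongs between the prefix and the suffix. A sketch of how such a prompt is assembled (the helper name is hypothetical):

# Build a StarCoder-style fill-in-the-middle prompt in PSM order.
def make_fim_prompt(prefix: str, suffix: str) -> str:
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Everything the model emits after <fim_middle> is the proposed middle;
# in the run above that is: return gguf.MODEL_ARCH.STARCODER2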

@pacman100
Contributor Author

Finally, good to merge this PR @ggerganov. Thank you!

@ggerganov ggerganov merged commit c29af7e into ggerganov:master Mar 1, 2024
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
* Add support for starcoder2

* handle rope type

* skip rope freq and rotary embeddings from being serialized

* resolve comments

* Update llama.cpp

* remove redundant changes

* handle `rope-theta`

* llama : change starcoder2 rope type

* address comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024