model: support arch DbrxForCausalLM #6515

Merged: 81 commits from hp/model/support-dbrx into master, Apr 13, 2024

Changes from 17 commits (of 81)

Commits
1d8de31
model: dbrx convert to gguf
phymbert Apr 6, 2024
ed582c1
llama: support dbrx
phymbert Apr 6, 2024
3e3d2d1
gguf-py: remove wrong clip -> clamp
phymbert Apr 6, 2024
3937100
model: dbrx, trust remote code
phymbert Apr 6, 2024
c0beb3c
llama: add label for model 132B
phymbert Apr 6, 2024
0921033
model: dbrx fix python linter in convert-hf-to-gguf.py
phymbert Apr 6, 2024
e4f8ee4
llama: support dbrx fix norm type
phymbert Apr 6, 2024
a7f9a3e
dbrx: minor
phymbert Apr 6, 2024
e3c1e81
convert: dbrx: fix mixed up and down expert tensors
phymbert Apr 6, 2024
0a35f58
convert: dbrx: fix mixed up and down expert tensors
phymbert Apr 6, 2024
c8e6f90
doc: dbrx: add the model as supported
phymbert Apr 6, 2024
916b918
convert: dbrx: fix remove wrong ATTN_OUT_NORM tensor, add output laye…
phymbert Apr 6, 2024
03da419
llama: dbrx: remove wrong attn output layer in model arch
phymbert Apr 6, 2024
76f266b
scripts: get-wikitext-2 add unzip
phymbert Apr 6, 2024
9c7dedb
llama: dbrx: no attention output layer
phymbert Apr 6, 2024
fe80898
model: dbrx: fix missing embedding tensor, mix with output layer
phymbert Apr 6, 2024
4f12a58
llama: dbrx: remove not existing condition on empty output layer
phymbert Apr 6, 2024
6985629
Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx
phymbert Apr 6, 2024
7e7cd53
llama: dbrx: remove unnecessary optional tensor on FFN_GATE_EXPS
phymbert Apr 6, 2024
52c4033
llama: increase maximum experts allowed
phymbert Apr 7, 2024
06a59ab
model: dbrx: convert add n_ff
phymbert Apr 7, 2024
305ac3b
llama: dbrx: quantize fix n_attention_wv tensor name
phymbert Apr 7, 2024
b6522a9
model: dbrx: convert fix tokenizer
phymbert Apr 7, 2024
dccb012
llama: dbrx: quantize fix n_attention_wv tensor name
phymbert Apr 7, 2024
61be4b9
model: convert-hf-to-gguf.py add _set_vocab_tiktoken gpt2 backed on l…
phymbert Apr 7, 2024
1fb6d95
model: convert-hf-to-gguf.py fix classname conflict with qwen2
phymbert Apr 7, 2024
200ce21
model: dbrx: convert-hf-to-gguf.py fix fix ftype missing, fix tensor …
phymbert Apr 7, 2024
9e17dad
model: dbrx: convert-hf-to-gguf.py add chat template
phymbert Apr 7, 2024
d7546fd
llama: quantize: remove wrong look for tensor qkv name as it was badl…
phymbert Apr 7, 2024
3a9dc2e
model: dbrx: convert-hf-to-gguf.py fix 'token_embd.weight' has wrong …
phymbert Apr 7, 2024
8154617
model: dbrx: convert-hf-to-gguf.py support python 3.8
phymbert Apr 7, 2024
2449ef4
llama: dbrx: no weight suffix in ffn_gate_exps, ffn_up_exps and ffn_d…
phymbert Apr 7, 2024
1bd9427
llama: quantize: remove wrong look for tensor qkv name as it was badl…
phymbert Apr 7, 2024
e9987c6
llama: dbrx: fix tensor qkv number of elements
phymbert Apr 7, 2024
d151d8f
model: dbrx: convert reshape expert tensors to 3D
phymbert Apr 7, 2024
f062b83
model: dbrx: convert experts to f16
phymbert Apr 7, 2024
dbfd591
model: dbrx: fix tensor names mapping broken
phymbert Apr 7, 2024
7dd84b0
model: dbrx: fix expert reshape
phymbert Apr 7, 2024
c9bddbf
model: dbrx: fix expert reshape
phymbert Apr 7, 2024
e2c9199
model: dbrx: fix again sic expert reshape
phymbert Apr 7, 2024
50b4373
model: dbrx: weird fix expert reshape
phymbert Apr 7, 2024
0ab1bae
llama: dbrx: output norm dim
phymbert Apr 7, 2024
830e46d
llama: dbrx: fix last normalization
phymbert Apr 7, 2024
2897aa6
llama: dbrx: revert
phymbert Apr 7, 2024
993f836
llama: dbrx: move norm2 after attention, fix build kv
phymbert Apr 7, 2024
b01b062
llama: dbrx: fix build kv att out
phymbert Apr 7, 2024
74e6d87
llama: dbrx: fix build kv att out tensor name
phymbert Apr 7, 2024
f8f97e7
llama: dbrx: hardcode nn.LayerNorm epsilon
phymbert Apr 7, 2024
71f9e47
llama: dbrx: Try another rope type
phymbert Apr 7, 2024
52c6276
llama: dbrx: fix k scale
phymbert Apr 8, 2024
8e22688
llama: dbrx: move norm epsilon to convert. Fix missing normalization.
phymbert Apr 8, 2024
35dce3e
llama: dbrx: rename tensor to actual meaning. Fix normalization in gr…
phymbert Apr 8, 2024
506cc2e
llama: dbrx: convert remove previous reverse
phymbert Apr 8, 2024
eb0847e
llama: dbrx: load norm eps in hparams
phymbert Apr 8, 2024
81f308a
llama: dbrx: fix experts tensor layout
phymbert Apr 8, 2024
21fb24a
model: dbrx: convert-hf-to-gguf.py fix experts tensors shapes
phymbert Apr 8, 2024
f20c04f
llama: factorize moe graph implementation between grok, mixtral and dbrx
phymbert Apr 8, 2024
48909ed
model: dbrx convert permute experts directly torch, log shape
phymbert Apr 8, 2024
18a84fe
llama: dbrx: fix experts 3D tensor layout (again)
phymbert Apr 8, 2024
9968952
llama: dbrx: fix experts 3D tensor layout (again)
phymbert Apr 8, 2024
e66f1e3
llama: dbrx: document changes, permute only FFN_DOWN_EXPS. Add a chec…
phymbert Apr 8, 2024
f30a73b
llama: dbrx: rename layer_out_norm to attn_out_norm
phymbert Apr 8, 2024
ea8b58c
llama: dbrx: first add the residuals and then do the norm
phymbert Apr 8, 2024
55943a2
model: dbrx: convert fix mixed ffn_gate_exps and ffn_down_exps
phymbert Apr 8, 2024
c7b9a2e
llama: dbrx: fix ggml context of the attention outputs weight
phymbert Apr 8, 2024
ac82aa0
gguf-py: revert spaces
phymbert Apr 8, 2024
ac75fbd
gguf-py: dbrx: reverse again the MOE tensors mapping:
phymbert Apr 9, 2024
e5631cf
Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx
phymbert Apr 9, 2024
6f813dc
Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx
phymbert Apr 10, 2024
74529e5
llama: dbrx: use the MOE naming convention for model type
phymbert Apr 10, 2024
06527c6
Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx
phymbert Apr 11, 2024
fc89fee
model: convert-hf-to-gguf.py remove tiktoken
phymbert Apr 11, 2024
bdc4efe
Is silu activation function applied to MODEL_TENSOR.FFN_GATE_EXP here…
phymbert Apr 12, 2024
542585f
Is silu activation function applied to MODEL_TENSOR.FFN_GATE_EXP here…
phymbert Apr 12, 2024
ecbfb1b
Wrong input was being fed to moe layer. This needs to be corrected
phymbert Apr 12, 2024
647a11b
eval-callback: also print last n elements of each dimension
phymbert Apr 12, 2024
03bdc36
minor spaces
phymbert Apr 12, 2024
8e6758f
convert: update comment of MOE tensors mapping
phymbert Apr 12, 2024
f1256dc
llama: rename build_moe to build_moe_ffn and fix grok is using gelu i…
phymbert Apr 12, 2024
e517585
convert-hf-to-gguf.py: fix python linter
phymbert Apr 12, 2024
9f77484
minor: fix indent in llama_build_graph
phymbert Apr 13, 2024
1 change: 1 addition & 0 deletions README.md
@@ -93,6 +93,7 @@ Typically finetunes of the base models below are supported as well.
- [x] LLaMA 2 🦙🦙
- [X] [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [x] [Mixtral MoE](https://huggingface.co/models?search=mistral-ai/Mixtral)
- [x] [DBRX](https://huggingface.co/databricks/dbrx-instruct)
- [X] Falcon
- [X] [Chinese LLaMA / Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) and [Chinese LLaMA-2 / Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2)
- [X] [Vigogne (French)](https://github.com/bofenghuang/vigogne)
55 changes: 55 additions & 0 deletions convert-hf-to-gguf.py
@@ -1424,6 +1424,61 @@ def write_tensors(self):
self.gguf_writer.add_tensor(new_name, data)


@Model.register("DbrxForCausalLM")
class DbrxModel(Model):
model_arch = gguf.MODEL_ARCH.DBRX

def set_gguf_parameters(self):
ffn_config = self.hparams["ffn_config"]
attn_config = self.hparams["attn_config"]
self.gguf_writer.add_name(self.hparams["model_type"])
self.gguf_writer.add_context_length(self.hparams["max_seq_len"])
self.gguf_writer.add_embedding_length(self.hparams["d_model"])
self.gguf_writer.add_block_count(self.hparams["n_layers"])
self.gguf_writer.add_head_count(self.hparams["n_heads"])
self.gguf_writer.add_head_count_kv(attn_config["kv_n_heads"])
self.gguf_writer.add_rope_freq_base(attn_config["rope_theta"])
self.gguf_writer.add_clamp_kqv(attn_config["clip_qkv"])
self.gguf_writer.add_file_type(self.ftype)

self.gguf_writer.add_expert_count(ffn_config["moe_num_experts"])
self.gguf_writer.add_expert_used_count(ffn_config["moe_top_k"])

def _set_vocab_gpt2(self):
dir_model = self.dir_model
tokens: list[str] = []
toktypes: list[int] = []

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(dir_model, trust_remote_code=True)
vocab_size = tokenizer.vocab_size

reverse_vocab = {id_: encoded_tok for encoded_tok, id_ in tokenizer.get_vocab().items()}
added_vocab = tokenizer.get_added_vocab()

# REVIEW: Not tested yet, need to deep dive this tiktoken
for i in range(vocab_size):
if i not in reverse_vocab:
tokens.append(f"[PAD{i}]")
toktypes.append(gguf.TokenType.USER_DEFINED)
elif reverse_vocab[i] in added_vocab:
tokens.append(reverse_vocab[i])
if tokenizer.added_tokens_decoder[i].special:
toktypes.append(gguf.TokenType.CONTROL)
else:
toktypes.append(gguf.TokenType.USER_DEFINED)
else:
tokens.append(reverse_vocab[i])
toktypes.append(gguf.TokenType.NORMAL)

self.gguf_writer.add_tokenizer_model("gpt2")
self.gguf_writer.add_token_list(tokens)
self.gguf_writer.add_token_types(toktypes)

special_vocab = gguf.SpecialVocab(dir_model, load_merges=True)
special_vocab.add_to_gguf(self.gguf_writer)


@Model.register("MiniCPMForCausalLM")
class MiniCPMModel(Model):
model_arch = gguf.MODEL_ARCH.MINICPM
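For reference, the hparams-to-GGUF mapping performed by the new DbrxModel.set_gguf_parameters above can be summarized as a small standalone sketch that only reads the checkpoint's config.json. The field names (ffn_config, attn_config, d_model, and so on) come from the DBRX config as used in the diff; the dictionary keys below are informal labels rather than the exact GGUF key strings, and the example path is hypothetical:

    import json
    from pathlib import Path

    def dbrx_gguf_metadata(model_dir: str) -> dict:
        """Collect the DBRX hyperparameters that the converter above writes to GGUF."""
        hparams = json.loads((Path(model_dir) / "config.json").read_text())
        ffn_config = hparams["ffn_config"]
        attn_config = hparams["attn_config"]
        return {
            "name": hparams["model_type"],               # "dbrx"
            "context_length": hparams["max_seq_len"],
            "embedding_length": hparams["d_model"],
            "block_count": hparams["n_layers"],
            "head_count": hparams["n_heads"],
            "head_count_kv": attn_config["kv_n_heads"],
            "rope_freq_base": attn_config["rope_theta"],
            "clamp_kqv": attn_config["clip_qkv"],
            "expert_count": ffn_config["moe_num_experts"],
            "expert_used_count": ffn_config["moe_top_k"],
        }

    # example (hypothetical local checkpoint directory):
    # print(dbrx_gguf_metadata("./dbrx-instruct"))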
15 changes: 15 additions & 0 deletions gguf-py/gguf/constants.py
@@ -125,6 +125,7 @@ class MODEL_ARCH(IntEnum):
MAMBA = auto()
XVERSE = auto()
COMMAND_R = auto()
DBRX = auto()


class MODEL_TENSOR(IntEnum):
@@ -194,6 +195,7 @@ class MODEL_TENSOR(IntEnum):
MODEL_ARCH.MAMBA: "mamba",
MODEL_ARCH.XVERSE: "xverse",
MODEL_ARCH.COMMAND_R: "command-r",
MODEL_ARCH.DBRX: "dbrx",
}

TENSOR_NAMES: dict[MODEL_TENSOR, str] = {
@@ -639,6 +641,19 @@ class MODEL_TENSOR(IntEnum):
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.DBRX: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_NORM_2,
MODEL_TENSOR.FFN_GATE_INP,
MODEL_TENSOR.FFN_GATE_EXP,
MODEL_TENSOR.FFN_DOWN_EXP,
MODEL_TENSOR.FFN_UP_EXP,
MODEL_TENSOR.LAYER_OUT_NORM,
],
# TODO
}

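The new MODEL_ARCH.DBRX entry above is essentially a whitelist: each listed MODEL_TENSOR, combined with the generic TENSOR_NAMES templates, determines which GGUF tensor names a DBRX model file may contain. A minimal sketch of that expansion, assuming the gguf-py package from this repository is importable and that the arch-to-tensor table being extended here is exposed as gguf.MODEL_TENSORS (a block count of 2 is used only for brevity):

    # sketch: enumerate the GGUF tensor names allowed for the DBRX arch
    import gguf

    def dbrx_tensor_names(n_blocks: int):
        names = []
        for tensor in gguf.MODEL_TENSORS[gguf.MODEL_ARCH.DBRX]:
            fmt = gguf.TENSOR_NAMES[tensor]  # e.g. a "blk.{bid}..." template
            if "{bid}" in fmt:
                # per-block tensors are repeated for every transformer block
                names.extend(fmt.format(bid=bid) for bid in range(n_blocks))
            else:
                # non-repeating tensors such as the token embedding and output head
                names.append(fmt)
        return names

    print(dbrx_tensor_names(2))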
63 changes: 37 additions & 26 deletions gguf-py/gguf/tensor_mapping.py
@@ -24,6 +24,7 @@ class TensorNameMap:
"backbone.embedding", # mamba
"backbone.embeddings", # mamba-hf
"transformer.in_out_embed", # Grok
"transformer.wte.weight", # dbrx
),

# Token type embeddings
@@ -52,6 +53,7 @@ class TensorNameMap:
"output", # llama-pth bloom internlm2
"word_embeddings_for_head", # persimmon
"lm_head.linear", # phi2
"lm_head.weight", # dbrx
),

# Output norm
@@ -68,6 +70,7 @@ class TensorNameMap:
"model.norm_f", # mamba-qbert
"backbone.norm_f", # mamba
"transformer.rms_norm", # Grok
"transformer.norm_f.weight", # dbrx
),

# Rope frequencies
@@ -79,35 +82,38 @@ class TensorNameMap:
block_mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
# Attention norm
MODEL_TENSOR.ATTN_NORM: (
"gpt_neox.layers.{bid}.input_layernorm", # gptneox
"transformer.h.{bid}.ln_1", # gpt2 gpt-j refact qwen
"transformer.blocks.{bid}.norm_1", # mpt
"transformer.h.{bid}.input_layernorm", # falcon7b
"h.{bid}.input_layernorm", # bloom
"transformer.h.{bid}.ln_mlp", # falcon40b
"model.layers.{bid}.input_layernorm", # llama-hf
"layers.{bid}.attention_norm", # llama-pth
"language_model.encoder.layers.{bid}.input_layernorm", # persimmon
"model.layers.{bid}.ln1", # yi
"h.{bid}.ln_1", # gpt2
"transformer.h.{bid}.ln", # phi2
"model.layers.layers.{bid}.norm", # plamo
"model.layers.{bid}.attention_norm", # internlm2
"model.layers.{bid}.norm", # mamba-qbert
"backbone.layers.{bid}.norm", # mamba
"transformer.decoder_layer.{bid}.rms_norm", # Grok
"gpt_neox.layers.{bid}.input_layernorm", # gptneox
"transformer.h.{bid}.ln_1", # gpt2 gpt-j refact qwen
"transformer.blocks.{bid}.norm_1", # mpt
"transformer.h.{bid}.input_layernorm", # falcon7b
"h.{bid}.input_layernorm", # bloom
"transformer.h.{bid}.ln_mlp", # falcon40b
"model.layers.{bid}.input_layernorm", # llama-hf
"layers.{bid}.attention_norm", # llama-pth
"language_model.encoder.layers.{bid}.input_layernorm", # persimmon
"model.layers.{bid}.ln1", # yi
"h.{bid}.ln_1", # gpt2
"transformer.h.{bid}.ln", # phi2
"model.layers.layers.{bid}.norm", # plamo
"model.layers.{bid}.attention_norm", # internlm2
"model.layers.{bid}.norm", # mamba-qbert
"backbone.layers.{bid}.norm", # mamba
"transformer.decoder_layer.{bid}.rms_norm", # Grok
"transformer.blocks.{bid}.norm_attn_norm.norm_1.weight", # dbrx
),

# Attention norm 2
MODEL_TENSOR.ATTN_NORM_2: (
"transformer.h.{bid}.ln_attn", # falcon40b
"transformer.h.{bid}.ln_attn", # falcon40b
"transformer.blocks.{bid}.norm_attn_norm.norm_2.weight", # dbrx
),

# Attention query-key-value
MODEL_TENSOR.ATTN_QKV: (
"gpt_neox.layers.{bid}.attention.query_key_value", # gptneox
"transformer.h.{bid}.attn.c_attn", # gpt2 qwen
"transformer.blocks.{bid}.attn.Wqkv", # mpt
"transformer.blocks.{bid}.norm_attn_norm.attn.Wqkv.weight", # dbrx
"transformer.h.{bid}.self_attention.query_key_value", # falcon
"h.{bid}.self_attention.query_key_value", # bloom
"language_model.encoder.layers.{bid}.self_attention.query_key_value", # persimmon
@@ -202,9 +208,10 @@ class TensorNameMap:
),

MODEL_TENSOR.FFN_GATE_INP: (
"layers.{bid}.feed_forward.gate", # mixtral
"model.layers.{bid}.block_sparse_moe.gate", # mixtral
"transformer.decoder_layer.{bid}.router" # Grok
"layers.{bid}.feed_forward.gate", # mixtral
"model.layers.{bid}.block_sparse_moe.gate", # mixtral
"transformer.decoder_layer.{bid}.router", # Grok
"transformer.blocks.{bid}.ffn.router.layer.weight", # dbrx
),

# Feed-forward up
@@ -233,6 +240,7 @@ class TensorNameMap:
MODEL_TENSOR.FFN_UP_EXP: (
"layers.{bid}.feed_forward.experts.w3", # mixtral (merged)
"transformer.decoder_layer.{bid}.moe.linear_v", # Grok (merged)
"transformer.blocks.{bid}.ffn.experts.mlp.w1", # dbrx
),

# AWQ-activation gate
@@ -251,8 +259,9 @@ class TensorNameMap:
),

MODEL_TENSOR.FFN_GATE_EXP: (
"layers.{bid}.feed_forward.experts.w1", # mixtral (merged)
"transformer.decoder_layer.{bid}.moe.linear" # Grok (merged)
"layers.{bid}.feed_forward.experts.w1", # mixtral (merged)
"transformer.decoder_layer.{bid}.moe.linear", # Grok (merged)
"transformer.blocks.{bid}.ffn.experts.mlp.v1", # dbrx
),

# Feed-forward down
@@ -280,6 +289,7 @@ class TensorNameMap:
MODEL_TENSOR.FFN_DOWN_EXP: (
"layers.{bid}.feed_forward.experts.w2", # mixtral (merged)
"transformer.decoder_layer.{bid}.moe.linear_1", # Grok (merged)
"transformer.blocks.{bid}.ffn.experts.mlp.w2", # dbrx
),

MODEL_TENSOR.ATTN_Q_NORM: (
@@ -299,9 +309,10 @@ class TensorNameMap:
),

MODEL_TENSOR.LAYER_OUT_NORM: (
"encoder.layer.{bid}.output.LayerNorm", # bert
"encoder.layers.{bid}.norm2", # nomic-bert
"transformer.decoder_layer.{bid}.rms_norm_3", # Grok
"encoder.layer.{bid}.output.LayerNorm", # bert
"encoder.layers.{bid}.norm2", # nomic-bert
"transformer.decoder_layer.{bid}.rms_norm_3", # Grok
"transformer.blocks.{bid}.norm_attn_norm.attn.out_proj.weight", # dbrx
),

MODEL_TENSOR.SSM_IN: (
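In effect, these entries let the converter translate DBRX checkpoint tensor names into the canonical GGUF names. A small sketch of how the mapping is typically queried, using the existing gguf-py helpers get_tensor_name_map and TensorNameMap.get_name (the block count of 40 matches DBRX's n_layers but is only illustrative here):

    # sketch: resolve a few DBRX checkpoint tensor names to their GGUF equivalents
    import gguf

    tensor_map = gguf.get_tensor_name_map(gguf.MODEL_ARCH.DBRX, 40)

    for hf_name in (
        "transformer.wte.weight",
        "transformer.blocks.0.norm_attn_norm.attn.Wqkv.weight",
        "transformer.blocks.0.ffn.router.layer.weight",
        "transformer.blocks.0.ffn.experts.mlp.w1",
    ):
        # try_suffixes lets names with or without a trailing ".weight"/".bias" resolve
        new_name = tensor_map.get_name(hf_name, try_suffixes=(".weight", ".bias"))
        print(f"{hf_name} -> {new_name}")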