Adding support for Mistral-Nemo-Instruct #8576

handshape · 2024-07-18T18:36:40Z

Per the ticket template for feature requests, how does the community feel about adding support for Mistral-Nemo-Instruct?
On its face, it looks like there's a need to add support for a pre-tokenizer type called mistral-bpe.

iamlemec (Collaborator) · 2024-07-18T20:25:20Z

I quickly added the pre-tokenizer and ran the conversion, but running the converted model fails with errors about incorrect tensor shapes. The embedding size is n_embd = 5120, but the attention head size is not n_embd / n_head as it usually is, so the Q and K/V projections do not line up with the embedding size. For example, the attn_q.weight tensor is 4096 x 5120 instead of 5120 x 5120, and so on.

From the look of it, only Gemma and Gemma2 currently do something similar.
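The shape mismatch described above can be checked with a little arithmetic. This is only a sketch: `attn_q_shape` is a hypothetical helper, not part of llama.cpp, and the head count (32) and explicit head size (128) are assumptions chosen to be consistent with the 4096 x 5120 tensor reported here. Only n_embd = 5120 and the two shapes come from the comment itself.

```python
# Hypothetical helper (not a llama.cpp function): computes the expected
# shape of attn_q.weight as (n_head * head_dim, n_embd).
def attn_q_shape(n_embd, n_head, head_dim=None):
    # The usual convention derives the head size from the embedding size.
    if head_dim is None:
        head_dim = n_embd // n_head
    return (n_head * head_dim, n_embd)


N_EMBD = 5120   # from the comment above
N_HEAD = 32     # assumption, consistent with 4096 = 32 * 128
HEAD_DIM = 128  # assumption: head size set explicitly by the model

# What a loader expects if it derives head_dim from n_embd / n_head.
usual = attn_q_shape(N_EMBD, N_HEAD)

# What the converted model contains when head_dim is set independently.
actual = attn_q_shape(N_EMBD, N_HEAD, HEAD_DIM)

print("expected by loader:", usual)
print("found in model:    ", actual)
```

Under these assumptions, a loader would need to read the head size as its own hyperparameter rather than derive it from n_embd.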


handshape (Author) · 2024-07-19T09:28:33Z

Looks like work is underway in #8577.

