Add support for tied word embeddings #1252

Closed
gabe-l-hart opened this issue Oct 1, 2024 · 2 comments · Fixed by #1260

Comments

@gabe-l-hart
Contributor

🚀 The feature, motivation and pitch

The transformers implementation of llama has an option to tie the input word embeddings to the output projection so that the two layers share weights. The request here is to add support for that feature in torchchat.
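
For reference, a quick way to see the transformers behavior being referenced (the model id below is only an illustrative example of a checkpoint that ships with tied embeddings):

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ibm-granite/granite-3b-code-base")
print(config.tie_word_embeddings)  # True when the embeddings are tied

model = AutoModelForCausalLM.from_config(config)
# With tying enabled, the input embedding and the LM head share one tensor.
print(model.get_input_embeddings().weight is model.get_output_embeddings().weight)
```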

Alternatives

Models that require tied embeddings could be converted to duplicate the embedding tensor in the conversion process.
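
A hypothetical sketch of that conversion-time workaround (not the proposed fix); the checkpoint key names `tok_embeddings.weight` and `output.weight` follow torchchat's conventions and are assumptions here:

```python
import torch

def duplicate_tied_embedding(checkpoint_path: str) -> None:
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    if "output.weight" not in state_dict:
        # clone() so the saved checkpoint holds two independent tensors
        state_dict["output.weight"] = state_dict["tok_embeddings.weight"].clone()
        torch.save(state_dict, checkpoint_path)
```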

Additional context

This issue is a piece of the puzzle for adding support for Granite Code 3b/8b, which use the llama architecture in transformers but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport.

RFC (Optional)

I have a working implementation of this that I plan to put up as a pull request. The changes are roughly:

  • Add tie_word_embeddings to TransformerArgs
  • Copy tok_embeddings.weight to model.output.weight in a load_hook in the Transformer module (see the sketch below)
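
A minimal sketch of what that might look like, assuming torchchat's TransformerArgs dataclass and Transformer module and PyTorch's load-state-dict pre-hook mechanism; the exact field names and hook wiring in the eventual PR may differ:

```python
from dataclasses import dataclass

import torch.nn as nn

@dataclass
class TransformerArgs:
    dim: int = 4096
    vocab_size: int = 32000
    # ... other existing fields ...
    tie_word_embeddings: bool = False

class Transformer(nn.Module):
    def __init__(self, config: TransformerArgs) -> None:
        super().__init__()
        self.config = config
        self.tok_embeddings = nn.Embedding(config.vocab_size, config.dim)
        # ... existing transformer layers ...
        self.output = nn.Linear(config.dim, config.vocab_size, bias=False)
        if config.tie_word_embeddings:
            # Checkpoints exported with tied embeddings omit output.weight, so
            # copy the embedding weight into that slot before loading.
            self._register_load_state_dict_pre_hook(self._tie_embeddings_load_hook)

    def _tie_embeddings_load_hook(self, state_dict, prefix, *args):
        emb_key = prefix + "tok_embeddings.weight"
        out_key = prefix + "output.weight"
        if emb_key in state_dict and out_key not in state_dict:
            state_dict[out_key] = state_dict[emb_key]
```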
@Jack-Khuu
Contributor

Amazing, send it out and I'll take a look

gabe-l-hart mentioned this issue Oct 3, 2024
@gabe-l-hart
Contributor Author

Draft PR up: #1260

Similar to the others, this is sequenced with Safetensors and Bias Tensors.
