Add support for tied word embeddings #1252

Closed
gabe-l-hart opened this issue Oct 1, 2024 · 2 comments · Fixed by #1260

Comments

@gabe-l-hart
Contributor

🚀 The feature, motivation and pitch

The transformers implementation of llama has an option to tie the input word embeddings to the output projection so that the two layers share weights. The request here is to add support for that feature in torchchat.
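
For reference, a quick way to see the transformers behavior being referenced (the model id below is only an illustrative example of a checkpoint that ships with tied embeddings):

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ibm-granite/granite-3b-code-base")
print(config.tie_word_embeddings)  # True when the embeddings are tied

model = AutoModelForCausalLM.from_config(config)
# With tying enabled, the input embedding and the LM head share one tensor.
print(model.get_input_embeddings().weight is model.get_output_embeddings().weight)
```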

Alternatives

Models that require tied embeddings could be converted to duplicate the embedding tensor in the conversion process.
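
A hypothetical sketch of that conversion-time workaround (not the proposed fix); the checkpoint key names `tok_embeddings.weight` and `output.weight` follow torchchat's conventions and are assumptions here:

```python
import torch

def duplicate_tied_embedding(checkpoint_path: str) -> None:
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    if "output.weight" not in state_dict:
        # clone() so the saved checkpoint holds two independent tensors
        state_dict["output.weight"] = state_dict["tok_embeddings.weight"].clone()
        torch.save(state_dict, checkpoint_path)
```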

Additional context

This issue is a piece of the puzzle for adding support for Granite Code 3b/8b, which use the llama architecture in transformers but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport.

RFC (Optional)

I have a working implementation of this that I plan to put up as a pull request. The changes are roughly:

  • Add tie_word_embeddings to TransformerArgs
  • Copy tok_embeddings.weight to model.output.weight in a load_hook in the Transformer module (see the sketch below)
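
A minimal sketch of what that might look like, assuming torchchat's TransformerArgs dataclass and Transformer module and PyTorch's load-state-dict pre-hook mechanism; the exact field names and hook wiring in the eventual PR may differ:

```python
from dataclasses import dataclass

import torch.nn as nn

@dataclass
class TransformerArgs:
    dim: int = 4096
    vocab_size: int = 32000
    # ... other existing fields ...
    tie_word_embeddings: bool = False

class Transformer(nn.Module):
    def __init__(self, config: TransformerArgs) -> None:
        super().__init__()
        self.config = config
        self.tok_embeddings = nn.Embedding(config.vocab_size, config.dim)
        # ... existing transformer layers ...
        self.output = nn.Linear(config.dim, config.vocab_size, bias=False)
        if config.tie_word_embeddings:
            # Checkpoints exported with tied embeddings omit output.weight, so
            # copy the embedding weight into that slot before loading.
            self._register_load_state_dict_pre_hook(self._tie_embeddings_load_hook)

    def _tie_embeddings_load_hook(self, state_dict, prefix, *args):
        emb_key = prefix + "tok_embeddings.weight"
        out_key = prefix + "output.weight"
        if emb_key in state_dict and out_key not in state_dict:
            state_dict[out_key] = state_dict[emb_key]
```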
@Jack-Khuu
Contributor

Amazing, send it out and I'll take a look

gabe-l-hart mentioned this issue Oct 3, 2024
@gabe-l-hart
Contributor Author

Draft PR up: #1260

Similar to the others, this is sequenced with Safetensors and Bias Tensors.
