🚀 The feature, motivation and pitch

The `transformers` implementation of llama supports tying the input word embeddings to the output layer so the two share weights. The request here is to add support for that feature in `torchchat`.
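For illustration, here is a minimal sketch of what weight tying means in PyTorch (the sizes are placeholders, not torchchat's actual values): the output projection reuses the embedding matrix, so the checkpoint needs to store only one copy of it.

```python
import torch.nn as nn

# Illustrative sizes only; real models take these from their config.
vocab_size, dim = 32000, 4096

tok_embeddings = nn.Embedding(vocab_size, dim)
output = nn.Linear(dim, vocab_size, bias=False)

# Tie the weights: both modules now reference the same (vocab_size, dim)
# tensor, and gradients accumulate into it from both sides.
output.weight = tok_embeddings.weight

assert output.weight.data_ptr() == tok_embeddings.weight.data_ptr()
```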
Alternatives
Models that require tied embeddings could instead have the embedding tensor duplicated into the output projection during checkpoint conversion.
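A rough sketch of what that conversion step could look like (the file names are hypothetical; the key names follow the ones used later in this issue):

```python
import torch

# Hypothetical conversion step: materialize the tied weight as its own
# tensor so the converted checkpoint loads in a runtime without tying.
state_dict = torch.load("granite_code.pth", map_location="cpu")

if "output.weight" not in state_dict:
    state_dict["output.weight"] = state_dict["tok_embeddings.weight"].clone()

torch.save(state_dict, "granite_code_untied.pth")
```

The downside is that the duplicated tensor costs extra disk and memory, which is exactly what tying avoids.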
Additional context
This issue is a piece of the puzzle for adding support for Granite Code 3b/8b, which use the llama architecture in `transformers` but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport.
RFC (Optional)
I have a working implementation of this that I plan to put up as a pull request. The changes are roughly (sketched after the list):
- Add `tie_word_embeddings` to `TransformerArgs`
- Copy `tok_embeddings.weight` to `model.output.weight` in a `load_hook` in the `Transformer` module
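A minimal sketch of those two changes, assuming the module keeps its embedding table in `tok_embeddings` and its output projection in `output` (the names used above); the rest of the class is elided or illustrative, not torchchat's actual code:

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class TransformerArgs:
    vocab_size: int = 32000  # illustrative defaults
    dim: int = 4096
    tie_word_embeddings: bool = False  # the new flag


class Transformer(nn.Module):
    def __init__(self, config: TransformerArgs):
        super().__init__()
        self.config = config
        self.tok_embeddings = nn.Embedding(config.vocab_size, config.dim)
        # ... attention/FFN layers elided ...
        self.output = nn.Linear(config.dim, config.vocab_size, bias=False)
        if config.tie_word_embeddings:
            # Runs before load_state_dict applies the checkpoint, so a
            # checkpoint that omits output.weight still loads cleanly.
            self._register_load_state_dict_pre_hook(self._tie_weights_load_hook)

    def _tie_weights_load_hook(self, state_dict, prefix, *args):
        out_key = prefix + "output.weight"
        emb_key = prefix + "tok_embeddings.weight"
        if out_key not in state_dict and emb_key in state_dict:
            # Reuse the embedding tensor for the output projection;
            # load_state_dict then copies it into output.weight.
            state_dict[out_key] = state_dict[emb_key]
```

Populating the missing key in a pre-hook keeps `load_state_dict(strict=True)` happy without requiring the checkpoint itself to carry a second copy of the tensor.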