Support tie_word_embeddings #263
Conversation
Hey @grzuy :) This is more of a workaround that works for loading, but in order for parameter sharing to work during model training we actually need to share them at the Axon level. Since PyTorch checkpoints generally include the tensors for both shared layers, we've been deferring this until we address the actual sharing. There is […]
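For context, this is not Bumblebee/Axon code, just a minimal NumPy sketch of what `tie_word_embeddings` means conceptually: the language modeling head reuses the input embedding matrix as its weight, so there is one shared tensor, not two copies.

```python
import numpy as np

# Conceptual sketch (hypothetical sizes, not a real model): with
# tie_word_embeddings=True the LM head weight *is* the embedding matrix.
vocab_size, hidden_size = 8, 4
rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, hidden_size))

def lm_head(hidden_state, weight):
    # Project a hidden state back onto the vocabulary.
    return hidden_state @ weight.T

hidden_state = rng.normal(size=(hidden_size,))
logits = lm_head(hidden_state, embedding)  # reuses `embedding` directly
assert logits.shape == (vocab_size,)
```

A checkpoint saved from such a model may still serialize the tensor under both parameter names, which is why loading can work even when the framework does not actually share the parameter.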
Right, hehe 🤦♂️ Thanks!
No worries :) If this was needed to load models it could be worth the workaround (we actually had to do it for T5), but I think it's fine to default to […]
So, this is clearly not actually supporting tie_word_embeddings, I now realize :-) Still, wouldn't these changes be valuable under a title like "Support loading param files without repeated shared params" or similar? Or maybe I'm missing something else here.
We almost posted comments at the same moment 😄
It indeed fixed loading of a few […]
Are these models you saved? Does it work if you save in the […]
Yes, it fixed loading
As far as I tested, it did load the same LM head params for […]
@jonatanklosko Is it accurate to say, then, that fine-tuning any HF model whose architecture has param sharing with tie_word_embeddings=True (BERT, BART, ALBERT) will not work in Bumblebee as of now?
@grzuy yeah, if you train with Axon then the weights are not going to be shared and they would likely diverge to some extent. I don't know how much impact that has for fine-tuning (as opposed to training from scratch). That's precisely why we need actual sharing.
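The divergence point above can be illustrated with a small NumPy sketch (again, not Axon code, and the "gradient" updates are made up for illustration): if the tied weights are merely copied at load time, independent updates during training pull the two copies apart.

```python
import numpy as np

# Illustrative only: weights copied at load time, not shared by reference.
rng = np.random.default_rng(0)
w_embedding = rng.normal(size=(4, 3))
w_lm_head = w_embedding.copy()  # a copy, so updates do not propagate

# Apply different (fabricated) gradient updates to each copy, as would
# happen when the framework treats them as independent parameters.
w_embedding -= 0.1 * rng.normal(size=w_embedding.shape)
w_lm_head -= 0.1 * rng.normal(size=w_lm_head.shape)

print(np.allclose(w_embedding, w_lm_head))  # False: the copies diverged
```

With actual sharing there would be a single tensor receiving the sum of both gradient contributions, so this divergence cannot happen.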
Is something along these lines what's needed to support tie_word_embeddings for models with language modeling heads?
If so, I can continue updating the rest of the missing models.