Support tie_word_embeddings #263

Closed
grzuy wants to merge 2 commits

Conversation

@grzuy (Contributor) commented Oct 16, 2023

Is something along these lines what's needed to support tie_word_embeddings for models with language modeling heads?

If so, I can continue updating the rest of the missing models.

@jonatanklosko (Member) commented Oct 17, 2023

Hey @grzuy :) This is more of a workaround that works for loading, but in order for parameter sharing to work during model training we actually need to share them at the Axon level. Since PyTorch checkpoints generally include the tensors for both shared layers, we've been deferring this until we address the actual sharing.

There is Axon.block on Axon main (PR), but I think we still haven't converged on what the best solution is in the Bumblebee case. cc @seanmor5
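
For readers following along, here is a minimal PyTorch-side sketch (not code from this PR or from Bumblebee) of what tie_word_embeddings=True means in the Hugging Face models being discussed: the LM head and the input embedding reference the same tensor, so training updates a single shared parameter. Sizes and variable names below are illustrative assumptions.

```python
# Minimal sketch of weight tying on the PyTorch side; sizes and names are
# illustrative, not taken from this PR.
import torch.nn as nn

vocab_size, hidden_size = 30522, 768

embedding = nn.Embedding(vocab_size, hidden_size)
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

# Tie the weights: both modules now hold the same Parameter, so a gradient
# step through either layer updates the one shared matrix.
lm_head.weight = embedding.weight

assert lm_head.weight.data_ptr() == embedding.weight.data_ptr()
```

Copying the embedding tensor into the head at load time reproduces the initial values but not this coupling, which is why the sharing has to happen in the Axon graph itself for training to behave the same way.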

@grzuy (Contributor, Author) commented Oct 18, 2023

Right, hehe 🤦‍♂️

Thanks!

@grzuy closed this Oct 18, 2023
@jonatanklosko (Member) commented:

No worries :) If this was needed to load models, it could be worth the workaround (we actually had to do it for T5), but I think it's fine to default to .bin parameters until we do the actual sharing, because we would need to rewrite the workaround anyway :)

@grzuy (Contributor, Author) commented Oct 18, 2023

So, this is clearly not actually supporting tie_word_embeddings, I now realize :-)

Despite that, wouldn't these changes still be valuable under a title like "Support loading param files without repeated shared params" or similar?

I might be missing something else here, though.

@grzuy (Contributor, Author) commented Oct 18, 2023

We almost posted our comments at the same moment 😄

@grzuy (Contributor, Author) commented Oct 18, 2023

> If this was needed to load models, it could be worth the workaround

It indeed fixed loading of a few model.safetensors files for me locally.

@jonatanklosko (Member) commented Oct 18, 2023

> It indeed fixed loading of a few model.safetensors files for me locally.

Are these models you saved? Does it work if you save in the .bin format?

@grzuy (Contributor, Author) commented Oct 18, 2023

> > It indeed fixed loading of a few model.safetensors files for me locally.
>
> Are these models you saved?

Yes, it fixed loading model.safetensors for the models I updated in this PR.

> Does it work if you save in the .bin format?

As far as I tested, it did load the same LM head params from .bin files.
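
A quick way to see the difference being discussed, sketched with standard Hugging Face tooling (the file names below are the usual defaults and are assumptions, not paths from this PR): list which tensors each checkpoint format actually stores. Keys present in the .bin state dict but absent from the .safetensors file are typically the tied weights, which the safetensors format stores only once.

```python
# Hedged sketch: compare which tensor names the two checkpoint formats
# contain for the same model. File names are the usual HF defaults and are
# assumptions here, not paths taken from this PR.
import torch
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    safetensors_keys = set(f.keys())

bin_keys = set(torch.load("pytorch_model.bin", map_location="cpu").keys())

# Names present only in the .bin checkpoint usually correspond to tied
# (aliased) parameters that the safetensors file keeps a single copy of.
print(sorted(bin_keys - safetensors_keys))
```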

@grzuy (Contributor, Author) commented Oct 24, 2023

> Hey @grzuy :) This is more of a workaround that works for loading, but in order for parameter sharing to work during model training we actually need to share them at the Axon level. Since PyTorch checkpoints generally include the tensors for both shared layers, we've been deferring this until we address the actual sharing.

@jonatanklosko Is it accurate to say, then, that fine-tuning any HF model that has param sharing in its architecture via tie_word_embeddings=True (like BERT, BART, ALBERT) will not work properly in Bumblebee as of now?

@jonatanklosko (Member) commented:

@grzuy yeah, if you train with Axon then the weights are not going to be shared and they would likely diverge to some extent. I don't know how much impact it has for fine-tuning (as opposed to training from scratch). That's precisely why we need actual sharing.
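
To make the divergence point concrete, here is a tiny hedged illustration (plain PyTorch, unrelated to Axon internals): two copies of an initially identical weight that are not tied receive different gradients and drift apart after a single update.

```python
# Hedged illustration: untied copies of the "same" weight diverge as soon
# as they receive different gradients. Shapes and the toy loss are arbitrary.
import torch

w_embed = torch.randn(4, 3, requires_grad=True)
w_head = w_embed.detach().clone().requires_grad_(True)  # untied copy

x = torch.randn(2, 3)
# The two copies enter the loss differently, so their gradients differ.
loss = (x @ w_embed.t()).sum() + 2 * (x @ w_head.t()).sum()
loss.backward()

with torch.no_grad():
    w_embed -= 0.1 * w_embed.grad
    w_head -= 0.1 * w_head.grad

print(torch.allclose(w_embed, w_head))  # False: the copies are no longer equal
```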

@grzuy deleted the tie-word-embeddings branch February 29, 2024 13:39