Hi, thanks for creating this script, amazing work! I was wondering if you have any plans to create a conversion script for T5-based models, or whether you think there are any major difficulties in converting T5 models compared to other architectures.
Thanks,
David
T5 is planned at some point, but there are some caveats:
T5 relies on a relative positional embedding that is added directly to the attention score matrix, so you have to compute both Q @ K.T and a relative positional score matrix, which is inefficient for very long sequences. This is not the case for BART/Pegasus models, which use absolute positional embeddings added to the inputs instead.
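For context, here is a minimal sketch of that scoring step (my own illustration, not the actual LSG or HuggingFace implementation; tensor names are made up):

```python
import torch

def t5_style_scores(q, k, position_bias):
    """q, k: (batch, heads, seq_len, head_dim); position_bias: (1, heads, seq_len, seq_len)."""
    # Standard content-based scores.
    scores = torch.matmul(q, k.transpose(-1, -2))  # (batch, heads, seq_len, seq_len)
    # T5 adds a learned relative positional bias of the same shape before softmax,
    # so a full (seq_len x seq_len) bias matrix has to be materialized alongside
    # Q @ K.T -- the extra cost mentioned above for very long sequences.
    return scores + position_bias
```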
While the relative positional score matrix is not that difficult to compute for local attention, it is not compatible with most LSG sparse attention patterns. There are also no specific rules for how to assign relative positions to the global tokens that are prepended.
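To illustrate that last point, here is a rough sketch (again my own illustration, not repository code) of why relative positions are easy to define inside a local window but ambiguous for prepended global tokens:

```python
import torch

def local_relative_positions(seq_len: int, window: int):
    # Inside a sliding local window, key_pos - query_pos stays in [-window, window],
    # so only a small range of T5 relative-position buckets is ever needed.
    q_pos = torch.arange(seq_len)[:, None]
    k_pos = torch.arange(seq_len)[None, :]
    rel = k_pos - q_pos              # (seq_len, seq_len) relative distances
    in_window = rel.abs() <= window  # positions a local pattern actually attends to
    return rel, in_window

# Prepended global tokens sit outside the normal token ordering, so there is no
# natural "relative distance" between a global token and a regular token; some
# convention (e.g. a dedicated bucket) would have to be invented for LSG-T5.
```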
I'd say that LSG-T5 is much more difficult to build because I have to rethink some things specifically for this model.
If you really need to use T5 right now, there is the LongT5 model on HuggingFace; it is somewhat similar to LSG but less efficient. It is pretrained from scratch, so it is not based on an existing "short" T5 checkpoint.
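As a concrete starting point, LongT5 checkpoints can be loaded directly with transformers; the checkpoint name below is one of the publicly released ones and is only meant as an example:

```python
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

# "google/long-t5-tglobal-base" is a LongT5 variant with transient-global attention;
# swap in another released checkpoint or a fine-tuned one as needed.
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

inputs = tokenizer("summarize: " + "some very long document ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```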
Thank you! That's very enlightening. I guess for now using the existing LongT5 checkpoints and training on top of them is the only viable option, rather than converting already-trained T5 models into LSG.