
Convert T5 models to Long T5 #2

Open
dafraile opened this issue Oct 31, 2022 · 2 comments

Comments

@dafraile

dafraile commented Oct 31, 2022

Hi, thanks for creating this script, amazing work! I was wondering whether you have any plans to create a conversion script for T5-based models, or whether you see any major difficulties in converting T5 models compared to other architectures.

Thanks,

David

@ccdv-ai
Owner

ccdv-ai commented Oct 31, 2022

Hi @dafraile

T5 is planned at some point, but there are some caveats:

  • T5 relies on a relative positional embedding that is added directly to the attention score matrix, so you have to compute both Q @ K.T and a relative positional score matrix, which is inefficient for very long sequences. This is not the case for BART/Pegasus models.
  • While the relative positional score matrix is not that difficult to compute for local attention, it is not compatible with most LSG sparse attention patterns. There are also no specific rules for the global tokens that are prepended.
  • I'd say that LSG-T5 is much more difficult to build because I have to rethink some things specifically for this model.
  • If you really need to use T5 right now, there is the LongT5 model on HuggingFace. It is somewhat similar to LSG but less efficient. It is retrained from scratch, so it is not based on an existing "short" T5 checkpoint.
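
To make the first two caveats concrete, here is a minimal sketch in plain Python of attention scores with an additive relative positional term. It is only an illustration, not the LSG or T5 implementation: the helper names (`dot`, `clipped_bias`, `dense_scores`, `local_scores`) are hypothetical, the bias table uses simple clipped offsets as a stand-in for T5's bucketing, and the numbers are made up.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def clipped_bias(offset, bias_by_offset, max_distance):
    """Look up the bias for a relative offset j - i, clipped to
    [-max_distance, max_distance] (assumption: a simplified stand-in
    for T5's relative position buckets)."""
    clipped = max(-max_distance, min(max_distance, offset))
    return bias_by_offset[clipped + max_distance]


def dense_scores(q, k, bias_by_offset, max_distance):
    """Full attention: an n x n matrix of Q @ K.T plus an n x n bias."""
    return [
        [dot(qi, kj) + clipped_bias(j - i, bias_by_offset, max_distance)
         for j, kj in enumerate(k)]
        for i, qi in enumerate(q)
    ]


def local_scores(q, k, bias_by_offset, max_distance, window):
    """Local attention: each query only scores keys within `window`
    positions, so the relative offsets are easy to enumerate."""
    rows = []
    for i, qi in enumerate(q):
        lo, hi = max(0, i - window), min(len(k), i + window + 1)
        rows.append({
            j: dot(qi, k[j]) + clipped_bias(j - i, bias_by_offset, max_distance)
            for j in range(lo, hi)
        })
    return rows


# Toy inputs: 3 tokens, head dimension 2, offsets -2..2.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias_by_offset = [-0.2, -0.1, 0.0, 0.1, 0.2]

dense = dense_scores(q, k, bias_by_offset, max_distance=2)
local = local_scores(q, k, bias_by_offset, max_distance=2, window=1)
```

In the dense case both the score matrix and the bias matrix grow quadratically with sequence length. In the local case the offset `j - i` is well defined for every key in the window, but a prepended global token or a sparsely selected key has no obvious offset, which is the incompatibility described above.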

@dafraile
Author

Thank you! That's very enlightening. I guess that, for now, the only viable option is to use the existing LongT5 and retrain on top of it, rather than converting already-trained T5 models to LSG.

Cheers
