BigBird #10183
Conversation
tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)
Will BigBird-Pegasus be added as well?
Yes, we will be adding that soon.
Once pre-trained checkpoints are uploaded to the Hub, usage will look like this:

```python
from transformers import BigBirdForMaskedLM, BigBirdForPreTraining, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")

# model with LM head
model_with_lm = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")

# model with pre-training heads
model_for_pretraining = BigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base")
```
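As a rough follow-up sketch (not part of the original reply), a forward pass with the objects defined above could look like the following; the dummy long text and the 4096 `max_length` are illustrative assumptions:

```python
import torch

# illustrative long input; BigBird targets sequences up to 4096 tokens
text = "The quick brown fox jumps over the lazy dog. " * 200
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    outputs = model_with_lm(**inputs)

print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```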
pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements
…into add_big_bird
Amazing add! This is a big model and will make for a nice addition. I have left quite a few comments, mainly about styling.
On top of that, don't forget to add your model to the main README!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Made typos in my suggestions, sorry!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This is great @vasudevgupta7! I've left a few comments, mostly nits.
This made me think we should really push for fast tokenizers in the templates, as they're arguably more important and useful than their python counterparts.
Thanks a lot for working on this @vasudevgupta7, this is a tremendous effort!
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@sgugger, @LysandreJik I updated the code based on your suggestions. Please let me know if I have missed something.
Thank you for taking care of the comments @vasudevgupta7 and for this PR altogether!
@vasudevgupta7 Great work! When are you planning to add BigBirdForConditionalGeneration? And are there any plans to add the PubMed pre-trained models?
@sayakmisra I am currently working on it. You can track PR #10991.
@vasudevgupta7 Can we have separate pre-trained checkpoints for BigBird and Pegasus without the fine-tuning, so that we can use the Pegasus decoder along with the BigBird encoder in our code?
Hey @jigsaw2212, we are still working on integrating BigBird-Pegasus.
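A minimal sketch of what BigBird-Pegasus summarization might look like once that integration lands; the class and checkpoint names here are assumptions based on the follow-up PR #10991 and are not part of this PR:

```python
# Sketch only: BigBirdPegasusForConditionalGeneration and the pubmed checkpoint
# come from the follow-up BigBird-Pegasus work, not from this PR.
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

article = "Long biomedical article text goes here ..."  # placeholder document
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```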
* init bigbird
* model.__init__ working, conversion script ready, config updated
* add conversion script
* BigBirdEmbeddings working :)
* slightly update conversion script
* BigBirdAttention working :) ; some bug in layer.output.dense
* add debugger-notebook
* forward() working for BigBirdModel :) ; replaced gelu with gelu_fast
* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)
* BigBirdModel working in block-sparse attention mode :)
* add BigBirdForPreTraining
* small fix
* add tokenizer for BigBirdModel
* fix config & hence modeling
* fix base prefix
* init testing
* init tokenizer test
* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements
* remove position_embedding_type arg
* complete normal tests
* add comments to block sparse attention
* add attn_probs for sliding & global tokens
* create fn for block sparse attn mask creation
* add special tests
* restore pos embed arg
* minor fix
* attn probs update
* make big bird fully gpu friendly
* fix tests
* remove pruning
* correct tokenzier & minor fixes
* update conversion script , remove norm_type
* tokenizer-inference test add
* remove extra comments
* add docs
* save intermediate
* finish trivia_qa conversion
* small update to forward
* correct qa and layer
* better error message
* BigBird QA ready
* fix rebased
* add triva-qa debugger notebook
* qa setup
* fixed till embeddings
* some issue in q/k/v_layer
* fix bug in conversion-script
* fixed till self-attn
* qa fixed except layer norm
* add qa end2end test
* fix gradient ckpting ; other qa test
* speed-up big bird a bit
* hub_id=google
* clean up
* make quality
* speed up einsum with bmm
* finish perf improvements for big bird
* remove wav2vec2 tok
* fix tokenizer
* include docs
* correct docs
* add helper to auto pad block size
* make style
* remove fast tokenizer for now
* fix some
* add pad test
* finish
* fix some bugs
* fix another bug
* fix buffer tokens
* fix comment and merge from master
* add comments
* make style
* commit some suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix typos
* fix some more suggestions
* add another patch
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix copies
* another path
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* update
* update nit suggestions
* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
What does this PR do?
This PR adds Google's BigBird "RoBERTa".
Fixes #6113.
This PR adds three checkpoints of BigBird:
Here is a notebook showing how well BigBird works on long-document question answering: https://colab.research.google.com/drive/1DVOm1VHjW0eKCayFq1N2GpY6GR9M4tJP?usp=sharing
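As a rough offline illustration of the same idea, here is a minimal extractive-QA sketch; the trivia-itc checkpoint name and the toy question/context are assumptions, not taken from the notebook:

```python
import torch
from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer

# assumed TriviaQA-finetuned checkpoint; swap in whichever QA checkpoint you actually use
tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-base-trivia-itc")
model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc")

question = "Which attention patterns does BigBird combine?"
context = "BigBird combines sliding-window, global and random attention. " * 100  # long toy document
inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    outputs = model(**inputs)

# decode the highest-scoring start/end span as the answer
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))
```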
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed.
@patrickvonplaten