Add RemBERT model code to huggingface #10692

Merged · 808 commits · Jul 24, 2021
Changes from 250 commits

Commits (808)
a7b9b13
Faster list concat for trainer_pt_utils.get_length_grouped_indices() …
May 22, 2021
ec617be
Replace double occurrences as the last step (#11367)
LysandreJik May 24, 2021
d5c7461
[Flax] Fix PyTorch import error (#11839)
patrickvonplaten May 24, 2021
a190a77
Fix reference to XLNet (#11846)
sgugger May 24, 2021
a3595c2
Switch mem metrics flag (#11851)
sgugger May 24, 2021
8d4ec6b
Fix flos single node (#11844)
TevenLeScao May 24, 2021
c6bba66
Fix two typos in docs (#11852)
nickls May 24, 2021
4eb60bf
[Trainer] Report both steps and num samples per second (#11818)
sgugger May 24, 2021
447e922
Add some tests to the slow suite #11860
LysandreJik May 25, 2021
9bed4c9
Enable memory metrics in tests that need it (#11859)
LysandreJik May 25, 2021
3437b3d
fixed a small typo in the doc (#11856)
stsuchi May 25, 2021
9318761
typo (#11858)
WrRan May 25, 2021
622df91
Add option to log only once in multinode training (#11819)
sgugger May 25, 2021
574090d
[Wav2Vec2] SpecAugment Fast (#11764)
patrickvonplaten May 25, 2021
97346af
[lm examples] fix overflow in perplexity calc (#11855)
stas00 May 25, 2021
c026fd9
[Examples] create model with custom config on the fly (#11798)
stas00 May 25, 2021
0ede44b
[Wav2Vec2ForCTC] example typo fixed (#11878)
madprogramer May 25, 2021
161a341
Ensure input tensor are on device. (#11874)
francescorubbo May 26, 2021
93d6c1a
Fix usage of head masks by TF encoder-decoder models' `generate()` fu…
stancld May 26, 2021
2cd5e54
Correcting comments in T5Stack to reflect correct tuple order (#11330)
talkhaldi May 26, 2021
cc12c13
[Flax] Allow dataclasses to be jitted (#11886)
patrickvonplaten May 26, 2021
b0a0110
changing find_batch_size to work with tokenizer outputs (#11890)
joerenner May 26, 2021
c58cda0
Link official Cloud TPU JAX docs (#11892)
avital May 26, 2021
c91ad78
Flax Generate (#11777)
patrickvonplaten May 26, 2021
b3a93de
Add Emotion Speech Noteboook (#11900)
patrickvonplaten May 27, 2021
63a48a5
Update deepspeed config to reflect hyperparameter search parameters (…
Mindful May 27, 2021
8f04997
Adding new argument `max_new_tokens` for generate. (#11476)
Narsil May 27, 2021
fc1d796
Added Sequence Classification class in GPTNeo (#11906)
bhadreshpsavani May 28, 2021
a6c47d8
[Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)
jayendra13 May 28, 2021
3a475ec
Test optuna and ray (#11924)
LysandreJik May 28, 2021
5bd46d3
Remove `datasets` submodule
LysandreJik May 31, 2021
885b8ae
fix assert (#11935)
PhilipMay May 31, 2021
d263e1c
Remove redundant `nn.log_softmax` in `run_flax_glue.py` (#11920)
n2cholas May 31, 2021
e2f6708
Add MT5ForConditionalGeneration as supported arch. to summarization R…
PhilipMay May 31, 2021
da6f9a1
Add FlaxCLIP (#11883)
patil-suraj Jun 1, 2021
63a6e82
RAG-2nd2end-revamp (#11893)
Jun 1, 2021
199f4cb
modify qa-trainer (#11872)
zhangfanTJU Jun 1, 2021
cf241f7
bugfixes training_args.py (#11922)
BassaniRiccardo Jun 1, 2021
02fcbfa
reinitialize wandb config for each hyperparameter search run (#11945)
Mindful Jun 1, 2021
d1db21c
Add regression tests for slow sentencepiece tokenizers. (#11737)
PhilipMay Jun 1, 2021
a323939
Authorize args when instantiating an AutoModel (#11956)
LysandreJik Jun 1, 2021
8e423cf
Neptune.ai integration (#11937)
vbyno Jun 1, 2021
5acefff
Run the integration tests on schedule tests instead of master tests
LysandreJik Jun 1, 2021
d5a4c8c
[deepspeed] docs (#11940)
stas00 Jun 1, 2021
52fa0d9
typo correction (#11973)
JminJ Jun 1, 2021
5e03891
ByT5 model (#11971)
patrickvonplaten Jun 1, 2021
12884e6
Typo in usage example, changed to device instead of torch_device (#11…
albertovilla Jun 1, 2021
11c2e6e
[DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966)
stas00 Jun 1, 2021
dd93baf
[Trainer] add train loss and flops metrics reports (#11980)
stas00 Jun 1, 2021
4394a7a
Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxm…
dependabot[bot] Jun 2, 2021
a88552a
[RAG] Fix rag from pretrained question encoder generator behavior (#1…
patrickvonplaten Jun 2, 2021
40018ed
VisualBERT (#10534)
gchhablani Jun 2, 2021
6384ddc
Fix examples (#11990)
gchhablani Jun 2, 2021
4501d9c
[docs] fix xref to `PreTrainedModel.generate` (#11049)
stas00 Jun 2, 2021
e5b00f7
Update return introduction (#11976)
kouyk Jun 2, 2021
72d77ba
[deepspeed] Move code and doc into standalone files (#11984)
stas00 Jun 2, 2021
812af25
[deepspeed] add nvme test skip rule (#11997)
stas00 Jun 2, 2021
ca9ce8b
Fix weight decay masking in `run_flax_glue.py` (#11964)
n2cholas Jun 3, 2021
f22ecb8
[Flax] Refactor MLM (#12013)
patrickvonplaten Jun 3, 2021
92f9ae7
[Deepspeed] Assert on mismatches between ds and hf args (#12021)
stas00 Jun 4, 2021
bda3bd5
[TrainerArguments] format and sort __repr__, add __str__ (#12018)
stas00 Jun 4, 2021
0dbd3aa
Fixed Typo in modeling_bart.py (#12035)
ceevaaa Jun 7, 2021
8491629
fix deberta 2 tokenizer integration test (#12017)
PhilipMay Jun 7, 2021
7314f1d
fix docs of past_key_values (#12049)
patil-suraj Jun 7, 2021
11e6b72
[JAX] Bump jax lib (#12053)
patrickvonplaten Jun 7, 2021
56d1c4e
Fixes bug that appears when using QA bert and distilation. (#12026)
madlag Jun 7, 2021
9f14798
Extend pipelines for automodel tupels (#12025)
Narsil Jun 7, 2021
159421c
Add optional grouped parsers description to HfArgumentParser (#12042)
peteriz Jun 7, 2021
94c9699
adds metric prefix. (#12057)
riklopfer Jun 8, 2021
2626916
skip failing test (#12059)
stas00 Jun 8, 2021
6875b3b
Fix integration tests (#12066)
NielsRogge Jun 8, 2021
312e9a0
Fix tapas issue (#12063)
NielsRogge Jun 8, 2021
65f0175
updated the original RAG implementation to be compatible with latest …
Jun 8, 2021
3c91acb
Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
mariosasko Jun 8, 2021
a7f6787
Add torch to requirements.txt in language-modeling (#12040)
cdleong Jun 8, 2021
bbb0d71
Properly indent block_size (#12070)
sgugger Jun 8, 2021
d59f119
[Deepspeed] various fixes (#12058)
stas00 Jun 8, 2021
078cef3
[Deepspeed Wav2vec2] integration (#11638)
stas00 Jun 8, 2021
b5c0091
typo
stas00 Jun 8, 2021
2a5be37
Update run_ner.py with id2label config (#12001)
KoichiYasuoka Jun 9, 2021
127159d
sync LayerDrop for Wav2Vec2Encoder + tests (#12076)
stas00 Jun 9, 2021
e87758c
Add DETR (#11653)
NielsRogge Jun 9, 2021
7507066
[test] support more than 2 gpus (#12074)
stas00 Jun 9, 2021
5de26dc
Wav2Vec2 Pretraining (#11306)
anton-l Jun 9, 2021
5d37376
pass decay_mask fn to optimizer (#12087)
patil-suraj Jun 9, 2021
4bc6fb8
rm require_version_examples (#12088)
stas00 Jun 9, 2021
56bb887
[Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#1…
patrickvonplaten Jun 9, 2021
7ba3563
Add text_column_name and label_column_name to run_ner and run_ner_no_…
kumapo Jun 10, 2021
5a5f21a
CLIPFeatureExtractor should resize images with kept aspect ratio (#11…
TobiasNorlund Jun 10, 2021
e1fe741
New TF GLUE example (#12028)
Rocketknight1 Jun 10, 2021
7a467b9
Fix quality
sgugger Jun 10, 2021
644bd8f
Update README.md to cover the TF GLUE example.
Rocketknight1 Jun 10, 2021
645af63
Minor style edits
Rocketknight1 Jun 10, 2021
0261ab7
Appending label2id and id2label to models to ensure inference works p…
Rocketknight1 Jun 10, 2021
31e2d1c
Fix a condition in test_generate_with_head_masking (#11911)
stancld Jun 10, 2021
382332b
Flax VisionTransformer (#11951)
jayendra13 Jun 10, 2021
87af07d
add relevant description to tqdm in examples (#11927)
bhavitvyamalik Jun 10, 2021
1251f2f
Fix head masking generate tests (#12110)
patrickvonplaten Jun 11, 2021
ae6611b
Flax CLM script (#12023)
patil-suraj Jun 11, 2021
1848cd5
Add from_pretrained to dummy timm objects (#12097)
LysandreJik Jun 11, 2021
155c40a
Fix t5 error message (#12136)
cccntu Jun 13, 2021
85fc934
Fix megatron_gpt2 attention block's causal mask (#12007)
novatig Jun 14, 2021
a8c5b66
Add mlm pretraining xla torch readme (#12011)
patrickvonplaten Jun 14, 2021
364bbc6
add readme for flax clm (#12111)
patil-suraj Jun 14, 2021
03c24ef
FlaxBart (#11537)
stancld Jun 14, 2021
e066783
Feature to use the PreTrainedTokenizerFast class as a stand-alone tok…
SaulLu Jun 14, 2021
69c86e1
[Flax] Add links to google colabs (#12146)
patrickvonplaten Jun 14, 2021
b7b5f7a
Don't log anything before logging is setup in examples (#12121)
sgugger Jun 14, 2021
d4a895c
Use text_column_name variable instead of "text" (#12132)
nbroad1881 Jun 14, 2021
a771916
[lm examples] Replicate --config_overrides addition to other LM examp…
kumar-abhishek Jun 14, 2021
ae403bc
fix error message (#12148)
patil-suraj Jun 14, 2021
8121989
[optim] implement AdafactorSchedule (#12123)
stas00 Jun 14, 2021
08cb69c
[style] consistent nn. and nn.functional (#12124)
stas00 Jun 14, 2021
ed66172
Adding TFWav2Vec2Model (#11617)
will-rice Jun 14, 2021
4ab85f9
[Flax] Fix flax pt equivalence tests (#12154)
patrickvonplaten Jun 14, 2021
cf988f8
consistent nn. and nn.functional: p2 templates (#12153)
stas00 Jun 14, 2021
c8527ad
Flax Big Bird (#11967)
thevasudevgupta Jun 14, 2021
3fe3975
[style] consistent nn. and nn.functional: part 3 `tests` (#12155)
stas00 Jun 14, 2021
05f27a9
[style] consistent nn. and nn.functional: part 4 `examples` (#12156)
stas00 Jun 14, 2021
b6bec40
consistent nn. and nn.functional: part 5 docs (#12161)
stas00 Jun 14, 2021
d90660a
Add video links to the documentation (#12162)
sgugger Jun 15, 2021
930e327
[Flax generate] Add params to generate (#12171)
patrickvonplaten Jun 15, 2021
04352f0
Use a released version of optax rather than installing from Git. (#12…
avital Jun 15, 2021
a573d9f
Have dummy processors have a `from_pretrained` method (#12145)
LysandreJik Jun 15, 2021
d017be3
Add course banner (#12157)
sgugger Jun 15, 2021
7854c04
Adjust banner width
sgugger Jun 15, 2021
c5bfb2f
Enable add_prefix_space if model_type is roberta or gpt2 (#12116)
kumapo Jun 15, 2021
e42baf3
Update AutoModel classes in summarization example (#12178)
ionicsolutions Jun 15, 2021
029a85e
Ray Tune Integration Updates (#12134)
amogkam Jun 15, 2021
dea8c90
[testing] ensure concurrent pytest workers use a unique port for torc…
stas00 Jun 15, 2021
b64fac1
Model card defaults (#12122)
sgugger Jun 15, 2021
b8c4503
Temporarily deactivate torch-scatter while we wait for new release (#…
LysandreJik Jun 15, 2021
50d143d
Temporarily deactivate torchhub test (#12184)
sgugger Jun 15, 2021
7d53e72
[Flax] Add Beam Search (#12131)
patrickvonplaten Jun 16, 2021
0310ca5
Hubert (#11889)
patrickvonplaten Jun 16, 2021
28ae2aa
updated DLC images and sample notebooks (#12191)
philschmid Jun 16, 2021
6d59345
Enabling AutoTokenizer for HubertConfig. (#12198)
Narsil Jun 16, 2021
dd7662c
Use yaml to create metadata (#12185)
sgugger Jun 16, 2021
8bf85bd
[Docs] fixed broken link (#12205)
bhadreshpsavani Jun 16, 2021
c34c618
Pipeline update & tests (#12207)
LysandreJik Jun 17, 2021
faf6efc
Improve detr (#12147)
NielsRogge Jun 17, 2021
210cdf2
Add link to the course (#12229)
sgugger Jun 17, 2021
f1ebe97
Support for torch 1.9.0 (#12224)
LysandreJik Jun 17, 2021
74d50b8
fix pt-1.9.0 `add_` deprecation (#12217)
stas00 Jun 17, 2021
9d6a6d1
Release: v4.7.0
LysandreJik Jun 17, 2021
adfd2f7
Docs for v4.8.0
LysandreJik Jun 17, 2021
3f68940
AutoTokenizer: infer the class from the tokenizer config if possible …
sgugger Jun 17, 2021
4fa9427
update desc for map in all examples (#12226)
bhavitvyamalik Jun 17, 2021
5cc3440
[Flax] FlaxAutoModelForSeq2SeqLM (#12228)
patil-suraj Jun 18, 2021
547352d
[FlaxBart] few small fixes (#12247)
patil-suraj Jun 18, 2021
97126ee
Depreciate pythonic Mish and support PyTorch 1.9 version of Mish (#12…
digantamisra98 Jun 18, 2021
76e7d6f
[t5 doc] make the example work out of the box (#12239)
stas00 Jun 18, 2021
5241a38
Fix the scheduled CI
LysandreJik Jun 21, 2021
07da9ae
Better CI feedback (#12279)
LysandreJik Jun 21, 2021
6acffe3
Fix for making student ProphetNet for Seq2Seq Distillation (#12130)
vishal-burman Jun 21, 2021
9a76b30
[FlaxClip] fix test from/save pretrained test (#12284)
patil-suraj Jun 21, 2021
695c5da
[Flax] [WIP] allow loading head model with base model weights (#12255)
patil-suraj Jun 21, 2021
dab6216
[DeepSpeed] don't ignore --adafactor (#12257)
stas00 Jun 21, 2021
d139ed2
[Flax] Fix flax test save pretrained (#12256)
patrickvonplaten Jun 21, 2021
6423a48
Tensorflow QA example (#12252)
Rocketknight1 Jun 21, 2021
777425c
[Flax] Add jax flax to env command (#12251)
patrickvonplaten Jun 21, 2021
df70ca9
reset report_to to none, avoid deprecation warning (#12293)
stas00 Jun 21, 2021
cc421b7
[trainer + examples] set log level from CLI (#12276)
stas00 Jun 22, 2021
5d9893f
[tests] multiple improvements (#12294)
stas00 Jun 22, 2021
bd22fe4
Fix for the issue of device-id getting hardcoded for token_type_ids d…
HamidShojanazeri Jun 22, 2021
616b1a2
trainer_tf: adjust wandb installation command (#12291)
stefan-it Jun 22, 2021
4e6d9a0
add FlaxAutoModelForImageClassification in main init (#12298)
patil-suraj Jun 22, 2021
94498e4
Fix and improve documentation for LEDForConditionalGeneration (#12303)
ionicsolutions Jun 22, 2021
b799333
[Flax] Main doc for event orga (#12305)
patrickvonplaten Jun 22, 2021
4224aae
[trainer] 2 bug fixes and a rename (#12309)
stas00 Jun 22, 2021
9bc30c9
FlaxBartPretrainedModel -> FlaxBartPreTrainedModel (#12313)
sgugger Jun 22, 2021
6c70f69
[docs] performance (#12258)
stas00 Jun 22, 2021
578d266
Add CodeCarbon Integration (#12304)
JetRunner Jun 23, 2021
99eb719
Optimizing away the `fill-mask` pipeline. (#12113)
Narsil Jun 23, 2021
0a49163
Add output in a dictionary for TF `generate` method (#12139)
stancld Jun 23, 2021
77e4875
Flax summarization script (#12230)
patil-suraj Jun 23, 2021
3577ce8
Rewrite ProphetNet to adapt converting ONNX friendly (#11981)
jiafatom Jun 23, 2021
c48938e
Flax T5 (#12150)
thevasudevgupta Jun 23, 2021
fc91830
Add mention of the huggingface_hub methods for offline mode (#12320)
LysandreJik Jun 23, 2021
f34c057
[Flax/JAX] Add how to propose projects markdown (#12311)
patrickvonplaten Jun 23, 2021
f812662
[TFWav2Vec2] Fix docs (#12283)
Jun 23, 2021
cd5403b
Clean push to hub API (#12187)
sgugger Jun 23, 2021
aaa93cf
Add all XxxPreTrainedModel to the main init (#12314)
sgugger Jun 23, 2021
34ca862
Conda build (#12323)
LysandreJik Jun 23, 2021
7f381de
Temporarily revert the `fill-mask` improvements.
LysandreJik Jun 23, 2021
dc89956
changed modeling_fx_utils.py to utils/fx.py for clarity (#12326)
michaelbenayoun Jun 23, 2021
3148eeb
Pin good version of huggingface_hub
sgugger Jun 23, 2021
43bd62e
[Flax T5] Fix weight initialization and fix docs (#12327)
patrickvonplaten Jun 23, 2021
4415206
Release: v4.8.0
sgugger Jun 23, 2021
d27dd28
v4.9.0.dev0
sgugger Jun 23, 2021
4590da1
Update training_args.py (#12328)
sam-writer Jun 23, 2021
3282ebe
[Deepspeed] new docs (#12077)
stas00 Jun 23, 2021
2f14e1c
Fix default to logging_dir lost in merge conflict
sgugger Jun 23, 2021
53da33c
try-this (#12338)
richardliaw Jun 24, 2021
5203dc7
[examples/Flax] move the examples table up (#12341)
patil-suraj Jun 24, 2021
5eabae7
Fix torchscript tests (#12336)
LysandreJik Jun 24, 2021
8c4074a
Document patch release v4.8.1
sgugger Jun 24, 2021
0b10975
Add flax/jax quickstart (#12342)
marcvanzee Jun 24, 2021
49d9173
Update README.md
patrickvonplaten Jun 25, 2021
e6d7d32
fixed typo (#12356)
MichalPitr Jun 25, 2021
8dc123b
Fix exception in prediction loop occurring for certain batch sizes (#…
jglaser Jun 25, 2021
a1899a3
Add FlaxBigBird QuestionAnswering script (#12233)
thevasudevgupta Jun 25, 2021
5a3ced5
Replace NotebookProgressReporter by ProgressReporter in Ray Tune run …
krfricke Jun 25, 2021
6432468
Style
sgugger Jun 25, 2021
9af1491
remove extra white space from log format (#12360)
stas00 Jun 25, 2021
96421ae
fixed multiplechoice tokenization (#12362)
cronoik Jun 25, 2021
4e11c12
[trainer] add main_process_first context manager (#12351)
stas00 Jun 25, 2021
e566720
[Examples] Replicates the new --log_level feature to all trainer-base…
bhadreshpsavani Jun 25, 2021
1a873fe
updated example template (#12365)
bhadreshpsavani Jun 26, 2021
ddb9174
replace print with logger (#12368)
bhadreshpsavani Jun 26, 2021
46d0707
[Documentation] Warn that DataCollatorForWholeWordMask is limited to …
ionicsolutions Jun 28, 2021
9f617fd
Update run_mlm.py (#12344)
TahaAslani Jun 28, 2021
25df6c6
Add possibility to maintain full copies of files (#12312)
sgugger Jun 28, 2021
12cd817
[CI] add dependency table sync verification (#12364)
stas00 Jun 28, 2021
e2533e9
[Examples] Added context manager to datasets map (#12367)
bhadreshpsavani Jun 28, 2021
0907eb8
[Flax community event] Add more description to readme (#12398)
patrickvonplaten Jun 28, 2021
487cf4b
Update README.md
patrickvonplaten Jun 28, 2021
7789be2
Fix copies
sgugger Jun 28, 2021
fb9477e
Remove the need for `einsum` in Albert's attention computation (#12394)
mfuntowicz Jun 28, 2021
aeda9f3
[Flax] Adapt flax examples to include `push_to_hub` (#12391)
patrickvonplaten Jun 28, 2021
8403d8a
Tensorflow LM examples (#12358)
Rocketknight1 Jun 28, 2021
789fa55
pass the matching trainer log level to deepspeed (#12401)
stas00 Jun 28, 2021
40756f1
[Flax] Add T5 pretraining script (#12355)
patrickvonplaten Jun 28, 2021
629a6c4
[models] respect dtype of the model when instantiating it (#12316)
stas00 Jun 29, 2021
54d1520
Rename detr targets to labels (#12280)
NielsRogge Jun 29, 2021
e95adf9
Add out of vocabulary error to ASR models (#12288)
will-rice Jun 29, 2021
7d5fa57
Fix TFWav2Vec2 SpecAugment (#12289)
will-rice Jun 29, 2021
1280789
[example/flax] add summarization readme (#12393)
patil-suraj Jun 29, 2021
e9292fd
[Flax] Example scripts - correct weight decay (#12409)
patrickvonplaten Jun 29, 2021
f6480cc
fix ids_to_tokens naming error in tokenizer of deberta v2 (#12412)
hjptriplebee Jun 29, 2021
6a9edeb
minor fixes in original RAG training (#12395)
Jun 29, 2021
e7961b6
Added talks (#12415)
suzana-ilic Jun 29, 2021
096f9d3
Easily train a new fast tokenizer from a given one (#12361)
sgugger Jun 29, 2021
b54d8fc
[modelcard] fix (#12422)
stas00 Jun 29, 2021
4f2d819
Add option to save on each training node (#12421)
sgugger Jun 30, 2021
262c6a7
Added to talks section (#12433)
suzana-ilic Jun 30, 2021
ef5b12b
Fix default bool in argparser (#12424)
sgugger Jun 30, 2021
80b394e
Add default bos_token and eos_token for tokenizer of deberta_v2 (#12429)
hjptriplebee Jun 30, 2021
c319a74
Add CANINE (#12024)
NielsRogge Jun 30, 2021
915af4f
Document patch release v4.8.2
LysandreJik Jun 30, 2021
2737bc5
fix typo in mt5 configuration docstring (#12432)
fcakyon Jun 30, 2021
7ae294a
Add to talks section (#12442)
suzana-ilic Jun 30, 2021
3a7d17f
[JAX/Flax readme] add philosophy doc (#12419)
patil-suraj Jun 30, 2021
06e59d2
[Flax] Add wav2vec2 (#12271)
patrickvonplaten Jun 30, 2021
6c94a48
Merge branch 'master' into rembert
Iwontbecreative Jul 15, 2021
259d1de
Add missing Copied from statements
Iwontbecreative Jul 15, 2021
88c7929
Reference model uploaded under Google org
Iwontbecreative Jul 15, 2021
892482b
Fix various duplicates from merging
Iwontbecreative Jul 15, 2021
62b9f7a
Rembert-large -> rembert, fix overeager Copied from, return type
Iwontbecreative Jul 16, 2021
8ec4407
Incorporate PR comments from Patrick and Sylvain
Iwontbecreative Jul 19, 2021
Files changed
3 changes: 3 additions & 0 deletions docs/source/index.rst
@@ -391,6 +391,8 @@ Flax), PyTorch, and/or TensorFlow.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RemBert | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
@@ -554,6 +556,7 @@ Flax), PyTorch, and/or TensorFlow.
model_doc/prophetnet
model_doc/rag
model_doc/reformer
model_doc/rembert
model_doc/retribert
model_doc/roberta
model_doc/roformer
159 changes: 159 additions & 0 deletions docs/source/model_doc/rembert.rst
@@ -0,0 +1,159 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

RemBERT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RemBERT model was proposed in `Rethinking Embedding Coupling in Pre-trained Language Models
<https://arxiv.org/abs/2010.12821>`__ by Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder.

The abstract from the paper is the following:

*We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art
pre-trained language models. We show that decoupled embeddings provide increased modeling flexibility, allowing us to
significantly improve the efficiency of parameter allocation in the input embedding of multilingual models. By
reallocating the input embedding parameters in the Transformer layers, we achieve dramatically better performance on
standard natural language understanding tasks with the same number of parameters during fine-tuning. We also show that
allocating additional capacity to the output embedding provides benefits to the model that persist through the
fine-tuning stage even though the output embedding is discarded after pre-training. Our analysis shows that larger
output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage
Transformer representations to be more general and more transferable to other tasks and languages. Harnessing these
findings, we are able to train models that achieve strong performance on the XTREME benchmark without increasing the
number of parameters at the fine-tuning stage.*

Tips:

For fine-tuning, RemBERT can be thought of as a bigger version of mBERT with an ALBERT-like factorization of the
embedding layer.
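
A minimal usage sketch (the checkpoint name ``google/rembert`` is an assumption, taken from the "Reference model uploaded under Google org" commit above):

.. code-block:: python

    from transformers import RemBertModel, RemBertTokenizer

    tokenizer = RemBertTokenizer.from_pretrained("google/rembert")  # assumed checkpoint id
    model = RemBertModel.from_pretrained("google/rembert")

    # Encode a sentence and run a forward pass through the encoder
    inputs = tokenizer("RemBERT decouples input and output embeddings.", return_tensors="pt")
    outputs = model(**inputs)
    hidden_states = outputs.last_hidden_state  # shape: (batch, sequence_length, hidden_size)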

RemBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertConfig
:members:


RemBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


RemBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertTokenizerFast
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


RemBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertModel
:members: forward


RemBertForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertForCausalLM
:members: forward


RemBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertForMaskedLM
:members: forward


RemBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertForSequenceClassification
:members: forward


RemBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertForMultipleChoice
:members: forward


RemBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertForTokenClassification
:members: forward


RemBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RemBertForQuestionAnswering
:members: forward


TFRemBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRemBertModel
:members: call


TFRemBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRemBertForMaskedLM
:members: call


TFRemBertForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRemBertForCausalLM
:members: call


TFRemBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRemBertForSequenceClassification
:members: call


TFRemBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRemBertForMultipleChoice
:members: call


TFRemBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRemBertForTokenClassification
:members: call


TFRemBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRemBertForQuestionAnswering
:members: call
61 changes: 61 additions & 0 deletions src/transformers/__init__.py
@@ -223,6 +223,7 @@
"models.prophetnet": ["PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP", "ProphetNetConfig", "ProphetNetTokenizer"],
"models.rag": ["RagConfig", "RagRetriever", "RagTokenizer"],
"models.reformer": ["REFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "ReformerConfig"],
"models.rembert": ["REMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "RemBertConfig"],
"models.retribert": ["RETRIBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "RetriBertConfig", "RetriBertTokenizer"],
"models.roberta": ["ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP", "RobertaConfig", "RobertaTokenizer"],
"models.roformer": ["ROFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "RoFormerConfig", "RoFormerTokenizer"],
@@ -316,6 +317,7 @@
_import_structure["models.mt5"].append("MT5Tokenizer")
_import_structure["models.pegasus"].append("PegasusTokenizer")
_import_structure["models.reformer"].append("ReformerTokenizer")
_import_structure["models.rembert"].append("RemBertTokenizer")
_import_structure["models.speech_to_text"].append("Speech2TextTokenizer")
_import_structure["models.t5"].append("T5Tokenizer")
_import_structure["models.xlm_prophetnet"].append("XLMProphetNetTokenizer")
@@ -361,6 +363,7 @@
_import_structure["models.openai"].append("OpenAIGPTTokenizerFast")
_import_structure["models.pegasus"].append("PegasusTokenizerFast")
_import_structure["models.reformer"].append("ReformerTokenizerFast")
_import_structure["models.rembert"].append("RemBertTokenizerFast")
_import_structure["models.retribert"].append("RetriBertTokenizerFast")
_import_structure["models.roberta"].append("RobertaTokenizerFast")
_import_structure["models.squeezebert"].append("SqueezeBertTokenizerFast")
@@ -506,6 +509,7 @@
"load_tf_weights_in_albert",
]
)

_import_structure["models.auto"].extend(
[
"MODEL_FOR_CAUSAL_LM_MAPPING",
@@ -977,6 +981,21 @@
"ReformerPreTrainedModel",
]
)
_import_structure["models.rembert"].extend(
[
"REMBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
"RemBertForCausalLM",
"RemBertForMaskedLM",
"RemBertForMultipleChoice",
"RemBertForQuestionAnswering",
"RemBertForSequenceClassification",
"RemBertForTokenClassification",
"RemBertLayer",
"RemBertModel",
"RemBertPreTrainedModel",
"load_tf_weights_in_rembert",
]
)
_import_structure["models.retribert"].extend(
["RETRIBERT_PRETRAINED_MODEL_ARCHIVE_LIST", "RetriBertModel", "RetriBertPreTrainedModel"]
)
@@ -1433,6 +1452,20 @@
"TFRagTokenForGeneration",
]
)
_import_structure["models.rembert"].extend(
[
"TF_REMBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
"TFRemBertForCausalLM",
"TFRemBertForMaskedLM",
"TFRemBertForMultipleChoice",
"TFRemBertForQuestionAnswering",
"TFRemBertForSequenceClassification",
"TFRemBertForTokenClassification",
"TFRemBertLayer",
"TFRemBertModel",
"TFRemBertPreTrainedModel",
]
)
_import_structure["models.roberta"].extend(
[
"TF_ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -1825,6 +1858,7 @@
from .models.prophetnet import PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP, ProphetNetConfig, ProphetNetTokenizer
from .models.rag import RagConfig, RagRetriever, RagTokenizer
from .models.reformer import REFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, ReformerConfig
from .models.rembert import REMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, RemBertConfig
from .models.retribert import RETRIBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, RetriBertConfig, RetriBertTokenizer
from .models.roberta import ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, RobertaConfig, RobertaTokenizer
from .models.roformer import ROFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, RoFormerConfig, RoFormerTokenizer
@@ -1918,6 +1952,7 @@
from .models.mt5 import MT5Tokenizer
from .models.pegasus import PegasusTokenizer
from .models.reformer import ReformerTokenizer
from .models.rembert import RemBertTokenizer
from .models.speech_to_text import Speech2TextTokenizer
from .models.t5 import T5Tokenizer
from .models.xlm_prophetnet import XLMProphetNetTokenizer
@@ -1953,6 +1988,7 @@
from .models.openai import OpenAIGPTTokenizerFast
from .models.pegasus import PegasusTokenizerFast
from .models.reformer import ReformerTokenizerFast
from .models.rembert import RemBertTokenizerFast
from .models.retribert import RetriBertTokenizerFast
from .models.roberta import RobertaTokenizerFast
from .models.roformer import RoFormerTokenizerFast
@@ -2461,6 +2497,19 @@
ReformerModelWithLMHead,
ReformerPreTrainedModel,
)
from .models.rembert import (
REMBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
RemBertForCausalLM,
RemBertForMaskedLM,
RemBertForMultipleChoice,
RemBertForQuestionAnswering,
RemBertForSequenceClassification,
RemBertForTokenClassification,
RemBertLayer,
RemBertModel,
RemBertPreTrainedModel,
load_tf_weights_in_rembert,
)
from .models.retribert import RETRIBERT_PRETRAINED_MODEL_ARCHIVE_LIST, RetriBertModel, RetriBertPreTrainedModel
from .models.roberta import (
ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST,
@@ -2843,6 +2892,18 @@
)
from .models.pegasus import TFPegasusForConditionalGeneration, TFPegasusModel, TFPegasusPreTrainedModel
from .models.rag import TFRagModel, TFRagPreTrainedModel, TFRagSequenceForGeneration, TFRagTokenForGeneration
from .models.rembert import (
TF_REMBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
TFRemBertForCausalLM,
TFRemBertForMaskedLM,
TFRemBertForMultipleChoice,
TFRemBertForQuestionAnswering,
TFRemBertForSequenceClassification,
TFRemBertForTokenClassification,
TFRemBertLayer,
TFRemBertModel,
TFRemBertPreTrainedModel,
)
from .models.roberta import (
TF_ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST,
TFRobertaForMaskedLM,
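
These `_import_structure` entries, mirrored by the `TYPE_CHECKING` imports above, expose the new classes lazily at the top level. A quick sketch of what the registrations enable:

    # Nothing RemBERT-specific is loaded until these names are first accessed:
    from transformers import (
        RemBertConfig,
        RemBertForSequenceClassification,
        RemBertTokenizerFast,
        TFRemBertModel,
    )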
6 changes: 6 additions & 0 deletions src/transformers/commands/convert.py
@@ -173,6 +173,12 @@ def run(self):
)

convert_lxmert_checkpoint_to_pytorch(self._tf_checkpoint, self._pytorch_dump_output)
elif self._model_type == "rembert":
from ..models.rembert.convert_rembert_tf_checkpoint_to_pytorch import (
convert_rembert_tf_checkpoint_to_pytorch,
)

convert_rembert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
else:
raise ValueError(
"--model_type should be selected in the list [bert, gpt, gpt2, t5, transfo_xl, xlnet, xlm, lxmert]"
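
A sketch of calling the new conversion path directly, mirroring the three-argument dispatch above (the file paths are hypothetical placeholders):

    from transformers.models.rembert.convert_rembert_tf_checkpoint_to_pytorch import (
        convert_rembert_tf_checkpoint_to_pytorch,
    )

    # Hypothetical paths to the original TF checkpoint, its config, and the output file
    convert_rembert_tf_checkpoint_to_pytorch(
        "/path/to/rembert/model.ckpt",
        "/path/to/rembert/rembert_config.json",
        "/path/to/output/pytorch_model.bin",
    )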
30 changes: 30 additions & 0 deletions src/transformers/convert_slow_tokenizer.py
@@ -668,6 +668,35 @@ class ReformerConverter(SpmConverter):
pass


class RemBertConverter(SpmConverter):
# Inspired from AlbertConverter
def normalizer(self, proto):
list_normalizers = [
normalizers.Replace("``", '"'),
normalizers.Replace("''", '"'),
normalizers.Replace(Regex(" {2,}"), " "),
]
if not self.original_tokenizer.keep_accents:
list_normalizers.append(normalizers.NFKD())
list_normalizers.append(normalizers.StripAccents())
if self.original_tokenizer.do_lower_case:
list_normalizers.append(normalizers.Lowercase())

precompiled_charsmap = proto.normalizer_spec.precompiled_charsmap
list_normalizers.append(normalizers.Precompiled(precompiled_charsmap))
return normalizers.Sequence(list_normalizers)

def post_processor(self):
return processors.TemplateProcessing(
single="[CLS]:0 $A:0 [SEP]:0",
pair="[CLS]:0 $A:0 [SEP]:0 $B:1 [SEP]:1",
special_tokens=[
("[CLS]", self.original_tokenizer.convert_tokens_to_ids("[CLS]")),
("[SEP]", self.original_tokenizer.convert_tokens_to_ids("[SEP]")),
],
)


class BertGenerationConverter(SpmConverter):
pass

@@ -792,6 +821,7 @@ def converted(self) -> Tokenizer:
"OpenAIGPTTokenizer": OpenAIGPTConverter,
"PegasusTokenizer": PegasusConverter,
"ReformerTokenizer": ReformerConverter,
"RemBertTokenizer": RemBertConverter,
"RetriBertTokenizer": BertConverter,
"RobertaTokenizer": RobertaConverter,
"RoFormerTokenizer": RoFormerConverter,
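
The `TemplateProcessing` above yields BERT-style special tokens and segment ids. A sketch of exercising the converter through the public helper (checkpoint id assumed, as before):

    from transformers import RemBertTokenizer
    from transformers.convert_slow_tokenizer import convert_slow_tokenizer

    slow = RemBertTokenizer.from_pretrained("google/rembert")  # assumed checkpoint id
    fast_backend = convert_slow_tokenizer(slow)  # a tokenizers.Tokenizer built by RemBertConverter

    # A pair encodes as [CLS] A [SEP] B [SEP], with type id 0 for segment A and 1 for segment B
    encoding = fast_backend.encode("first segment", "second segment")
    print(encoding.tokens)
    print(encoding.type_ids)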
8 changes: 7 additions & 1 deletion src/transformers/modeling_tf_utils.py
@@ -725,7 +725,13 @@ def get_output_embeddings(self) -> Union[None, tf.keras.layers.Layer]:
if self.get_lm_head() is not None:
lm_head = self.get_lm_head()

return lm_head.get_output_embeddings()
try:
return lm_head.get_output_embeddings()
except AttributeError:
logger.info("Building the model")
self(self.dummy_inputs)

return lm_head().get_output_embeddings()

return None # Overwrite for models with output embeddings

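
The fallback covers Keras's lazy building: before a first forward pass, the LM head may not yet expose its output embeddings. A sketch of the intended behavior with a randomly initialized model (default config assumed):

    from transformers import RemBertConfig, TFRemBertForMaskedLM

    model = TFRemBertForMaskedLM(RemBertConfig())

    # If the head is not built yet, the AttributeError branch feeds the dummy
    # inputs through the model to build it, then retries the embedding lookup.
    output_embeddings = model.get_output_embeddings()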
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -68,6 +68,7 @@
prophetnet,
rag,
reformer,
rembert,
retribert,
roberta,
roformer,