This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

t5 with coref model #4088

Closed
ianupright opened this issue Apr 16, 2020 · 8 comments


@ianupright

ianupright commented Apr 16, 2020

When I try to use the t5-large model with the coref model, I get this:

Traceback (most recent call last):
  File "/usr/local/hd1/projects/allennlp/allennlp/__main__.py", line 23, in <module>
    run()
  File "/usr/local/hd1/projects/allennlp/allennlp/__main__.py", line 19, in run
    main(prog="allennlp")
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/__init__.py", line 93, in main
    args.func(args)
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 143, in train_model_from_args
    dry_run=args.dry_run,
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 202, in train_model_from_file
    dry_run=dry_run,
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 265, in train_model
    dry_run=dry_run,
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 462, in _train_worker
    metrics = train_loop.run()
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 524, in run
    return self.trainer.train()
  File "/usr/local/hd1/projects/allennlp/allennlp/training/trainer.py", line 732, in train
    train_metrics = self._train_epoch(epoch)
  File "/usr/local/hd1/projects/allennlp/allennlp/training/trainer.py", line 500, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/usr/local/hd1/projects/allennlp/allennlp/training/trainer.py", line 406, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/usr/local/hd1/projects/allennlp/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/hd1/projects/allennlp-models/allennlp_models/coref/coref_model.py", line 180, in forward
    text_embeddings = self._lexical_dropout(self._text_field_embedder(text))
  File "/usr/local/hd1/projects/allennlp/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/hd1/projects/allennlp/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 89, in forward
    token_vectors = embedder(**tensors, **forward_params_values)
  File "/usr/local/hd1/projects/allennlp/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/hd1/projects/allennlp/allennlp/modules/token_embedders/pretrained_transformer_mismatched_embedder.py", line 75, in forward
    token_ids, wordpiece_mask, type_ids=type_ids, segment_concat_mask=segment_concat_mask
  File "/usr/local/hd1/projects/allennlp/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/hd1/projects/allennlp/allennlp/modules/token_embedders/pretrained_transformer_embedder.py", line 98, in forward
    max_type_id = type_ids.max()
RuntimeError: invalid argument 1: cannot perform reduction function max on tensor with no elements because the operation does not have an identity at /pytorch/aten/src/THC/generic/THCTensorMathReduce.cu:85

Is there something I need to do to configure the tokenizer, etc., to make this work?

Thanks
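For context, the failing call is type_ids.max() on a tensor with zero elements; plain PyTorch fails the same way. A minimal sketch, assuming a reasonably recent PyTorch release (the exact error message varies between versions):

import torch

# The embedder ends up receiving a batch of 64 sequences with zero wordpieces
# each, i.e. a tensor of shape (64, 0).
type_ids = torch.zeros((64, 0), dtype=torch.long)

# max() over a tensor with no elements has no identity value, so this raises a
# RuntimeError comparable to the one in the traceback above.
try:
    type_ids.max()
except RuntimeError as err:
    print(err)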

@ZhaofengWu
Contributor

@dirkgr sounds like some incompatibility among different types of transformers craziness again?

@dirkgr
Member

dirkgr commented Apr 17, 2020

@ianupright, is it possible you are sending an empty sequence to the model?

@dirkgr
Member

dirkgr commented Apr 24, 2020

Closing due to lack of activity

@dirkgr dirkgr closed this as completed Apr 24, 2020
@NicolasAG

NicolasAG commented Nov 10, 2020

I have the same issue when using the t5-base encoder as a pretrained_token_embedder.
I first had to deal with adding additional special tokens and ended up using the latest solution proposed here: #4690 (comment)
Now that this is done, I get the same issue as stated above, right at the beginning of training.
@ianupright did you find out what was going on?
I still need to investigate a bit more and make sure that I don't have an empty sequence as input, as @dirkgr suggested...
[EDIT]: right, so the input tensor is empty. I printed the contents of my input TextField:

0 | 2020-11-10 18:21:24,514 - INFO - models.allen_s2s - source tokens[pretrained_transformer] =
0 | 2020-11-10 18:21:24,516 - INFO - models.allen_s2s -  \_token_ids = tensor([], device='cuda:0', size=(64, 0), dtype=torch.int64)
0 | 2020-11-10 18:21:24,521 - INFO - models.allen_s2s -  \_mask = tensor([[ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        ...,
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False]], device='cuda:0')
0 | 2020-11-10 18:21:24,522 - INFO - models.allen_s2s -  \_type_ids = tensor([], device='cuda:0', size=(64, 0), dtype=torch.int64)
0 | 2020-11-10 18:21:24,523 - INFO - models.allen_s2s -  \_segment_concat_mask = tensor([], device='cuda:0', size=(64, 0), dtype=torch.int64)

I bet this is related to, and caused by, this other issue (#4649).
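A small guard like the one below, hypothetical and not part of AllenNLP, would surface this condition with a clearer message before the reduction in the embedder fails:

import torch

def check_nonempty(token_ids: torch.Tensor) -> None:
    # Hypothetical guard (not AllenNLP code): fail fast with a readable message
    # when the indexer produces a batch with zero wordpieces per sequence.
    if token_ids.numel() == 0:
        raise ValueError(
            f"Got an empty token_ids tensor of shape {tuple(token_ids.shape)}; "
            "check the token indexer / dataset reader configuration."
        )

# Reproduce the shape from the log above: a (64, 0) int64 tensor.
try:
    check_nonempty(torch.zeros((64, 0), dtype=torch.long))
except ValueError as err:
    print(err)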

@dirkgr
Member

dirkgr commented Nov 11, 2020

That's a fair bet. I finished the fix for #4649 (#4732). When that is merged, do you mind trying this with the new code?

@dirkgr dirkgr reopened this Nov 11, 2020
@dirkgr
Member

dirkgr commented Nov 11, 2020

#4732 is merged now!

@NicolasAG

After adding the changes from #4732, the input sequence is not empty anymore :)

0 | 2020-11-11 15:22:38,383 - INFO - models.allen_s2s - source tokens[pretrained_transformer] =
0 | 2020-11-11 15:22:38,387 - INFO - models.allen_s2s -  \_token_ids = tensor([[32102, 21903,     9,  ...,     0,     0,     0],
        [32102,  7780,    19,  ...,     0,     0,     0],
        [32102, 12446,    19,  ...,     0,     0,     0],
        ...,
        [32102, 27121,    65,  ...,     0,     0,     0],
        [32102, 25630,    19,  ...,     0,     0,     0],
        [32102,  1290, 11113,  ...,     0,     0,     0]], device='cuda:0')
0 | 2020-11-11 15:22:38,390 - INFO - models.allen_s2s -  \_mask = tensor([[ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        ...,
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False]], device='cuda:0')
0 | 2020-11-11 15:22:38,391 - INFO - models.allen_s2s -
0 | 2020-11-11 15:22:38,393 - INFO - models.allen_s2s -  \_type_ids = tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], device='cuda:0')
0 | 2020-11-11 15:22:38,396 - INFO - models.allen_s2s -  \_segment_concat_mask = tensor([[ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        ...,
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False]], device='cuda:0')

I have another issue after that:

indexSelectLargeIndex: block: [84,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed

That one is probably not related, though. It's probably because of the way I added special tokens to the vocab... which is related to #4690.
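That CUDA assertion usually means some token id is at least as large as the embedding matrix, which is easy to hit when special tokens are added to the tokenizer without resizing the model's embeddings. A minimal sketch with the Hugging Face transformers API, using hypothetical extra tokens (not necessarily the approach discussed in #4690):

from transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5Model.from_pretrained("t5-base")

# Hypothetical extra special tokens. Any token added to the tokenizer must also
# have a row in the embedding matrix, otherwise its id indexes past the end of
# the table on GPU, which surfaces as the indexSelectLargeIndex assertion.
tokenizer.add_special_tokens({"additional_special_tokens": ["<ent1>", "<ent2>"]})
model.resize_token_embeddings(len(tokenizer))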

@dirkgr
Member

dirkgr commented Nov 12, 2020

I am almost certain that the special tokens are the problem you are seeing now. I'll close this issue, since we're tracking the special tokens issue in #4690.

@dirkgr dirkgr closed this as completed Nov 12, 2020