This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

t5 with coref model #4088

Closed
ianupright opened this issue Apr 16, 2020 · 8 comments


@ianupright

ianupright commented Apr 16, 2020

When I try to use the t5-large model with the coref model, I get this:

Traceback (most recent call last):
  File "/usr/local/hd1/projects/allennlp/allennlp/__main__.py", line 23, in <module>
    run()
  File "/usr/local/hd1/projects/allennlp/allennlp/__main__.py", line 19, in run
    main(prog="allennlp")
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/__init__.py", line 93, in main
    args.func(args)
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 143, in train_model_from_args
    dry_run=args.dry_run,
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 202, in train_model_from_file
    dry_run=dry_run,
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 265, in train_model
    dry_run=dry_run,
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 462, in _train_worker
    metrics = train_loop.run()
  File "/usr/local/hd1/projects/allennlp/allennlp/commands/train.py", line 524, in run
    return self.trainer.train()
  File "/usr/local/hd1/projects/allennlp/allennlp/training/trainer.py", line 732, in train
    train_metrics = self._train_epoch(epoch)
  File "/usr/local/hd1/projects/allennlp/allennlp/training/trainer.py", line 500, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/usr/local/hd1/projects/allennlp/allennlp/training/trainer.py", line 406, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/usr/local/hd1/projects/allennlp/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/hd1/projects/allennlp-models/allennlp_models/coref/coref_model.py", line 180, in forward
    text_embeddings = self._lexical_dropout(self._text_field_embedder(text))
  File "/usr/local/hd1/projects/allennlp/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/hd1/projects/allennlp/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 89, in forward
    token_vectors = embedder(**tensors, **forward_params_values)
  File "/usr/local/hd1/projects/allennlp/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/hd1/projects/allennlp/allennlp/modules/token_embedders/pretrained_transformer_mismatched_embedder.py", line 75, in forward
    token_ids, wordpiece_mask, type_ids=type_ids, segment_concat_mask=segment_concat_mask
  File "/usr/local/hd1/projects/allennlp/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/hd1/projects/allennlp/allennlp/modules/token_embedders/pretrained_transformer_embedder.py", line 98, in forward
    max_type_id = type_ids.max()
RuntimeError: invalid argument 1: cannot perform reduction function max on tensor with no elements because the operation does not have an identity at /pytorch/aten/src/THC/generic/THCTensorMathReduce.cu:85

Is there something I need to do to configure the tokenizer, etc., to make this work?

Thanks
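For context, the failing call is type_ids.max() on a tensor with zero elements; plain PyTorch fails the same way. A minimal sketch, assuming a reasonably recent PyTorch release (the exact error message varies between versions):

import torch

# The embedder ends up receiving a batch of 64 sequences with zero wordpieces
# each, i.e. a tensor of shape (64, 0).
type_ids = torch.zeros((64, 0), dtype=torch.long)

# max() over a tensor with no elements has no identity value, so this raises a
# RuntimeError comparable to the one in the traceback above.
try:
    type_ids.max()
except RuntimeError as err:
    print(err)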

@ZhaofengWu
Contributor

@dirkgr sounds like some incompatibility among different types of transformers craziness again?

@dirkgr
Member

dirkgr commented Apr 17, 2020

@ianupright, is it possible you are sending an empty sequence to the model?

@dirkgr
Member

dirkgr commented Apr 24, 2020

Closing due to lack of activity

@dirkgr dirkgr closed this as completed Apr 24, 2020
@NicolasAG

NicolasAG commented Nov 10, 2020

I have the same issue when using the t5-base encoder as a pretrained_token_embedder.
I first had to deal with adding additional special tokens and ended up using the latest solution proposed here: #4690 (comment)
Now that this is done, I get the same issue as stated above, right at the beginning of training.
@ianupright did you find out what was going on?
I still need to investigate a bit more and make sure that I don't have an empty sequence as input, as @dirkgr suggested...
[EDIT]: right, so the input tensor is empty. I printed the contents of my input TextField:

0 | 2020-11-10 18:21:24,514 - INFO - models.allen_s2s - source tokens[pretrained_transformer] =
0 | 2020-11-10 18:21:24,516 - INFO - models.allen_s2s -  \_token_ids = tensor([], device='cuda:0', size=(64, 0), dtype=torch.int64)
0 | 2020-11-10 18:21:24,521 - INFO - models.allen_s2s -  \_mask = tensor([[ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        ...,
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False]], device='cuda:0')
0 | 2020-11-10 18:21:24,522 - INFO - models.allen_s2s -  \_type_ids = tensor([], device='cuda:0', size=(64, 0), dtype=torch.int64)
0 | 2020-11-10 18:21:24,523 - INFO - models.allen_s2s -  \_segment_concat_mask = tensor([], device='cuda:0', size=(64, 0), dtype=torch.int64)

I bet this is related to, and caused by, this other issue (#4649).
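A small guard like the one below, hypothetical and not part of AllenNLP, would surface this condition with a clearer message before the reduction in the embedder fails:

import torch

def check_nonempty(token_ids: torch.Tensor) -> None:
    # Hypothetical guard (not AllenNLP code): fail fast with a readable message
    # when the indexer produces a batch with zero wordpieces per sequence.
    if token_ids.numel() == 0:
        raise ValueError(
            f"Got an empty token_ids tensor of shape {tuple(token_ids.shape)}; "
            "check the token indexer / dataset reader configuration."
        )

# Reproduce the shape from the log above: a (64, 0) int64 tensor.
try:
    check_nonempty(torch.zeros((64, 0), dtype=torch.long))
except ValueError as err:
    print(err)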

@dirkgr
Member

dirkgr commented Nov 11, 2020

That's a fair bet. I finished the fix for #4649 (#4732). When that is merged, do you mind trying this with the new code?

@dirkgr dirkgr reopened this Nov 11, 2020
@dirkgr
Member

dirkgr commented Nov 11, 2020

#4732 is merged now!

@NicolasAG

After adding the changes from #4732, the input sequence is not empty anymore :)

0 | 2020-11-11 15:22:38,383 - INFO - models.allen_s2s - source tokens[pretrained_transformer] =
0 | 2020-11-11 15:22:38,387 - INFO - models.allen_s2s -  \_token_ids = tensor([[32102, 21903,     9,  ...,     0,     0,     0],
        [32102,  7780,    19,  ...,     0,     0,     0],
        [32102, 12446,    19,  ...,     0,     0,     0],
        ...,
        [32102, 27121,    65,  ...,     0,     0,     0],
        [32102, 25630,    19,  ...,     0,     0,     0],
        [32102,  1290, 11113,  ...,     0,     0,     0]], device='cuda:0')
0 | 2020-11-11 15:22:38,390 - INFO - models.allen_s2s -  \_mask = tensor([[ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        ...,
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False]], device='cuda:0')
0 | 2020-11-11 15:22:38,391 - INFO - models.allen_s2s -
0 | 2020-11-11 15:22:38,393 - INFO - models.allen_s2s -  \_type_ids = tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], device='cuda:0')
0 | 2020-11-11 15:22:38,396 - INFO - models.allen_s2s -  \_segment_concat_mask = tensor([[ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        ...,
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False],
        [ True,  True,  True,  ..., False, False, False]], device='cuda:0')

I have another issue after that:

indexSelectLargeIndex: block: [84,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed

That one is probably not related, though. It's probably because of the way I added special tokens to the vocab... which is related to #4690.
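That CUDA assertion usually means some token id is at least as large as the embedding matrix, which is easy to hit when special tokens are added to the tokenizer without resizing the model's embeddings. A minimal sketch with the Hugging Face transformers API, using hypothetical extra tokens (not necessarily the approach discussed in #4690):

from transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5Model.from_pretrained("t5-base")

# Hypothetical extra special tokens. Any token added to the tokenizer must also
# have a row in the embedding matrix, otherwise its id indexes past the end of
# the table on GPU, which surfaces as the indexSelectLargeIndex assertion.
tokenizer.add_special_tokens({"additional_special_tokens": ["<ent1>", "<ent2>"]})
model.resize_token_embeddings(len(tokenizer))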

@dirkgr
Member

dirkgr commented Nov 12, 2020

I am almost certain that the special tokens are the problem you are seeing now. I'll close this issue, since we're tracking the special tokens issue in #4690.

@dirkgr dirkgr closed this as completed Nov 12, 2020