something wrong with "answer_vectors = default_collate(_answer_vectors)" in ansemb/dataset/data_utils.py #3

hackerchenzhuo · 2020-08-23T13:12:31Z

train E000: 0% 0/3467 [00:02<?, ?it/s]
Traceback (most recent call last):
File "train_vqa_embedding.py", line 266, in <module>
main(args)
File "train_vqa_embedding.py", line 239, in main
train(context_net, answer_net, train_loader, optimizer, tracker, args, prefix='train', epoch=i)
File "train_vqa_embedding.py", line 108, in train
for v, q, avocab, a, labels, idx, q_len in tq:
File "/home/anaconda3/lib/python3.6/site-packages/tqdm/std.py", line 1130, in __iter__
for obj in iterable:
File "/home/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/answer_embedding-master/ansemb/dataset/data_utils.py", line 95, in collate_fn
answer_vectors = default_collate(_answer_vectors)
File "/home/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Tensor

hackerchenzhuo · 2020-08-23T13:15:21Z

Hi :)
It seems that this error is caused by an update to torch.stack(). How can I fix it so that the code runs?

hexiang-hu · 2020-08-25T04:27:43Z

Hi,

It seems that the variable ``batch'' here is not a tuple (or a list) of tensors.

I think the easiest way to debug this is to set the number of workers to 0 and then trace line 95 of collate_fn in data_utils.py to see what kind of data structure ``_answer_vectors'' is.
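
As a rough illustration of this debugging step, here is a minimal sketch on a toy dataset. `ToyDataset`, `describe`, and `debug_collate` are hypothetical names that are not part of the repository. The only point is that with `num_workers=0` the collate function runs in the main process, so its input can be inspected (or stepped through with a debugger) directly.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset: each sample is a 3-element tensor."""

    def __len__(self):
        return 8

    def __getitem__(self, i):
        return torch.full((3,), float(i))


def describe(obj):
    """Return a short description of a (possibly nested) batch structure."""
    if isinstance(obj, torch.Tensor):
        return f"Tensor{tuple(obj.shape)}"
    if isinstance(obj, (list, tuple)):
        inner = describe(obj[0]) if obj else "empty"
        return f"{type(obj).__name__} of {len(obj)} x {inner}"
    return type(obj).__name__


seen = []  # descriptions recorded by the collate function


def debug_collate(samples):
    # Record what the DataLoader hands to the collate function, then batch it.
    seen.append(describe(samples))
    return torch.stack(samples, 0)


# num_workers=0 keeps everything in the main process, so prints, breakpoints,
# and the `seen` list all behave as in ordinary single-process code.
loader = DataLoader(ToyDataset(), batch_size=4, num_workers=0,
                    collate_fn=debug_collate)
batch = next(iter(loader))
print(seen[0], "->", tuple(batch.shape))
```

The same kind of description, applied to ``_answer_vectors'' just before line 95, would show whether it is a list of per-sample tensors or a single tensor that is already batched.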

Best,

hackerchenzhuo · 2020-08-25T04:49:43Z

Thank you so much.
After setting the number of workers to 0, some errors disappear, but others remain the same.
For example:

train E000: 0% 0/3467 [00:02<?, ?it/s]
Traceback (most recent call last):
File "train_vqa_embedding.py", line 266, in <module>
main(args)
File "train_vqa_embedding.py", line 239, in main
train(context_net, answer_net, train_loader, optimizer, tracker, args, prefix='train', epoch=i)
File "train_vqa_embedding.py", line 108, in train
for v, q, avocab, a, labels, idx, q_len in tq:
File "/home/chenzhuo/anaconda3/lib/python3.6/site-packages/tqdm/std.py", line 1130, in __iter__
for obj in iterable:
File "/home/chenzhuo/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/chenzhuo/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/chenzhuo/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/chenzhuo/answer_embedding-master/ansemb/dataset/data_utils.py", line 95, in collate_fn
answer_vectors = default_collate(_answer_vectors)
File "/home/chenzhuo/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Tensor

And I used
print(" data structure of _answer_vectors:", type(_answer_vectors))
to check the data structure of ``_answer_vectors'' just before
answer_vectors = default_collate(_answer_vectors)

It prints: data structure of _answer_vectors: <class 'torch.Tensor'>

hackerchenzhuo · 2020-08-25T05:01:53Z

This is the text printed before the error:

@xxx:~/answer_embedding-master$ python train_vqa_embedding.py --gpu_id 1
{'gpu_id': 1, 'finetune': False, 'batch_size': 128, 'max_negative_answer': 12000, 'answer_batch_size': 3000, 'loss_temperature': 0.01, 'pretrained_model': None, 'context_embedding': 'SAN', 'answer_embedding': 'BoW', 'name': None}
{'cache_path': '/home/answer_embedding-master/.cache', 'output_path': '/home/answer_embedding-master/outputs', 'embedding_size': 1024, 'seed': 1618, 'question_vocab_path': '/home/answer_embedding-master/data/question.vocab.json', 'image_size': 448, 'output_size': 14, 'preprocess_batch_size': 100, 'output_features': 2048, 'central_fraction': 0.875, 'TRAIN': {'epochs': 50, 'batch_size': 128, 'base_lr': 0.001, 'lr_decay': 15, 'data_workers': 0, 'answer_batch_size': 3000, 'max_negative_answer': 8000}, 'TEST': {'max_answer_index': 3000}, 'VQA2': {'qa_path': '/home/answer_embedding-master/data/vqa2', 'feature_path': '/home/answer_embedding-master/features/vqa-resnet-14x14.h5', 'answer_vocab_path': '/home/answer_embedding-master/data/answer.vocab.vqa.json', 'train_img_path': '/home/answer_embedding-master/data/vqa2/images/train2014', 'val_img_path': '/home/answer_embedding-master/data/vqa2/images/val2014', 'test_img_path': '/home/answer_embedding-master/data/vqa2/images/test-dev2015', 'train_qa': 'train2014', 'val_qa': 'val2014', 'test_qa': 'test-dev2015', 'task': 'OpenEnded', 'dataset': 'mscoco'}, 'VG': {'qa_path': '/home/answer_embedding-master/data/vg', 'feature_path': '/home/answer_embedding-master/features/vg-resnet-14x14.h5', 'answer_vocab_path': '/home/answer_embedding-master/data/answer.vocab.vg.json', 'train_qa': 'VG_train_decoys.json', 'val_qa': 'VG_val_decoys.json', 'test_qa': 'VG_test_decoys.json', 'img_path': '/home/answer_embedding-master/data/vg/images'}, 'Visual7W': {'qa_path': '/home/answer_embedding-master/data/v7w', 'feature_path': '/home/answer_embedding-master/features/vg-resnet-14x14.h5', 'answer_vocab_path': '/home/answer_embedding-master/data/answer.vocab.v7w.json', 'train_qa': 'v7w_train_questions.json', 'val_qa': 'v7w_val_questions.json', 'test_qa': 'v7w_test_questions.json', 'train_v7w_decoys': 'v7w_train_decoys.json', 'val_v7w_decoys': 'v7w_val_decoys.json', 'test_v7w_decoys': 'v7w_test_decoys.json', 'img_path': 
'/home/answer_embedding-master/data/v7w/images'}}
Output data would be saved to /home/answer_embedding-master/outputs/SAN_BoW_vqa_batch_softmax_embedding_2020-08-25_12:45:17.pth

Loading vectors to .vector_cache/glove.840B.300d.txt.pt
import answer vocabulary from: /home/answer_embedding-master/data/answer.vocab.vqa.json
extracting answers...
loading cache from: /home/answer_embedding-master/.cache/v2_OpenEnded_mscoco_train2014_questions.json.v2_mscoco_train2014_annotations.json.pt
import answer vocabulary from: /home/answer_embedding-master/data/answer.vocab.vqa.json
extracting answers...
loading cache from: /home/answer_embedding-master/.cache/v2_OpenEnded_mscoco_val2014_questions.json.v2_mscoco_val2014_annotations.json.pt
/home/answer_embedding-master/ansemb/models/layers.py:101: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
init.xavier_uniform(w)
/home/answer_embedding-master/ansemb/models/embedding.py:48: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
init.xavier_uniform(m.weight)
Context Model:
StackedAttentionEmbedding(
(embedding): Embedding(15419, 300, padding_idx=0)
(drop): Dropout(p=0.5, inplace=False)
(text): Seq2SeqRNN(
(rnn): LSTM(300, 512, batch_first=True, bidirectional=True)
)
(attention): Attention(
(v_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(q_lin): Linear(in_features=1024, out_features=512, bias=True)
(x_conv): Conv2d(512, 2, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.5, inplace=False)
(relu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(mlp): GroupMLP(
(conv1): Conv1d(5120, 4096, kernel_size=(1,), stride=(1,))
(drop): Dropout(p=0.5, inplace=False)
(relu): LeakyReLU(negative_slope=0.01)
(conv2): Conv1d(4096, 1024, kernel_size=(1,), stride=(1,), groups=64)
)
)
Answer Model:
MLPEmbedding(
(mlp): GroupMLP(
(conv1): Conv1d(300, 4096, kernel_size=(1,), stride=(1,))
(drop): Dropout(p=0.5, inplace=False)
(relu): LeakyReLU(negative_slope=0.01)
(conv2): Conv1d(4096, 1024, kernel_size=(1,), stride=(1,), groups=64)
)
)
train E000: 0% 0/3467 [00:00<?, ?it/s] data structure of _answer_vectors: <class 'torch.Tensor'>
train E000: 0% 0/3467 [00:00<?, ?it/s]
Traceback (most recent call last):
....error text ...

For the image feature file vqa-resnet-14x14.h5, I followed the issue How to generate "vqa-resnet-14x14.h5"?

I modified preprocess-images.py from the pytorch-vqa repo. Could that be what causes the error?

hexiang-hu · 2020-08-25T06:13:35Z

Hi,

I think this error is very likely due to the change of PyTorch version. Unfortunately, I do not have the machine and data to re-run this experiment at the moment.

Could you check whether it works if you change line 95, ``answer_vectors = default_collate(_answer_vectors)'', to

answer_vectors = default_collate((_answer_vectors,))

If not, can you check the shape of ``_answer_vectors''?
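
To see what this change does to the shape, here is a small sketch with a toy tensor standing in for ``_answer_vectors'' (the sizes are made up, and the variable names are hypothetical). It relies on `default_collate` stacking the tensors of the sequence it receives along a new first dimension.

```python
import torch
from torch.utils.data.dataloader import default_collate

# Toy stand-in for _answer_vectors: already one batched tensor
# (4 samples, 6 candidate answers) rather than a list of per-sample tensors.
answer_vectors_in = torch.arange(24, dtype=torch.float32).reshape(4, 6)

# Wrapping it in a one-element tuple makes default_collate treat it as a
# "batch of one", so the result gains a leading dimension of size 1.
wrapped = default_collate((answer_vectors_in,))
print(tuple(wrapped.shape))

# Splitting it into per-sample rows first lets default_collate rebuild
# a tensor with the original shape.
per_sample = default_collate(list(answer_vectors_in.unbind(0)))
print(tuple(per_sample.shape))
```

Whether the extra leading dimension matters depends on how the rest of the training loop consumes the tensor, so any code that indexes or gathers along dimension 1 is worth checking after this change.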

hackerchenzhuo · 2020-08-25T06:28:24Z

data structure of _answer_vectors: <class 'torch.Tensor'>
shape of _answer_vectors: torch.Size([128, 3000])

I changed line 95 from ``answer_vectors = default_collate(_answer_vectors)'' to
answer_vectors = default_collate((_answer_vectors,))
The original error is gone, but this raised a new error:

Traceback (most recent call last):
  File "train_vqa_embedding.py", line 266, in <module>
    main(args)
  File "train_vqa_embedding.py", line 239, in main
    train(context_net, answer_net, train_loader, optimizer, tracker, args, prefix='train', epoch=i)
  File "train_vqa_embedding.py", line 115, in train
    answer_var, answer_len = loader.dataset._get_answer_vectors(avocab)
  File "/home/answer_embedding-master/ansemb/dataset/base.py", line 93, in _get_answer_vectors
    vector[idx, :] = self._encode_answer_vector(self.index_to_answer[answer_id])
KeyError: tensor(0)

hexiang-hu · 2020-08-25T06:32:56Z

This is because ``answer_id'' is a tensor rather than an int. I think you can change it to:

vector[idx, :] = self._encode_answer_vector(self.index_to_answer[answer_id.item()])
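
A minimal sketch of why the lookup fails, using a made-up `index_to_answer` dictionary (its contents are hypothetical): indexing or iterating over a 1-D tensor yields 0-dim tensors, and a tensor does not hash like the integer it holds, so a dict keyed by Python ints does not find it.

```python
import torch

# Hypothetical vocabulary mapping, keyed by plain Python ints.
index_to_answer = {0: "yes", 1: "no", 2: "2"}
answer_ids = torch.tensor([0, 2])

# A 0-dim tensor is not found as a key, even though it holds the value 0.
try:
    index_to_answer[answer_ids[0]]
    tensor_key_failed = False
except KeyError:
    tensor_key_failed = True

# .item() converts each 0-dim tensor to a Python int before the lookup.
answers = [index_to_answer[answer_id.item()] for answer_id in answer_ids]
print(tensor_key_failed, answers)
```

Calling `answer_ids.tolist()` once before the loop would be an equivalent way to obtain plain ints.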

hackerchenzhuo · 2020-08-25T06:55:50Z

train E000:   0% 0/3467 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "train_vqa_embedding.py", line 266, in <module>
    main(args)
  File "train_vqa_embedding.py", line 239, in main
    train(context_net, answer_net, train_loader, optimizer, tracker, args, prefix='train', epoch=i)
  File "train_vqa_embedding.py", line 133, in train
    acc = utils.batch_accuracy(predicts.data, a.data).cpu()
  File "/home/answer_embedding-master/ansemb/utils.py", line 27, in batch_accuracy
    agreeing = true.gather(dim=1, index=predicted_index)
RuntimeError: invalid argument 4: Index tensor must have same dimensions as input tensor at /pytorch/aten/src/THC/generic/THCTensorScatterGather.cu:16

It seems that an error is raised while computing acc.

hexiang-hu · 2020-08-25T07:43:02Z

This one is also due to an API change in PyTorch.

You can check the shapes of ``true'' and ``predicted_index'' to make sure they have the same dimensionality.

Please refer to https://pytorch.org/docs/stable/generated/torch.gather.html.

Looking at some examples of how to use ``torch.gather'' on Stack Overflow would also help you debug.

hackerchenzhuo · 2020-08-25T08:13:44Z

Yes, they don't have the same dimensionality:

shape of true: torch.Size([1, 128, 3000])
type of true: <class 'torch.Tensor'>
shape of predicted_index: torch.Size([128, 1])
type of predicted_index: <class 'torch.Tensor'>

hackerchenzhuo · 2020-08-25T09:33:31Z

It works!
I changed
agreeing = true.gather(dim=1, index=predicted_index)
to
agreeing = true[0].gather(dim=1, index=predicted_index)
and fixed some other small problems.
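
For anyone hitting the same error, the change can be checked on small tensors. This is only a sketch with toy values: the sizes (4 samples, 5 answers) and the contents of `predicted` and `true` are made up, and only the shapes mirror the ones reported in this thread.

```python
import torch

# Toy scores for 4 samples over 5 candidate answers (no ties per row).
predicted = torch.tensor([[0.1, 0.9, 0.0, 0.0, 0.0],
                          [0.8, 0.1, 0.1, 0.0, 0.0],
                          [0.0, 0.0, 0.2, 0.7, 0.1],
                          [0.1, 0.1, 0.1, 0.1, 0.6]])

# Ground truth with an extra leading dimension, like [1, 128, 3000] above.
true = torch.zeros(1, 4, 5)
true[0, 0, 1] = 1.0  # sample 0: answer 1 is correct
true[0, 2, 3] = 1.0  # sample 2: answer 3 is correct

# Index of the highest-scoring answer per sample, shape (4, 1).
_, predicted_index = predicted.max(dim=1, keepdim=True)

# gather needs input and index to have the same number of dimensions,
# so a 3-D input with a 2-D index is rejected.
try:
    true.gather(dim=1, index=predicted_index)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Dropping the leading dimension makes both tensors 2-D.
agreeing = true[0].gather(dim=1, index=predicted_index)
print(mismatch_raised, agreeing.squeeze(1).tolist())
```

`true.squeeze(0)` would have the same effect as `true[0]` here, since the leading dimension has size 1.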

Thank you so much again! :)

hexiang-hu closed this as completed Aug 27, 2020

Repository owner deleted a comment from hackerchenzhuo Aug 27, 2020

hackerchenzhuo mentioned this issue Jan 10, 2023

how to evaluate my own data China-UK-ZSL/ZS-F-VQA#11

Closed
