KeyError: 'gqa_accuracy_answer_total_unscaled' #47

Open
TopCoder2K opened this issue Sep 27, 2021 · 3 comments
@TopCoder2K

This error is really strange... I am following the readme for training MDETR on CLEVR.
First, I ran the following command:

python run_with_submitit.py --dataset_config configs/clevr_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64  --schedule linear_with_warmup --text_encoder_type distilroberta-base --output-dir step1 --epochs 5 --lr_drop 20 --nodes 1 --ngpus 1

The only difference from the command in the readme is that I used run_with_submitit.py and added the --nodes 1 --ngpus 1 parameters.
Training went well and the job finished successfully. Then I ran:

python run_with_submitit.py --dataset_config configs/clevr.json --backbone "resnet18" --num_queries 25 --batch_size 64  --schedule linear_with_warmup --text_encoder_type distilroberta-base --output-dir step2 --load ~/MDETR/mdetr/checkpoint/pchelintsev/experiments/19906/BEST_checkpoint.pth --epochs 5 --lr_drop 20 --nodes 1 --ngpus 1

After the first epoch and evaluation, I got the following in the 28574_0_log.err file (warnings deleted):

submitit ERROR (2021-09-27 13:01:24,999) - Submitted job triggered an exception
Traceback (most recent call last):
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 71, in submitit_main
    process_job(args.folder)
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 64, in process_job
    raise error
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 53, in process_job
    result = delayed.result()
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/utils.py", line 128, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "run_with_submitit.py", line 98, in __call__
    detection.main(self.args)
  File "/home/pchelintsev/MDETR/mdetr/main.py", line 614, in main
    metric = test_stats["gqa_accuracy_answer_total_unscaled"]
KeyError: 'gqa_accuracy_answer_total_unscaled'

Why is this metric missing?
Also, here is the end of the 28574_0_log.out file:

Accumulating evaluation results...
DONE (t=70.57s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.581
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.893
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.660
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.374
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.578
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.302
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.729
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.741
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.637
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.741
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.842
submitit ERROR (2021-09-27 13:01:24,999) - Submitted job triggered an exception

TopCoder2K commented Sep 27, 2021

I noticed something odd in main.py. It is probably not the root cause, but it might help locate the bug.
This is the line:

(screenshot: the line in main.py that reads test_stats["gqa_accuracy_answer_total_unscaled"])

But here we can see that, in the CLEVR case, the stats are only updated under 'clevr_*' keys (because the config file lists only ["clevr"]):

(screenshot: the code that prefixes metric keys with the dataset name)

So 'gqa_accuracy_answer_total_unscaled' can never appear...
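The prefixing behaviour can be illustrated with a toy sketch (function and variable names here are illustrative, not MDETR's actual code):

```python
# Toy illustration of per-dataset metric keys: each dataset's eval
# stats are merged into test_stats under a "<dataset>_" prefix.
def merge_stats(per_dataset_stats):
    test_stats = {}
    for dataset_name, stats in per_dataset_stats.items():
        for key, value in stats.items():
            test_stats[f"{dataset_name}_{key}"] = value
    return test_stats

# If the config lists only ["clevr"], every key is "clevr_"-prefixed,
# so a "gqa_"-prefixed key can never show up in test_stats.
merged = merge_stats({"clevr": {"accuracy_answer_total_unscaled": 0.97}})
assert "gqa_accuracy_answer_total_unscaled" not in merged
assert "clevr_accuracy_answer_total_unscaled" in merged
```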

@tchiwewe

@TopCoder2K Did you manage to find a solution to this?

@tchiwewe

Not sure why this is hardcoded. When doing QA with the CLEVR dataset, the metric keys actually available in test_stats are listed below. Changing 'gqa_accuracy_answer_total_unscaled' to 'clevr_accuracy_answer_total_unscaled' in main.py should fix the problem.

dict_keys([
 'clevr_loss',
 'clevr_loss_ce',
 'clevr_loss_bbox',
 'clevr_loss_giou',
 'clevr_loss_contrastive_align',
 'clevr_loss_ce_0',
 'clevr_loss_bbox_0',
 'clevr_loss_giou_0',
 'clevr_loss_contrastive_align_0',
 'clevr_loss_ce_1',
 'clevr_loss_bbox_1',
 'clevr_loss_giou_1',
 'clevr_loss_contrastive_align_1',
 'clevr_loss_ce_2',
 'clevr_loss_bbox_2',
 'clevr_loss_giou_2',
 'clevr_loss_contrastive_align_2',
 'clevr_loss_ce_3',
 'clevr_loss_bbox_3',
 'clevr_loss_giou_3',
 'clevr_loss_contrastive_align_3',
 'clevr_loss_ce_4',
 'clevr_loss_bbox_4',
 'clevr_loss_giou_4',
 'clevr_loss_contrastive_align_4',
 'clevr_loss_answer_type',
 'clevr_loss_answer_binary',
 'clevr_loss_answer_reg',
 'clevr_loss_answer_attr',
 'clevr_loss_ce_unscaled',
 'clevr_loss_bbox_unscaled',
 'clevr_loss_giou_unscaled',
 'clevr_cardinality_error_unscaled',
 'clevr_loss_contrastive_align_unscaled',
 'clevr_loss_ce_0_unscaled',
 'clevr_loss_bbox_0_unscaled',
 'clevr_loss_giou_0_unscaled',
 'clevr_cardinality_error_0_unscaled',
 'clevr_loss_contrastive_align_0_unscaled',
 'clevr_loss_ce_1_unscaled',
 'clevr_loss_bbox_1_unscaled',
 'clevr_loss_giou_1_unscaled',
 'clevr_cardinality_error_1_unscaled',
 'clevr_loss_contrastive_align_1_unscaled',
 'clevr_loss_ce_2_unscaled',
 'clevr_loss_bbox_2_unscaled',
 'clevr_loss_giou_2_unscaled',
 'clevr_cardinality_error_2_unscaled',
 'clevr_loss_contrastive_align_2_unscaled',
 'clevr_loss_ce_3_unscaled',
 'clevr_loss_bbox_3_unscaled',
 'clevr_loss_giou_3_unscaled',
 'clevr_cardinality_error_3_unscaled',
 'clevr_loss_contrastive_align_3_unscaled',
 'clevr_loss_ce_4_unscaled',
 'clevr_loss_bbox_4_unscaled',
 'clevr_loss_giou_4_unscaled',
 'clevr_cardinality_error_4_unscaled',
 'clevr_loss_contrastive_align_4_unscaled',
 'clevr_loss_answer_type_unscaled',
 'clevr_accuracy_answer_type_unscaled',
 'clevr_loss_answer_binary_unscaled',
 'clevr_accuracy_answer_binary_unscaled',
 'clevr_loss_answer_reg_unscaled',
 'clevr_accuracy_answer_reg_unscaled',
 'clevr_loss_answer_attr_unscaled',
 'clevr_accuracy_answer_attr_unscaled',
 'clevr_accuracy_answer_total_unscaled',
 'clevr_coco_eval_bbox'
])
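Instead of swapping one hardcoded key for another, the key could be derived from the configured dataset names. This is only a sketch; pick_answer_metric is a hypothetical helper, not part of MDETR:

```python
# Hypothetical workaround: select the total answer-accuracy metric
# based on whichever QA dataset actually ran, instead of assuming GQA.
def pick_answer_metric(test_stats, dataset_names):
    """Return the '*_accuracy_answer_total_unscaled' value for the
    first configured dataset that produced one."""
    for name in dataset_names:  # e.g. ["clevr"] or ["gqa"]
        key = f"{name}_accuracy_answer_total_unscaled"
        if key in test_stats:
            return test_stats[key]
    raise KeyError("no *_accuracy_answer_total_unscaled key in test_stats")

# With a CLEVR run, the stats carry "clevr_"-prefixed keys:
stats = {"clevr_accuracy_answer_total_unscaled": 0.97}
metric = pick_answer_metric(stats, ["clevr"])
```

This way the same code path works for both the GQA and CLEVR configs without editing main.py per dataset.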
