
Evaluation of Finetuned Model on SthV2 dataset Got Extremely Low Performance #146

Open · caidonkey opened this issue Jul 16, 2024 · 1 comment

@caidonkey

Thank you for your great work!

I downloaded the finetuned model provided in your model zoo: https://huggingface.co/OpenGVLab/InternVideo2-Stage1-1B-224p-f8-SthSth/blob/main/1B_ft_ssv2_f8.pth (reported at 77.1% top-1 accuracy on SthV2) and prepared the SthV2 dataset according to your instructions (though the instructions are a bit vague).
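
For anyone hitting the same issue, one minimal sanity check on the frame layout (the paths below are hypothetical placeholders, and one-directory-per-video with frames starting at index 1 is an assumption about the extraction layout) is to confirm the extracted filenames actually match the `--filename_tmpl` pattern, since a mismatch can make the loader silently read the wrong frames:

```python
import os

# Hypothetical paths -- replace with the local SthV2 rawframes root.
data_root = "/path/to/ssv2/rawframes"
filename_tmpl = "img_{:05}.jpg"  # same value passed via --filename_tmpl

# Check one video directory: its frames should match the template.
# Starting at index 1 (img_00001.jpg, ...) is an assumption here;
# some extractors start at 0.
video_dir = os.path.join(data_root, sorted(os.listdir(data_root))[0])
frames = sorted(os.listdir(video_dir))
print("first frames on disk:", frames[:3])
print("template expects    :", [filename_tmpl.format(i) for i in (1, 2, 3)])
```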

I then evaluated the model using most of the parameters from the provided script https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/single_modality/scripts/finetuning/full_tuning/ssv2/1B_ft_ssv2_f8.sh, as below:

```bash
python run_finetuning.py \
    --model internvideo2_1B_patch14_224 \
    --data_path [our data path] \
    --prefix [our data path] \
    --data_set SSV2 \
    --filename_tmpl img_{:05}.jpg \
    --no_use_decord \
    --nb_classes 174 \
    --finetune [our path]/OpenGVLab--InternVideo2-Stage1-1B-224p-f8-SthSth/1B_ft_ssv2_f8.pth \
    --log_dir [our path]/logs/1B_ft_ssv2_f8 \
    --output_dir [our path]/1B_ft_ssv2_f8 \
    --batch_size 8 \
    --num_sample 2 \
    --input_size 224 \
    --short_side_size 224 \
    --save_ckpt_freq 100 \
    --num_frames 8 \
    --num_workers 12 \
    --warmup_epochs 3 \
    --tubelet_size 1 \
    --epochs 8 \
    --lr 1e-4 \
    --drop_path 0.3 \
    --layer_decay 0.915 \
    --use_checkpoint \
    --checkpoint_num 6 \
    --layer_scale_init_value 1e-5 \
    --opt adamw \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --test_num_segment 2 \
    --test_num_crop 3 \
    --dist_eval \
    --enable_deepspeed \
    --bf16 \
    --zero_stage 1 \
    --test_best \
    --eval
```

With either raw images or videos as input, we got extremely low evaluation results (0.59% top-1 and 2.80% top-5 accuracy using raw images as input).
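
One quick check that can rule out a label-indexing problem is to confirm the annotation labels span 0..173, matching `--nb_classes 174`. The space-separated `<path> <num_frames> <label>` format assumed below is a guess at the rawframe annotation layout; adjust the parsing to the actual files:

```python
# Hypothetical annotation path; the "<path> <num_frames> <label>"
# space-separated format is an assumption about the anno files.
labels = []
with open("/path/to/ssv2/val.csv") as f:
    for line in f:
        labels.append(int(line.strip().rsplit(" ", 1)[-1]))

# Expect min 0, max 173, and 174 distinct classes; a label mapping
# that differs from the one used at training time would produce
# near-chance accuracy like the numbers above.
print(min(labels), max(labels), len(set(labels)))
```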

Could you kindly help check what the reason might be? Is it a problem with dataset preparation or with the parameter configuration?

Thank you very much for your time.

@caidonkey (Author)

The top-1 accuracy of 0.59% is essentially a random guess: with 174 classes, chance level is 1/174 ≈ 0.57%.

I also evaluated the SSV1 model (https://huggingface.co/OpenGVLab/InternVideo2-Stage1-1B-224p-f8-SthSth/blob/main/1B_ft_ssv1_f8.pth) on the SSV1 dataset; the top-1 and top-5 accuracies are 0.50% and 2.42%, also essentially chance level.

I have also checked the loaded model weights against the weights used in the evaluation forward path (fc_norm & head), and they are identical.
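
For reference, here is a minimal sketch of inspecting the checkpoint side of that comparison; the `model`/`module` nesting below is an assumption about the checkpoint layout, so print `ckpt.keys()` to confirm it:

```python
import torch

# Load the checkpoint on CPU and unwrap a possible "model"/"module"
# nesting (an assumption about how the .pth file is structured).
ckpt = torch.load("1B_ft_ssv2_f8.pth", map_location="cpu")
state = ckpt.get("model", ckpt.get("module", ckpt)) if isinstance(ckpt, dict) else ckpt

# List the classifier-related tensors: head.weight should have shape
# (174, hidden_dim) for SSV2, and these shapes/values can be compared
# against what the model reports after loading the checkpoint.
for k, v in state.items():
    if "fc_norm" in k or "head" in k:
        print(k, tuple(v.shape))
```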
