
Evaluation of Finetuned Model on SthV2 dataset Got Extremely Low Performance #146

Open · caidonkey opened this issue Jul 16, 2024 · 1 comment

@caidonkey

Thank you for your great work!

I downloaded the finetuned model provided in your model zoo: https://huggingface.co/OpenGVLab/InternVideo2-Stage1-1B-224p-f8-SthSth/blob/main/1B_ft_ssv2_f8.pth (reported at 77.1% top-1 accuracy on SthV2) and prepared the SthV2 dataset according to your instructions (though the instructions are a bit vague).
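
For anyone hitting the same issue, one minimal sanity check on the frame layout (the paths below are hypothetical placeholders, and one-directory-per-video with frames starting at index 1 is an assumption about the extraction layout) is to confirm the extracted filenames actually match the `--filename_tmpl` pattern, since a mismatch can make the loader silently read the wrong frames:

```python
import os

# Hypothetical paths -- replace with the local SthV2 rawframes root.
data_root = "/path/to/ssv2/rawframes"
filename_tmpl = "img_{:05}.jpg"  # same value passed via --filename_tmpl

# Check one video directory: its frames should match the template.
# Starting at index 1 (img_00001.jpg, ...) is an assumption here;
# some extractors start at 0.
video_dir = os.path.join(data_root, sorted(os.listdir(data_root))[0])
frames = sorted(os.listdir(video_dir))
print("first frames on disk:", frames[:3])
print("template expects    :", [filename_tmpl.format(i) for i in (1, 2, 3)])
```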

I then evaluated the model using most of the parameters from the provided script https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/single_modality/scripts/finetuning/full_tuning/ssv2/1B_ft_ssv2_f8.sh, as below:

```bash
python run_finetuning.py \
    --model internvideo2_1B_patch14_224 \
    --data_path [our data path] \
    --prefix [our data path] \
    --data_set SSV2 \
    --filename_tmpl img_{:05}.jpg \
    --no_use_decord \
    --nb_classes 174 \
    --finetune [our path]/OpenGVLab--InternVideo2-Stage1-1B-224p-f8-SthSth/1B_ft_ssv2_f8.pth \
    --log_dir [our path]/logs/1B_ft_ssv2_f8 \
    --output_dir [our path]/1B_ft_ssv2_f8 \
    --batch_size 8 \
    --num_sample 2 \
    --input_size 224 \
    --short_side_size 224 \
    --save_ckpt_freq 100 \
    --num_frames 8 \
    --num_workers 12 \
    --warmup_epochs 3 \
    --tubelet_size 1 \
    --epochs 8 \
    --lr 1e-4 \
    --drop_path 0.3 \
    --layer_decay 0.915 \
    --use_checkpoint \
    --checkpoint_num 6 \
    --layer_scale_init_value 1e-5 \
    --opt adamw \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --test_num_segment 2 \
    --test_num_crop 3 \
    --dist_eval \
    --enable_deepspeed \
    --bf16 \
    --zero_stage 1 \
    --test_best \
    --eval
```

With either raw images or videos as input, we got extremely low evaluation results (0.59% top-1 and 2.80% top-5 accuracy using raw images as input).
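
One quick check that can rule out a label-indexing problem is to confirm the annotation labels span 0..173, matching `--nb_classes 174`. The space-separated `<path> <num_frames> <label>` format assumed below is a guess at the rawframe annotation layout; adjust the parsing to the actual files:

```python
# Hypothetical annotation path; the "<path> <num_frames> <label>"
# space-separated format is an assumption about the anno files.
labels = []
with open("/path/to/ssv2/val.csv") as f:
    for line in f:
        labels.append(int(line.strip().rsplit(" ", 1)[-1]))

# Expect min 0, max 173, and 174 distinct classes; a label mapping
# that differs from the one used at training time would produce
# near-chance accuracy like the numbers above.
print(min(labels), max(labels), len(set(labels)))
```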

Could you kindly help check what the reason might be? Is it a problem with dataset preparation or with the parameter configuration?

Thank you very much for your time.

@caidonkey (Author)

The top-1 accuracy of 0.59% is essentially a random guess: with 174 classes, chance level is 1/174 ≈ 0.57%.

I also evaluated the SSV1 model (https://huggingface.co/OpenGVLab/InternVideo2-Stage1-1B-224p-f8-SthSth/blob/main/1B_ft_ssv1_f8.pth) on the SSV1 dataset; the top-1 and top-5 accuracies are 0.50% and 2.42%, also essentially chance level.

I have also checked the loaded model weights against the weights used in the evaluation forward path (fc_norm & head), and they are identical.
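
For reference, here is a minimal sketch of inspecting the checkpoint side of that comparison; the `model`/`module` nesting below is an assumption about the checkpoint layout, so print `ckpt.keys()` to confirm it:

```python
import torch

# Load the checkpoint on CPU and unwrap a possible "model"/"module"
# nesting (an assumption about how the .pth file is structured).
ckpt = torch.load("1B_ft_ssv2_f8.pth", map_location="cpu")
state = ckpt.get("model", ckpt.get("module", ckpt)) if isinstance(ckpt, dict) else ckpt

# List the classifier-related tensors: head.weight should have shape
# (174, hidden_dim) for SSV2, and these shapes/values can be compared
# against what the model reports after loading the checkpoint.
for k, v in state.items():
    if "fc_norm" in k or "head" in k:
        print(k, tuple(v.shape))
```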
