Request for finetuned InternVideo2-1B results on video retrieval benchmarks #136

Open
roberto-amoroso opened this issue Jun 11, 2024 · 13 comments

@roberto-amoroso

Hi, great work and thanks for releasing the code. In Table 10 of your InternVideo2 paper, you report finetuned video retrieval results for both T2V and V2T on MSR-VTT, LSMDC, DiDeMo, MSVD, ActivityNet, and VATEX for the 6B model.

Could you please provide the results for the finetuned InternVideo2-1B model as well?

This would be very helpful for literature comparisons with models of similar size.

Thanks a lot

@roberto-amoroso roberto-amoroso changed the title Request for InternVideo2-1B results on video retrieval benchmarks Request for finetuned InternVideo2-1B results on video retrieval benchmarks Jun 11, 2024
@Andy1621
Collaborator

Hi! Considering the cost for diverse downstream datasets, we only provide the zero-shot results~

@nsreeprem

@roberto-amoroso were you able to reproduce the authors' zero-shot results for MSRVTT?

@roberto-amoroso
Author

> Hi! Considering the cost for diverse downstream datasets, we only provide the zero-shot results~

@Andy1621 Thanks for your reply. Yes, I am aware that finetuning the model could be expensive, so I was hoping you have some internal results of your 1B model finetuned on MSRVTT that you could share... thanks anyway

@roberto-amoroso
Author

> @roberto-amoroso were you able to reproduce the authors' zero-shot results for MSRVTT?

@nsreeprem do you mean whether I was able to reproduce the 0-shot performance presented in Table 9 of the paper?

@nsreeprem

nsreeprem commented Jun 12, 2024

@roberto-amoroso yes, I meant to ask whether you were able to reproduce the zero-shot R@1 results. I am getting an R@1 close to 47%, roughly 5 points lower than what is reported in Table 9.

@roberto-amoroso
Author

@nsreeprem The 0-shot performance I measured on MSRVTT using the s2-1B model is 51.8 (0.1 points lower) for T2V R@1 and 49.3 (1.6 points lower) for V2T R@1. These results include the ITM re-ranking stage (i.e., the entry called msrvtt_1k_test_match in the metrics log).
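For readers comparing numbers, here is a minimal sketch of how R@1 for T2V and V2T can be computed from a text-to-video similarity matrix. It is not the repository's evaluation code: the function names and the toy matrix are hypothetical, and it assumes a one-to-one pairing where text i matches video i. For a re-ranked metric such as msrvtt_1k_test_match, the same computation would presumably be applied to the scores after the ITM stage.

```python
from typing import List, Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int = 1) -> float:
    """Percentage of queries whose ground-truth candidate is ranked in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the ground truth = number of candidates scored strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap query and candidate roles (text-to-video becomes video-to-text)."""
    return [list(col) for col in zip(*sim)]


# Toy 3x3 text-to-video similarities (made-up numbers, not model output).
t2v_sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]

t2v_r1 = recall_at_k(t2v_sim, k=1)             # texts as queries
v2t_r1 = recall_at_k(transpose(t2v_sim), k=1)  # videos as queries
print(f"T2V R@1: {t2v_r1:.1f}  V2T R@1: {v2t_r1:.1f}")
```

Differences of a few points between runs often come from which score matrix is evaluated (before or after re-ranking), so it helps to state which log entry a number comes from.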

@roberto-amoroso
Author

@Andy1621 did you use the 9k or 7k MSR-VTT train split for finetuning the 6B model (Table 10)?

@Andy1621
Collaborator

We follow Unmasked Teacher to finetune it on the downstream tasks.

@pribadihcr

> s2-1B model

Where can I get this model? Thanks.

@leexinhao
Collaborator

> > Hi! Considering the cost for diverse downstream datasets, we only provide the zero-shot results~
>
> @Andy1621 Thanks for your reply. Yes, I am aware that finetuning the model could be expensive, so I was hoping you have some internal results of your 1B model finetuned on MSRVTT that you could share... thanks anyway

The finetuned (ft) performance gap between 1B and 6B is close to the zero-shot (zs) performance gap, so you can estimate the 1B finetuned results from the reported 6B numbers.
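The estimate suggested above can be sketched as simple arithmetic: subtract the zero-shot gap between the two models from the 6B finetuned score. The function name and the input values below are hypothetical placeholders, not numbers from the paper or from this thread, and the result is only a rough estimate under the stated assumption that the two gaps are equal.

```python
def estimate_finetuned_small(ft_large: float, zs_large: float, zs_small: float) -> float:
    """Estimate the smaller model's finetuned score.

    Assumption (from the comment above): the finetuned gap between the two
    models is about the same as their zero-shot gap.
    """
    zs_gap = zs_large - zs_small
    return ft_large - zs_gap


# Placeholder values only (NOT results from the paper or this thread).
example = estimate_finetuned_small(ft_large=60.0, zs_large=55.0, zs_small=52.0)
print(f"estimated finetuned R@1: {example:.1f}")
```

To use it, plug in the 6B finetuned score from Table 10 and the 6B and 1B zero-shot scores from Table 9 for the same dataset and retrieval direction.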

@nsreeprem

@Andy1621 @leexinhao would you be releasing the hyperparameters for finetuning the 1B or 6B model?

@leexinhao
Collaborator

> @Andy1621 @leexinhao would you be releasing the hyperparameters for finetuning the 1B or 6B model?

You can refer to https://github.com/OpenGVLab/unmasked_teacher; we use similar hyperparameters, except that we train with DeepSpeed.

@haoyi199815

> @nsreeprem The 0-shot performance I measured on MSRVTT using the s2-1B model is 51.8 (0.1 points lower) for T2V R@1 and 49.3 (1.6 points lower) for V2T R@1. These results include the ITM re-ranking stage (i.e., the entry called msrvtt_1k_test_match in the metrics log).

Did you use the hyperparameters as reported in config.py? I get a large performance gap between my reproduced version and InternVideo2-1B-stage2-f4.
