Request for finetuned InternVideo2-1B results on video retrieval benchmarks #136

roberto-amoroso · 2024-06-11T19:34:47Z

Hi, great work and thanks for releasing the code. In Table 10 of your InternVideo2 paper, you reported the results of finetuning video retrieval in both T2V and V2T on MSR-VTT, LSMDC, DiDeMo, MSVD, ActivityNet, and VATEX for the 6B model.

Could you please provide the results for the finetuned InternVideo2-1B model as well?

This would be very helpful for literature comparisons with models of similar size.

Thanks a lot

Andy1621 · 2024-06-12T03:08:08Z

Hi! Considering the cost for diverse downstream datasets, we only provide the zero-shot results~

nsreeprem · 2024-06-12T06:05:05Z

@roberto-amoroso were you able to obtain the authors' results for MSRVTT (zero-shot)?

roberto-amoroso · 2024-06-12T09:18:05Z

> Hi! Considering the cost for diverse downstream datasets, we only provide the zero-shot results~

@Andy1621 Thanks for your reply. Yes, I am aware that finetuning the model could be expensive, so I was hoping you have some internal results of your 1B model finetuned on MSRVTT that you could share... thanks anyway

roberto-amoroso · 2024-06-12T09:18:12Z

> @roberto-amoroso were you able to obtain the authors' results for MSRVTT (zero-shot)?

@nsreeprem do you mean whether I was able to reproduce the 0-shot performance presented in Table 9 of the paper?

nsreeprem · 2024-06-12T10:02:25Z

@roberto-amoroso yes, I meant to ask whether you were able to reproduce the zero-shot R@1 results. I am getting an R@1 of close to 47%, roughly 5% lower than what is reported in Table 9.

roberto-amoroso · 2024-06-12T10:25:51Z

@nsreeprem The 0-shot performance I measured on MSRVTT by using the s2-1B model is 51.8 (0.1% lower) for T2V R@1 and 49.3 (1.6% lower) for V2T R@1. These results are obtained by considering the ITM re-ranking stage (i.e., what is called msrvtt_1k_test_match in the metrics log)
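For readers comparing numbers like the ones above, here is a minimal sketch of how R@1 is typically computed for both directions from a text-to-video score matrix. It is not the repository's evaluation code: the function names `recall_at_k` and `t2v_and_v2t_r1` are hypothetical, and it assumes that text `i` is paired with video `i` (one caption per video, as in a 1k-style test split). If ITM re-ranking is used, the re-ranked scores would simply be the input matrix.

```python
from typing import List, Sequence, Tuple


def recall_at_k(sim: Sequence[Sequence[float]], k: int = 1) -> float:
    """Recall@K in percent for a square query-by-candidate score matrix.

    Assumption (not from the thread): query i's ground truth is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the ground truth = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def t2v_and_v2t_r1(sim_t2v: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (T2V R@1, V2T R@1) from a text-by-video score matrix."""
    # V2T uses the same scores with queries and candidates swapped.
    sim_v2t: List[List[float]] = [list(col) for col in zip(*sim_t2v)]
    return recall_at_k(sim_t2v, 1), recall_at_k(sim_v2t, 1)
```

A gap of a few points between two runs can come from which score matrix is fed in (contrastive similarity only versus ITM re-ranked scores), so it is worth checking which log entry is being compared.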

roberto-amoroso · 2024-06-12T11:59:09Z

@Andy1621 did you use the 9k or 7k MSR-VTT train split for finetuning the 6B model (Table 10)?

Andy1621 · 2024-06-12T12:39:49Z

We follow Unmasked Teacher to finetune it on the downstream tasks.

pribadihcr · 2024-06-20T08:00:53Z

> s2-1B model

Where can I get this model? Thanks.

leexinhao · 2024-06-26T04:08:36Z

> > Hi! Considering the cost for diverse downstream datasets, we only provide the zero-shot results~
>
> @Andy1621 Thanks for your reply. Yes, I am aware that finetuning the model could be expensive, so I was hoping you have some internal results of your 1B model finetuned on MSRVTT that you could share... thanks anyway

The finetuned (ft) performance gap between 1B and 6B is close to the zero-shot (zs) performance gap, so you can estimate the 1B finetuned results from that.
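The estimate suggested here can be written as a one-line calculation. This is only a rough sketch of the stated rule of thumb, not a measured result: the function name `estimate_finetuned_score` is hypothetical, and the inputs are whatever values you read from the paper's zero-shot and finetuned tables.

```python
def estimate_finetuned_score(ft_large: float, zs_large: float, zs_small: float) -> float:
    """Estimate the smaller model's finetuned score.

    Assumes the finetuned gap between the two models equals their zero-shot gap:
        ft_small ~= ft_large - (zs_large - zs_small)
    """
    zero_shot_gap = zs_large - zs_small
    return ft_large - zero_shot_gap
```

Apply it per benchmark and per metric (for example, T2V R@1 on MSR-VTT), and treat the output as an approximation rather than a number to cite as a reported result.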

nsreeprem · 2024-06-26T08:44:47Z

@Andy1621 @leexinhao will you be releasing the hyperparameters for finetuning the 1B or 6B model?

leexinhao · 2024-08-16T07:50:21Z

> @Andy1621 @leexinhao would you be releasing the hyperparameters for finetuning the 1B or 6B model?

You can refer to https://github.com/OpenGVLab/unmasked_teacher. We use similar hyperparameters, except that we train with DeepSpeed.

haoyi199815 · 2024-08-20T07:23:09Z

> @nsreeprem The 0-shot performance I measured on MSRVTT by using the s2-1B model is 51.8 (0.1% lower) for T2V R@1 and 49.3 (1.6% lower) for V2T R@1. These results are obtained by considering the ITM re-ranking stage (i.e., what is called msrvtt_1k_test_match in the metrics log)

Did you use the hyperparameters as reported in config.py? I get a big performance gap between my reproduced version and InternVideo2-1B-stage2-f4.

roberto-amoroso changed the title ~~Request for InternVideo2-1B results on video retrieval benchmarks~~ Request for finetuned InternVideo2-1B results on video retrieval benchmarks Jun 11, 2024
