Wrong results in Action Recognition task. #133

zhengrongz · 2024-06-01T10:39:52Z

Hi!
I tried InternVideo2-1B-CLIP on the action recognition task on the K400 dataset, using the model directly rather than through the dataset class you designed.
On the vision side, I sample 8 frames from the video, transform them with `test_transform`, and feed the processed clip into the vision encoder to get a 1x768 feature.
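How the 8 frames are picked matters for matching the reference numbers. A minimal sketch, assuming uniform segment-based sampling (one frame from the middle of each of 8 equal segments); this is a common test-time strategy, not necessarily the exact one used by the repo's dataset class, and `sample_frame_indices` is a hypothetical helper:

```python
def sample_frame_indices(num_frames_in_video: int, num_segments: int = 8) -> list[int]:
    """Return one frame index from the middle of each of `num_segments` equal segments.

    Assumption: uniform middle-of-segment sampling; the repo's loader may differ.
    """
    if num_frames_in_video <= 0:
        raise ValueError("video has no frames")
    seg_len = num_frames_in_video / num_segments
    # Clamp so short videos never index past the last frame.
    return [
        min(int(seg_len * i + seg_len / 2), num_frames_in_video - 1)
        for i in range(num_segments)
    ]
```

For an 80-frame video this picks frames 5, 15, ..., 75. The decoded frames then still need to be stacked in the tensor layout the vision encoder expects, which is worth comparing against what the repo's own dataloader produces.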
On the text side, I use the `k400_categories.txt` and `kinetics_prompt` you provided; after the text encoder, the features have shape 400x16x768.
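With 16 prompts per class, the 400x16x768 text features have to be reduced to one embedding per class before they can be compared with a single 1x768 video feature. A minimal sketch of CLIP-style prompt ensembling (normalize each prompt embedding, average over prompts, renormalize); whether `get_sim` in the repo does exactly this is an assumption, and `ensemble_prompts` is a hypothetical helper:

```python
import numpy as np


def ensemble_prompts(text_feats: np.ndarray) -> np.ndarray:
    """(num_classes, num_prompts, dim) -> (num_classes, dim), L2-normalized per class.

    Assumption: plain mean over L2-normalized prompt embeddings.
    """
    # Normalize every prompt embedding so no single prompt dominates the mean.
    feats = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    mean = feats.mean(axis=1)
    # Renormalize so the class embeddings are unit vectors for cosine similarity.
    return mean / np.linalg.norm(mean, axis=-1, keepdims=True)
```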
Finally, I pass these two features to `get_sim` and get a ranking of the categories, but the results are very poor:
the correct answer is never in the top-5, and the model seems to rank the categories randomly.
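A quick way to sanity-check the ranking step outside `get_sim` is to compute cosine similarities directly. A minimal sketch, assuming one 1x768 video feature and a 400x768 class matrix (for example after averaging over prompts); `rank_classes` is a hypothetical helper. If the video and text features are not both L2-normalized, or come from encoders that were not loaded from matching checkpoints, rankings can look essentially random:

```python
import numpy as np


def rank_classes(video_feat: np.ndarray, class_feats: np.ndarray, k: int = 5):
    """Return (top-k class indices, cosine similarities to every class).

    video_feat: shape (1, dim) or (dim,); class_feats: shape (num_classes, dim).
    """
    v = video_feat.reshape(-1)
    v = v / np.linalg.norm(v)
    c = class_feats / np.linalg.norm(class_feats, axis=-1, keepdims=True)
    sims = c @ v  # cosine similarity per class, shape (num_classes,)
    # Sort by descending similarity and keep the k best classes.
    return np.argsort(-sims)[:k], sims
```

A top-5 hit for one clip is then simply `true_label in top5`; averaging that over many clips gives top-5 accuracy to compare against the reported numbers.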
I don't know whether something is wrong in my setup.
The models I use are chinese_alpaca_lora_7b, InternVideo2-stage2_1b-224p-f4.pt, internvl_c_13b_224px.pth, and InternVideo2_CLIP_1B.pth.

zhengrongz · 2024-06-01T10:53:01Z

I also don't use flash-attn, DeepSpeed, fused_rmsnorm, or fused_mlp, but I don't think that should affect the inference results.

Andy1621 · 2024-06-01T16:04:02Z

Hi! Can you try to reproduce the results on a small dataset such as UCF101? That way you can check whether you have loaded the weights correctly.
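One direct check for incorrectly loaded weights: in PyTorch, `model.load_state_dict(state_dict, strict=False)` returns an object with `missing_keys` and `unexpected_keys`, and long lists there usually mean parameters silently kept their random initialization. A framework-free sketch of the same comparison on plain key lists, assuming a `module.` prefix (as added by `DataParallel`/`DistributedDataParallel` wrappers) is the mismatch to strip; the real checkpoints may use different prefixes, and `diff_state_dict_keys` is a hypothetical helper:

```python
def diff_state_dict_keys(model_keys, ckpt_keys, strip_prefix="module."):
    """Return (missing, unexpected) parameter names after stripping `strip_prefix`.

    missing: expected by the model but absent from the checkpoint.
    unexpected: present in the checkpoint but unused by the model.
    """
    ckpt = {k[len(strip_prefix):] if k.startswith(strip_prefix) else k for k in ckpt_keys}
    model = set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)
```

Both lists should be empty (or contain only keys you expect to skip) for every checkpoint loaded into the model.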

zhengrongz changed the title ~~Wrong~~ Wrong results in Action Recognition task. Jun 1, 2024