Wrong results in Action Recognition task. #133

Open
zhengrongz opened this issue Jun 1, 2024 · 2 comments

Comments


zhengrongz commented Jun 1, 2024

Hi!
I tried InternVideo2-1B-CLIP for action recognition on the K400 dataset, but without the dataset class you designed.
On the vision side, I sample 8 frames from the video, transform them with test_transform, and feed the processed clip into the vision encoder to get a 1x768 feature.
On the text side, I use the k400_categories.txt and kinetics_prompt you provided; after the text encoder, this gives 400x16x768 features.
Finally, I pass these two features to get_sim and rank the categories, but the results are very bad.
The correct answer is never in the top-5 predictions; the model seems to rank the categories randomly.
I don't know whether I am doing something wrong.
The checkpoints I use are chinese_alpaca_lora_7b, InternVideo2-stage2_1b-224p-f4.pt, internvl_c_13b_224px.pth, and InternVideo2_CLIP_1B.pth.
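For reference, the pipeline described above can be sketched as a generic CLIP-style zero-shot classifier with prompt ensembling. This is only a sketch under stated assumptions: it is not the repository's get_sim, the features are random placeholders rather than real encoder outputs, and `l2_normalize` and `zero_shot_topk` are hypothetical helper names. It shows one common way to reduce the 400x16x768 text features to a single 400x768 class matrix before comparing against the 1x768 video feature.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def zero_shot_topk(video_feat, text_feats, k=5):
    """Rank classes for one video.

    video_feat: (1, D) video embedding.
    text_feats: (C, P, D) text embeddings, P prompts per class.
    Returns (top-k class indices, their similarity scores).
    """
    v = l2_normalize(video_feat)          # (1, D)
    t = l2_normalize(text_feats)          # normalize every prompt embedding
    t = l2_normalize(t.mean(axis=1))      # (C, D): average prompts, renormalize
    sims = (v @ t.T)[0]                   # (C,) cosine similarity per class
    order = np.argsort(-sims)[:k]         # indices of the k highest scores
    return order, sims[order]

# Placeholder data with the shapes from the post (assumption: random features).
rng = np.random.default_rng(0)
num_classes, num_prompts, dim = 400, 16, 768
text_feats = rng.normal(size=(num_classes, num_prompts, dim))

# Build a fake video feature that lies close to one class, plus noise.
true_class = 123
video_feat = text_feats[true_class].mean(axis=0, keepdims=True)
video_feat = video_feat + 0.1 * rng.normal(size=(1, dim))

top5, scores = zero_shot_topk(video_feat, text_feats)
print(top5, scores)
```

If a pipeline like this ranks a synthetic example correctly but real features still rank randomly, the similarity step is probably fine, and the encoders' weights or preprocessing would be the next thing to check.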

zhengrongz changed the title from "Wrong" to "Wrong results in Action Recognition task." Jun 1, 2024
@zhengrongz
Author

I am also not using flash-attn, DeepSpeed, fused_rmsnorm, or fused_mlp, but I don't think that should affect the inference results.

Collaborator

Andy1621 commented Jun 1, 2024

Hi! Can you try to reproduce the results on a small dataset such as UCF101? That way you can check whether you have loaded the weights correctly.
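A lighter check on weight loading, before running a full evaluation, is to compare the parameter names in the checkpoint with those the model expects. In PyTorch, `load_state_dict(..., strict=False)` returns the missing and unexpected keys; the sketch below does the same comparison on plain lists of key names so it runs without the model. All key names here are hypothetical, and stripping a `module.` prefix (as added by DataParallel-style wrappers) is an assumption about how the checkpoint might have been saved.

```python
def strip_prefix(key, prefixes=("module.",)):
    # Remove the first matching wrapper prefix from a parameter name.
    for prefix in prefixes:
        if key.startswith(prefix):
            return key[len(prefix):]
    return key

def compare_keys(model_keys, checkpoint_keys, prefixes=("module.",)):
    """Report which parameter names fail to line up between model and checkpoint."""
    model_set = set(model_keys)
    ckpt_set = {strip_prefix(k, prefixes) for k in checkpoint_keys}
    return {
        "missing": sorted(model_set - ckpt_set),      # expected by model, absent in checkpoint
        "unexpected": sorted(ckpt_set - model_set),   # in checkpoint, unknown to model
        "matched": len(model_set & ckpt_set),
    }

# Hypothetical key names, for illustration only.
model_keys = [
    "vision_encoder.proj.weight",
    "text_encoder.embed.weight",
    "temp",
]
checkpoint_keys = [
    "module.vision_encoder.proj.weight",
    "module.text_enc.embed.weight",
    "module.temp",
]

report = compare_keys(model_keys, checkpoint_keys)
print(report)
```

A non-empty "missing" list means those parameters keep their random initialization after a non-strict load, which would be consistent with near-random rankings.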
