
evaluate_activitynet_qa #115

Open
rixejzvdl649 opened this issue Jul 3, 2024 · 5 comments

Comments

@rixejzvdl649

evaluate_activitynet_qa

v_iKclcQEl4zI_10

{'q': 'what is the safety factor of the flip', 'a': 'secondary', 'pred': 'The safety factor of the flip-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-
forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-backward-forward-back'}
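The prediction above is a degenerate repetition loop. If it helps, here is a minimal sketch for flagging such outputs before sending them to the GPT judge, assuming predictions are plain strings like the `pred` field above. The function name `has_degenerate_repetition` and its thresholds (`n=2`, `max_count=10`) are hypothetical and not part of the Video-ChatGPT repo.

```python
import re
from collections import Counter


def has_degenerate_repetition(text: str, n: int = 2, max_count: int = 10) -> bool:
    """Return True if any n-token window occurs more than max_count times.

    Hypothetical helper; the default thresholds are arbitrary and should be tuned.
    """
    # Split on non-word characters so hyphen-joined runs like
    # "forward-backward-forward" become separate tokens.
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < n:
        return False
    # Count every contiguous n-token window.
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return max(counts.values()) > max_count


sample = {"q": "what is the safety factor of the flip", "a": "secondary",
          "pred": "The safety factor of the flip-" + "backward-forward-" * 50}
print(has_degenerate_repetition(sample["pred"]))                  # True
print(has_degenerate_repetition("The person performs a backflip."))  # False
```

Counting flagged samples separately makes it easier to tell whether a low score comes from wrong answers or from broken generations.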
@mmaaz60
Member

mmaaz60 commented Jul 8, 2024

Hi @rixejzvdl649,

I appreciate your interest in our work. Please provide some more information on how we can help. Is this the output generated by Video-ChatGPT? Thank you.

@YoungjaeDev

@mmaaz60
The activity_qa accuracy I get is only 15%, well below the 35% reported in the paper. Is something wrong with the code?
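When comparing against the paper, it can help to check how the final number is aggregated. Below is a minimal sketch, assuming each sample's GPT judgment has been parsed into a dict like `{"pred": "yes", "score": 4}` (or `None` when the judge call or parsing failed). This format and the helper `summarize_judgments` are assumptions for illustration, not the repo's actual code.

```python
from typing import Dict, List, Optional, Tuple


def summarize_judgments(judgments: List[Optional[Dict]]) -> Tuple[float, float, int]:
    """Return (accuracy, average_score, n_failed).

    Assumed input: {"pred": "yes"|"no", "score": int} per sample, or None on failure.
    """
    yes = no = failed = 0
    score_sum = 0.0
    for j in judgments:
        if not j or "pred" not in j:
            failed += 1  # judge call or parsing failed
            continue
        pred = str(j["pred"]).strip().lower()
        if pred == "yes":
            yes += 1
        elif pred == "no":
            no += 1
        else:
            failed += 1  # unexpected verdict string
            continue
        score_sum += float(j.get("score", 0))
    judged = yes + no
    accuracy = yes / judged if judged else 0.0
    avg_score = score_sum / judged if judged else 0.0
    return accuracy, avg_score, failed


results = [{"pred": "yes", "score": 4}, {"pred": "no", "score": 1}, None, {"pred": "yes", "score": 5}]
acc, avg, failed = summarize_judgments(results)
print(f"accuracy={acc:.1%} avg_score={avg:.2f} failed={failed}")
# accuracy=66.7% avg_score=3.33 failed=1
```

Reporting the failed count next to the accuracy shows whether many samples dropped out or were mis-parsed, which is one thing worth ruling out before comparing to the paper's number.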

@hb-jw

hb-jw commented Jul 26, 2024

Hello, I've also been reproducing these benchmarks recently. Since most of them rely on GPT-assisted evaluation, they seem quite costly. Roughly how much does each of your evaluation runs cost?

@YoungjaeDev

@hb-jw
If you're using GPT-4o mini, I don't think it will cost much. Under $10, maybe?

@hb-jw

hb-jw commented Jul 28, 2024

> @hb-jw If you're considering the gpt4o-mini, I don't think it's going to cost much. under $10?

Thank you for your reply! I am using the GPT-3.5 Turbo API. In the zero-shot QA setup I have tested only 200 of the roughly 1500 MSVD-QA samples, and that has already cost me $0.90. At that rate, evaluating the full MSVD-QA set would cost about $6.75. However, the full suite (zero-shot QA plus the Video-ChatGPT benchmark) requires 9 similar runs. What are your thoughts on this? Could there be an error in my calculations?
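The estimate above is a linear extrapolation. A minimal sketch of that arithmetic, assuming every run has roughly the same number of samples and the same prompt length as the 200-question MSVD-QA trial (real costs depend on token counts and per-model pricing, and other benchmarks may differ in size). The helper `extrapolate_cost` is hypothetical.

```python
def extrapolate_cost(spent: float, done: int, total: int, runs: int = 1) -> float:
    """Linearly scale observed spend on `done` samples to `total` samples, `runs` times.

    Assumes cost per sample is constant across samples and runs.
    """
    if done <= 0:
        raise ValueError("done must be positive")
    return spent / done * total * runs


per_dataset = extrapolate_cost(0.90, 200, 1500)       # one full MSVD-QA run
all_runs = extrapolate_cost(0.90, 200, 1500, runs=9)  # 9 runs of similar size
print(f"MSVD-QA: ${per_dataset:.2f}, 9 runs: ${all_runs:.2f}")
# MSVD-QA: $6.75, 9 runs: $60.75
```

Under these assumptions the full GPT-3.5 Turbo suite lands around $60; plugging in the observed spend for a cheaper judge model would give the corresponding estimate.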


4 participants