Add MMMU scenario and support multimodal multiple choice adaptation #2259
Conversation
options: List[str] = row["options"]
answer: str = row["answer"]

# Create the question. Questions can have text and images interleaved
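For context on what "interleaved" means here, below is a minimal sketch of how a question could be assembled from an MMMU row, assuming the question text contains `<image N>` placeholders and the images live in separate `image_1`..`image_7` columns. The field names and placeholder convention are assumptions for illustration, not the exact code in this PR.

```python
import re
from typing import Any, Dict, List

# Placeholder token assumed to mark where an image appears in the question text.
IMAGE_TOKEN = re.compile(r"<image (\d)>")

def build_interleaved_content(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split the question text on image placeholders so text and images stay in order."""
    content: List[Dict[str, Any]] = []
    question: str = row["question"]
    last_end = 0
    for match in IMAGE_TOKEN.finditer(question):
        text = question[last_end : match.start()]
        if text.strip():
            content.append({"type": "text", "text": text})
        # Images are assumed to be stored in separate columns: image_1 ... image_7
        content.append({"type": "image", "image": row[f"image_{match.group(1)}"]})
        last_end = match.end()
    trailing = question[last_end:]
    if trailing.strip():
        content.append({"type": "text", "text": trailing})
    return content
```

Splitting on the placeholder preserves the original ordering, so an image can appear before, between, or after pieces of text, and the same approach can be applied to each answer choice.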
Oh, I am pretty sure in the PR for Llava and OpenFlamingo we assumed the image was at the top
I don't think we can evaluate Llava and OpenFlamingo with MMMU. There can be multiple images (up to 7) in the question and even in the answer choices.
@JosselinSomervilleRoberts I answered your concerns. Let me know if something is unclear!
Thanks for the changes. Just make sure to make the change when loading the Hugging Face dataset; otherwise, looks good!
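For reference, the Hugging Face loading being discussed would look roughly like the sketch below; the `MMMU/MMMU` dataset path and the per-subject config name are assumptions about how the data is published, not necessarily what this PR ends up using.

```python
from datasets import load_dataset

# Hedged sketch: "MMMU/MMMU" and the subject config (e.g. "Art") are assumptions;
# the scenario would load whichever subjects and split it actually evaluates.
dataset = load_dataset("MMMU/MMMU", "Art", split="validation")

for row in dataset:
    options = row["options"]  # answer choices (may themselves reference images)
    answer = row["answer"]    # gold label, e.g. "A"
```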
Resolves #2114, #2068