VHELM improvements #2825

teetone · 2024-07-22T07:08:44Z

Changes

New scenarios: RealWorldQA (https://x.ai/blog/grok-1.5v), EXAMS-V (https://arxiv.org/abs/2403.10378), FairFace (https://arxiv.org/abs/1908.04913)
Handle content blocked error for Palmyra Vision
Include authors in scenario descriptions for the results website
Add VISION_LANGUAGE_MODEL_TAG tag to openai/gpt-4-turbo-2024-04-09

src/helm/benchmark/run_specs/vlm_run_specs.py

src/helm/benchmark/scenarios/vision_language/exams_v_scenario.py

src/helm/benchmark/scenarios/vision_language/fair_face_scenario.py

yifanmai · 2024-07-29T20:23:28Z

src/helm/benchmark/scenarios/vision_language/exams_v_scenario.py

+
+                # Save the image to disk
+                image = row["image"]
+                image_file_name: str = generate_hash(image) + ".jpg"


Why not just name this f"{split}_{row_index}.jpg"? Then we won't need hashes.

I want a unique hash per image because I was worried the underlying Hugging Face dataset could get reshuffled.

You can deal with this by passing the Git commit hash, e.g. revision="e9488045cbad16c973f031c7a8f7466b5dcc3794", to load_dataset(); then you don't have to worry about mutations. We should probably do this for all other usages of load_dataset() as well.
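A minimal sketch of the suggestion in this thread, not the code that landed in the PR: pin the dataset to a fixed revision and name each image by split and row index. The helper names (`image_file_name`, `name_rows`, `load_pinned`) and the `dataset_name` argument are hypothetical; the revision string is the example hash quoted in the comment above and may not correspond to the dataset this scenario actually loads.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Example hash quoted in the review comment; assumed here only for illustration.
PINNED_REVISION = "e9488045cbad16c973f031c7a8f7466b5dcc3794"


def image_file_name(split: str, row_index: int) -> str:
    """Deterministic file name; stable as long as the row order is fixed."""
    return f"{split}_{row_index}.jpg"


def name_rows(split: str, rows: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Pair each row with its index-based image file name."""
    for row_index, row in enumerate(rows):
        yield image_file_name(split, row_index), row


def load_pinned(dataset_name: str, split: str, revision: str = PINNED_REVISION):
    """Load a Hugging Face dataset at a fixed revision (hypothetical wrapper).

    Pinning `revision` means later changes to the dataset repository do not
    alter which rows are returned or their order.
    """
    # Imported lazily so the naming helpers work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_name, split=split, revision=revision)
```

With the revision pinned, row order is reproducible across runs, so index-based names identify the same image every time and a content hash is no longer needed for that purpose.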

yifanmai

OK to land, remaining comments are optional.

teetone added 5 commits June 25, 2024 19:40

update image2struct schema

95278b6

update website

420ef6c

Merge branch 'main' of https://github.com/stanford-crfm/helm into image2structfix

dc1a0f6

New VHELM scenarios, handle blocked content error for Palmyra Vision, update scenario descriptions

336d477

resolve merge conflict

8d28984

teetone added the VHELM Holistic Evaluation of Vision-Language Models (VLM) label Jul 22, 2024

teetone requested review from yifanmai, percyliang and chiheem July 22, 2024 07:08

make gpt-4 turbo vlm

ab3ad57

yifanmai requested changes Jul 29, 2024

teetone requested a review from yifanmai July 31, 2024 04:36

yifanmai approved these changes Jul 31, 2024

teetone merged commit d3397d0 into main Aug 1, 2024
9 checks passed

teetone deleted the image2structfix branch August 1, 2024 09:48

This was referenced Aug 5, 2024

GQA Scenario #1951

Closed

A-OKVQA Scenario #1952

Closed
