Add verifiability judgment scenario #1518
Conversation
src/helm/benchmark/scenarios/verifiability_judgment_scenario.py
Some of the instances here are particularly long, since webpages can be lengthy. Is HELM smart about automatically truncating things, or is there something else I need to do on the scenario side?
As discussed offline: perhaps we can have a max_num_words argument in the scenario, which filters out instances over the limit.
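The filtering suggested above could be sketched roughly like this. This is an illustrative sketch only: the `Instance` dataclass and the `max_num_words` parameter name are assumptions for demonstration, not HELM's actual API.

```python
# Hypothetical sketch of filtering out instances over a word limit.
# The Instance class and max_num_words name are illustrative, not HELM's API.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    text: str  # statement plus the cited source passage


def filter_by_length(instances: List[Instance], max_num_words: int) -> List[Instance]:
    """Keep only instances whose text is at or under the word limit."""
    return [inst for inst in instances if len(inst.text.split()) <= max_num_words]


instances = [Instance("a short claim"), Instance("word " * 5000)]
kept = filter_by_length(instances, max_num_words=100)
print(len(kept))  # → 1
```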
"complete_support": "fully supports",
"partial_support": "partially supports",
"no_support": "does not support",
What's the rationale of the aliasing, as opposed to using the original word forms?
Oh, I thought it'd be more natural for the LM to generate "fully supports" as opposed to "complete_support".
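The aliasing under discussion amounts to a small mapping from dataset label keys to natural-language phrases, so the LM generates fluent text rather than snake_case tokens. A minimal sketch (the helper function name is hypothetical):

```python
# The label aliases from the diff above: dataset keys → natural phrases.
LABEL_ALIASES = {
    "complete_support": "fully supports",
    "partial_support": "partially supports",
    "no_support": "does not support",
}


def to_natural_label(raw_label: str) -> str:
    """Map a raw dataset label to the phrase the LM is asked to generate."""
    return LABEL_ALIASES[raw_label]


print(to_natural_label("partial_support"))  # → partially supports
```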
@nelson-liu would you have time to get the checks working and merge this? I think you just need to run …
Compare: fcb7acb to 57fcb32
Added a scenario for verifiability judgment: given a generated statement and a cited source, predict whether the source fully supports, partially supports, or does not support the statement.
Running:
helm-run -r verifiability_judgment:model=openai/gpt-4-0314 --max-eval-instances 1 --suite 1
I'll add the metrics once I have them, but figured I'd open this PR for now.
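For readers unfamiliar with the task, the input the model sees could be assembled along these lines. This is a hedged sketch of a plausible prompt format, not the template the scenario actually uses; the `build_prompt` helper and its wording are assumptions.

```python
# Illustrative sketch (not the scenario's actual prompt template) of a
# verifiability-judgment input: a statement, its cited source, and a
# three-way question matching the labels discussed above.
def build_prompt(statement: str, source: str) -> str:
    return (
        f"Statement: {statement}\n"
        f"Source: {source}\n"
        "Does the source fully support, partially support, "
        "or not support the statement?"
    )


prompt = build_prompt(
    "The Eiffel Tower is in Paris.",
    "The Eiffel Tower is a landmark in Paris, France.",
)
print(prompt.startswith("Statement:"))  # → True
```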