Add verifiability judgment scenario #1518

Merged: 8 commits into stanford-crfm:main on May 13, 2023

Conversation

nelson-liu (Contributor):

Added a scenario for verifiability judgment: given a generated statement and a cited source, predict whether the source fully supports, partially supports, or does not support the statement.

Running: helm-run -r verifiability_judgment:model=openai/gpt-4-0314 --max-eval-instances 1 --suite 1

I'll add the metrics once I have them, but figured I'd open this PR for now.
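For context, a rough sketch of the shape such a scenario might take, assuming HELM's Scenario/Instance/Reference interface at the time of this PR; the loader helper, record field names, and prompt wording below are placeholders rather than the PR's actual code:

```python
# Rough sketch only: assumes HELM's Scenario/Instance/Reference API as of this
# PR; _load_examples(), the record field names, and the prompt wording are
# placeholders, not the code in the diff.
from typing import Dict, List

from helm.benchmark.scenarios.scenario import (
    CORRECT_TAG,
    TEST_SPLIT,
    Input,
    Instance,
    Output,
    Reference,
    Scenario,
)

# Dataset labels aliased to natural-language phrases (matches the diff below).
LABEL_ALIASES: Dict[str, str] = {
    "complete_support": "fully supports",
    "partial_support": "partially supports",
    "no_support": "does not support",
}


class VerifiabilityJudgmentScenario(Scenario):
    """Judge whether a cited source fully, partially, or does not support a statement."""

    name = "verifiability_judgment"
    description = "Verifiability judgment of statements against cited sources."
    tags = ["verifiability"]

    def get_instances(self) -> List[Instance]:
        instances: List[Instance] = []
        # _load_examples() is a hypothetical helper yielding dicts with
        # "statement", "source_text", and "label" keys.
        for example in self._load_examples():
            prompt = (
                f"Statement: {example['statement']}\n"
                f"Source: {example['source_text']}\n"
                "Does the source fully support, partially support, "
                "or not support the statement?"
            )
            gold = LABEL_ALIASES[example["label"]]
            instances.append(
                Instance(
                    input=Input(text=prompt),
                    references=[Reference(output=Output(text=gold), tags=[CORRECT_TAG])],
                    split=TEST_SPLIT,
                )
            )
        return instances
```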

nelson-liu (Contributor, Author):

Some of the instances here are particularly long, since webpages can be lengthy. Is HELM smart about automatically truncating things, or is there something else I need to do on the scenario side?

yifanmai (Collaborator) left a comment:

As discussed offline: perhaps we can have a max_num_words argument in the scenario, which filters out instances over the limit.
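A minimal sketch of what that filter could look like, assuming the scenario keeps its raw examples as dicts with statement and source_text fields (both names are placeholders, and the real scenario may count words differently):

```python
# Sketch of the suggested max_num_words filter; field names are placeholders.
from typing import Dict, List


def filter_by_max_num_words(examples: List[Dict[str, str]], max_num_words: int) -> List[Dict[str, str]]:
    """Keep only examples whose statement plus source stays within the word budget."""
    kept = []
    for example in examples:
        num_words = len(example["statement"].split()) + len(example["source_text"].split())
        if num_words <= max_num_words:
            kept.append(example)
    return kept
```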

Comment on lines +76 to +79
"complete_support": "fully supports",
"partial_support": "partially supports",
"no_support": "does not support",
yifanmai (Collaborator):

What's the rationale for the aliasing, as opposed to using the original word forms?

nelson-liu (Contributor, Author):

Oh, I thought it'd be more natural for the LM to generate "fully supports" as opposed to "complete_support".
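A tiny self-contained illustration of that rationale (variable names are ours): storing the natural phrase as the gold reference lets a generated completion like "fully supports" match it directly.

```python
# Hypothetical illustration: the gold reference text is the natural phrase,
# not the raw dataset label, so the model's generation can match it directly.
label_to_phrase = {
    "complete_support": "fully supports",
    "partial_support": "partially supports",
    "no_support": "does not support",
}
gold_text = label_to_phrase["complete_support"]  # -> "fully supports"
```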

yifanmai (Collaborator):

@nelson-liu would you have time to get the checks working and merge this? I think you just need to run mypy src. Let me know if you'd like me to take over instead.

nelson-liu force-pushed the judge_verifiability branch from fcb7acb to 57fcb32 on May 13, 2023, 01:55.
yifanmai merged commit 9652316 into stanford-crfm:main on May 13, 2023.