OpinionsQA #1424
Conversation
```diff
@@ -68,6 +68,9 @@ class AdapterSpec:
     # set of training instances. Used to compute error bars.
     num_train_trials: int = 1

+    # Sample train examples or use deterministic
```
Make the comment clearer: "Sample train examples (as opposed to taking the first few in order)".
But just curious: why is the sample_train = False mode needed? In general, it'd be nice not to add more things to AdapterSpec unless they are necessary.
Here, we are providing models with bios in context for steerability. I disabled sampling so that I can ensure we run through all the bios that serve as in-context examples, rather than just a randomly chosen subset.
Disabling sampling is needed so that we can go through all the training examples (which are our steering groups) rather than a random subset of them.
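To make the intent of the flag concrete, here is a minimal sketch (not the actual HELM implementation; the function name and its surroundings are assumptions) of how a sample_train switch could toggle between random sampling and deterministic, in-order selection of in-context examples:

```python
import random
from typing import List


def select_train_examples(
    train_instances: List[str], num_examples: int, sample_train: bool = True, seed: int = 0
) -> List[str]:
    """Pick in-context training examples (hypothetical helper).

    sample_train=True: draw a random subset (the usual few-shot setup).
    sample_train=False: take the first num_examples in order, so every
    example (e.g. every steering bio) is covered deterministically.
    """
    if sample_train:
        rng = random.Random(seed)
        return rng.sample(train_instances, min(num_examples, len(train_instances)))
    return train_instances[:num_examples]


# Example: with sampling disabled, the trial walks the steering groups in order.
bios = ["bio_a", "bio_b", "bio_c"]
assert select_train_examples(bios, 2, sample_train=False) == ["bio_a", "bio_b"]
```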
Resolved review threads:
- src/helm/benchmark/presentation/run_specs_lm_opinions_ai21_steer-bio.conf (outdated)
- src/helm/benchmark/presentation/run_specs_opinions_qa_openai_default.conf (outdated)
- src/helm/benchmark/adaptation/adapters/in_context_learning_adapter.py (outdated)
- src/helm/benchmark/adaptation/adapters/in_context_learning_adapter.py (outdated)
- src/helm/benchmark/presentation/run_specs_opinions_qa_openai_default.conf
```
    survey_type: str,
    num_logprobs: str,
    context: str = "None",
    num_train_trials: str = "1",
```
I think you can get rid of this, because we have a RunExpander that allows you to set num_train_trials.
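For illustration, a hedged sketch of the run-expander pattern the reviewer is referring to (the class, field, and naming-scheme details below are assumptions, not HELM's actual API): an expander clones a base run spec once per requested num_train_trials value, so the run-spec constructor itself doesn't need the parameter.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class AdapterSpec:
    # Number of random train-set draws; used to compute error bars.
    num_train_trials: int = 1


@dataclass(frozen=True)
class RunSpec:
    name: str
    adapter_spec: AdapterSpec


class NumTrainTrialsRunExpander:
    """Expand one RunSpec into several, one per num_train_trials value."""

    def __init__(self, values: List[int]):
        self.values = values

    def expand(self, run_spec: RunSpec) -> List[RunSpec]:
        return [
            replace(
                run_spec,
                name=f"{run_spec.name},num_train_trials={v}",
                adapter_spec=replace(run_spec.adapter_spec, num_train_trials=v),
            )
            for v in self.values
        ]


# Example: one base spec expands into three trial-count variants.
base = RunSpec(name="opinions_qa", adapter_spec=AdapterSpec())
expanded = NumTrainTrialsRunExpander([1, 3, 5]).expand(base)
assert [rs.adapter_spec.num_train_trials for rs in expanded] == [1, 3, 5]
```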
Thanks for the changes. I think this is almost ready; just a few more localized cleanups and documentation, so hopefully it won't be much additional work.
Great, thanks! (Left one more comment about the arguments of the run spec.)
Merge the OpinionsQA dataset into HELM.