Set up the ability to run eval suites #114
Conversation
This is a barely functional wrapper for running "test suites", which are just lists of preconfigured tasks. You can specify the prompt and model. This needs more testing and UI cleanup.
This moves suite config handling code into the library proper instead of the script, and creates a subdir for suite configs.
This is still pretty bare-bones, but it's functional and should be good for automating evals across different models. Basically we can run the same eval we've been running, but with a simpler invocation, and without worrying about copying versions or fewshot parameters the wrong way.
lm_eval/prompts.py
Outdated
PROMPT_CODES = {
    "user": "0.0",
    "jgpt": "0.1",
    "fintan": "0.2",
    "fintan2": "0.2.1",
    "ja-alpaca": "0.3",
    "rinna-sft": "0.4",
    "rinna-bilingual": "0.5",
    "llama2": "0.6",
}
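For context, a minimal sketch of how a mapping like this might be consumed; `resolve_prompt_code` is a hypothetical helper for illustration, not part of this diff:

```python
# Hypothetical helper; assumes the PROMPT_CODES dict from the diff above.
def resolve_prompt_code(name: str) -> str:
    """Map a human-readable prompt name to its numeric version code."""
    try:
        return PROMPT_CODES[name]
    except KeyError:
        known = ", ".join(sorted(PROMPT_CODES))
        raise ValueError(f"unknown prompt name {name!r}; expected one of: {known}")

print(resolve_prompt_code("jgpt"))  # -> 0.1
```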
Let me know if these names make sense or could be improved.
@polm-stability thank you for this PR!
- Can you add instructions for using suites somewhere? Maybe you could just change the example script in the README to one using suites.
- Can you fix docs/prompt_templates.md?
Good point, docs should be updated now.
Awesome! LGTM!
Generally LGTM 👍, but let me double check one point just in case 🙏
This introduces a style for handling complex prompts and specifically covers the case of JSLM Beta. It works by using a function that takes the name of the task as input, which allows full customization without requiring detailed specification when actually running an eval suite. The style is simple: instead of mapping to a numeric version like 0.2, a prompt shortname can map to a callable that takes the task name, allowing any kind of custom logic. This may not be the simplest or best approach, but it required few changes, keeps everything in one place, and touches nothing else in the codebase, so it should be easy to change later if necessary.
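A minimal sketch of the callable style described above; `jslm_beta_prompt`, its version choices, and the `prompt_code_for` resolver are illustrative assumptions, not the exact code from this PR:

```python
def jslm_beta_prompt(task_name: str) -> str:
    """Pick a prompt version based on the task (hypothetical logic)."""
    if task_name.startswith("jsquad"):
        return "0.2.1"
    return "0.2"

PROMPT_CODES = {
    "fintan": "0.2",                # plain numeric version
    "jslm-beta": jslm_beta_prompt,  # callable taking the task name
}

def prompt_code_for(prompt_name: str, task_name: str) -> str:
    """Resolve a shortname to a version, calling through when it's a callable."""
    code = PROMPT_CODES[prompt_name]
    return code(task_name) if callable(code) else code

print(prompt_code_for("jslm-beta", "jsquad-1.1"))  # -> 0.2.1
```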
LGTM, thanks! 👍
This PR includes changes that allow running eval suites with a single command. An example command looks like this:
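(A hypothetical sketch; the script name, model, and flags below are assumptions, not the exact CLI from this PR.)

```bash
# Assumed runner script and flags, for illustration only.
python scripts/run_suite.py stabilityai/japanese-stablelm-base-alpha-7b my_suite \
    --prompt jgpt --output results.json
```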
The suite is specified as a list of tasks, with versions and fewshot specs, in a config file. Because the spec is in a file, it can be versioned and shared across models, while each model can vary the prompt it uses (as well as args related to loading the model). Prompts are specified using names rather than numbers to make it clear what they refer to and avoid mistakes.
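As a sketch, a suite config along these lines might contain something like the following; the field names and tasks are illustrative, and the actual on-disk format may differ:

```python
# Illustrative suite config, shown as an equivalent Python structure;
# the real config file format and field names may differ.
SUITE = {
    "tasks": [
        {"name": "jsquad", "version": "1.1", "fewshot": 2},
        {"name": "jcommonsenseqa", "version": "1.1", "fewshot": 3},
        {"name": "jnli", "version": "1.1", "fewshot": 3},
    ],
}
```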