
Standardize folder format for benchmarks (Move benchmark-specific READMEs, util scripts, metrics, etc. to appropriate folders) #976

Status: Closed · Tracked by #973
afourney opened this issue Dec 13, 2023 · 3 comments
Labels: autogenbench (Issues related to AutoGenBench)

afourney (Member) commented Dec 13, 2023

As we onboard more benchmarks and evaluation datasets, I think it is worth reorganizing the Testbed folder structure. A first-pass proposed layout would be:

./testbed/
./testbed/README.md
./testbed/testbed  <-- Convert `run_scenarios.py` to a module and move it and other global stuff like `Includes` here
./testbed/scenarios <-- Keep as-is

Then, for each benchmark within scenarios (using GAIA as an example):

./testbed/scenarios/GAIA/README.md        <-- Readme for running GAIA
./testbed/scenarios/GAIA/Templates        <-- As-is templates for the scenarios
./testbed/scenarios/GAIA/Scripts          <-- GAIA-related scripts (download, collate, etc.)
./testbed/scenarios/GAIA/Tasks            <-- JSONL files describing the tasks (created dynamically when the download script is run; example below)
./testbed/scenarios/GAIA/Results          <-- Results folder produced by run_scenarios.py (created dynamically at run time)
./testbed/scenarios/GAIA/TaskResources    <-- Files referenced by tasks (created dynamically when the download script is run)

This pattern would be repeated for HumanEval, MATH, etc.
Moving forward, we might also provide a Dockerfile, etc., in the root of each benchmark as warranted.
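
For concreteness, here is a rough sketch of what a single line in ./testbed/scenarios/GAIA/Tasks/*.jsonl could look like. The field names and values here are illustrative assumptions, loosely modeled on what run_scenarios.py consumes today, not a finalized spec:

{"id": "gaia_val_001", "template": "Templates/BasicTwoAgents", "substitutions": {"scenario.py": {"__PROMPT__": "How many ...?", "__FILE_NAME__": "TaskResources/example.pdf"}}}

Each line would describe one task: the template folder to copy, and the per-task string substitutions to apply to files in that copy.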

@LeoLjl @kevin666aa: you have worked with the Testbed. What do you think of this?

I am willing to do the legwork to make this happen. I am not asking you to modify your contributions, but am asking whether it would be OK for me to do so.

yiranwu0 (Collaborator) commented

Where should one put requirements.txt for a dataset? I think it should also go in the corresponding scenario's folder.

./testbed/scenarios/GAIA/Scripts includes .py files, right? Just a personal preference: I would put these files in ./testbed/scenarios/GAIA/ directly if there are not too many.

Another idea:
Remove TaskResources and add:

./testbed/scenarios/GAIA/Tasks/JSONL
./testbed/scenarios/GAIA/Tasks/Downloads

afourney (Member, Author) commented Dec 14, 2023

Yeah, requirements will get moved too. I'm trying to decide on the best place.

Technically, if you put a requirements.txt file in a template folder, it will be copied over with that scenario. You can even template it and get control per task instance. Either might be a better option going forward (I'm sort of learning as I go here); see the sketch below.
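
Something like this, for example. The placeholder token is hypothetical, and I'm assuming the requirements file would participate in the same substitution mechanism the task JSONL already uses for other templated files:

Templates/BasicTwoAgents/requirements.txt:
pyautogen__AUTOGEN_VERSION__
beautifulsoup4

Corresponding fragment of a Tasks JSONL line:
"substitutions": {"requirements.txt": {"__AUTOGEN_VERSION__": "==0.2.0"}}

That way, each task instance could pin its own dependency versions if needed.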

afourney (Member, Author) commented

Completed in #1048
