
Standardize folder format for benchmarks (Move benchmark-specific READMEs, util scripts, metrics, etc. to appropriate folders) #976

Status: Closed · Tracked by #973
afourney opened this issue Dec 13, 2023 · 3 comments
Labels: autogenbench (Issues related to AutoGenBench)

afourney (Member) commented Dec 13, 2023

As we onboard more benchmarks and evaluation datasets, I think it is worth reorganizing the Testbed folder structure. A first-pass proposed layout would be:

./testbed/
./testbed/README.md
./testbed/testbed  <-- Convert `run_scenarios.py` to a module and move it and other global stuff like `Includes` here
./testbed/scenarios <-- Keep as-is

Then, for each benchmark within scenarios (using GAIA as an example):

./testbed/scenarios/GAIA/README.md        <-- Readme for running GAIA
./testbed/scenarios/GAIA/Templates        <-- As-is templates for the scenarios
./testbed/scenarios/GAIA/Scripts          <-- GAIA-related scripts (download, collate, etc.)
./testbed/scenarios/GAIA/Tasks            <-- JSONL files describing the tasks (created dynamically when the download script is run; example below)
./testbed/scenarios/GAIA/Results          <-- Results folder produced by run_scenarios.py (created dynamically at run time)
./testbed/scenarios/GAIA/TaskResources    <-- Files referenced by tasks (created dynamically when the download script is run)

This pattern would be repeated for HumanEval, MATH, etc.
Moving forward, we might also provide a Dockerfile, etc., in the root of each benchmark as warranted.
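
For concreteness, here is a rough sketch of what a single line in ./testbed/scenarios/GAIA/Tasks/*.jsonl could look like. The field names and values here are illustrative assumptions, loosely modeled on what run_scenarios.py consumes today, not a finalized spec:

{"id": "gaia_val_001", "template": "Templates/BasicTwoAgents", "substitutions": {"scenario.py": {"__PROMPT__": "How many ...?", "__FILE_NAME__": "TaskResources/example.pdf"}}}

Each line would describe one task: the template folder to copy, and the per-task string substitutions to apply to files in that copy.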

@LeoLjl @kevin666aa: you have worked with the Testbed. What do you think of this?

I am willing to do the legwork to make this happen. I am not asking you to modify your contributions, but am asking whether it would be OK for me to do so.

yiranwu0 (Collaborator) commented

Where should one put requirements.txt for a dataset? I think it should also go in the corresponding scenario's folder.

./testbed/scenarios/GAIA/Scripts includes .py files, right? Just a personal preference: I would put these files in ./testbed/scenarios/GAIA/ directly if there are not too many.

Another idea:
Remove TaskResources and add:

./testbed/scenarios/GAIA/Tasks/JSONL
./testbed/scenarios/GAIA/Tasks/Downloads

afourney (Member, Author) commented Dec 14, 2023

Yeah, requirements will get moved too. I'm trying to decide on the best place.

Technically, if you put a requirements.txt file in a template folder, it will be copied over with that scenario. You can even template it and get control per task instance. Either might be a better option going forward (I'm sort of learning as I go here); see the sketch below.
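
Something like this, for example. The placeholder token is hypothetical, and I'm assuming the requirements file would participate in the same substitution mechanism the task JSONL already uses for other templated files:

Templates/BasicTwoAgents/requirements.txt:
pyautogen__AUTOGEN_VERSION__
beautifulsoup4

Corresponding fragment of a Tasks JSONL line:
"substitutions": {"requirements.txt": {"__AUTOGEN_VERSION__": "==0.2.0"}}

That way, each task instance could pin its own dependency versions if needed.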

afourney (Member, Author) commented

Completed in #1048
