Sela readme intro #1556

Merged 6 commits on Oct 30, 2024
1 change: 0 additions & 1 deletion .gitignore
@@ -29,7 +29,6 @@ share/python-wheels/
MANIFEST
metagpt/tools/schemas/
examples/data/search_kb/*.json
metagpt/ext/sela/AutogluonModels

# PyInstaller
# Usually these files are written by a python script from a template
25 changes: 23 additions & 2 deletions metagpt/ext/sela/README.md
@@ -1,9 +1,15 @@
# SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning


Official implementation of the paper [SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning](https://arxiv.org/abs/2410.17238).


SELA is an innovative system that enhances Automated Machine Learning (AutoML) by integrating Monte Carlo Tree Search (MCTS) with LLM-based agents. Traditional AutoML methods often generate low-diversity and suboptimal code, limiting their effectiveness in model selection and ensembling. SELA addresses these challenges by representing pipeline configurations as trees, enabling agents to intelligently explore the solution space and iteratively refine their strategies based on experimental feedback.
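
To make the idea concrete, here is a minimal, self-contained sketch of Monte Carlo Tree Search over pipeline configurations. It is illustrative only and not SELA's implementation: the stage names, the `evaluate_pipeline` placeholder, and the random reward are assumptions standing in for the LLM-generated insights and real experimental feedback described in the paper.

```python
import math
import random

# Hypothetical search space: one choice per pipeline stage. The real system
# derives its search space from LLM-proposed insights, not a fixed dictionary.
STAGE_OPTIONS = {
    "preprocess": ["impute_mean", "impute_median"],
    "feature_eng": ["polynomial_features", "target_encoding"],
    "model": ["xgboost", "random_forest", "mlp"],
}
STAGES = list(STAGE_OPTIONS)


class Node:
    def __init__(self, config=None, parent=None):
        self.config = config or {}   # partial pipeline configuration
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0             # sum of rewards observed through this node

    def is_terminal(self):
        return len(self.config) == len(STAGES)

    def expand(self):
        stage = STAGES[len(self.config)]
        self.children = [
            Node({**self.config, stage: option}, parent=self)
            for option in STAGE_OPTIONS[stage]
        ]

    def best_child(self, c=1.4):
        # UCT: balance average reward against how rarely a child was tried.
        return max(
            self.children,
            key=lambda n: n.value / (n.visits + 1e-9)
            + c * math.sqrt(math.log(self.visits + 1) / (n.visits + 1e-9)),
        )


def evaluate_pipeline(config):
    # Stand-in for "experimental feedback": SELA would run the generated
    # pipeline code and use its validation score as the reward.
    return random.random()


def search(root, rollouts=20):
    for _ in range(rollouts):
        node = root
        # 1. Selection: descend via UCT while children exist.
        while node.children:
            node = node.best_child()
        # 2. Expansion.
        if not node.is_terminal():
            node.expand()
            node = random.choice(node.children)
        # 3. Simulation: finish the configuration randomly and evaluate it.
        config = dict(node.config)
        for stage in STAGES[len(config):]:
            config[stage] = random.choice(STAGE_OPTIONS[stage])
        reward = evaluate_pipeline(config)
        # 4. Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits)


if __name__ == "__main__":
    best_first_choice = search(Node())
    print("Most promising first-stage choice:", best_first_choice.config)
```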

## 1. Data Preparation

You can either download the datasets from the link below or prepare them from scratch.
- **Download Datasets:** [Dataset Link](https://deepwisdom.feishu.cn/drive/folder/RVyofv9cvlvtxKdddt2cyn3BnTc?from=from_copylink)
- **Download Datasets:** [Dataset Link](https://drive.google.com/drive/folders/151FIZoLygkRfeJgSI9fNMiLsixh1mK0r?usp=sharing)
- **Download and prepare datasets from scratch:**
```bash
cd data
@@ -82,4 +88,19 @@ pip install -r requirements.txt
- **Use a set of insights:**
```bash
python run_experiment.py --exp_mode rs --task titanic --rs_mode set
```
```

## 4. Citation
Please cite our paper if you use SELA or find it cool or useful!

```bibtex
@misc{chi2024selatreesearchenhancedllm,
title={SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning},
author={Yizhou Chi and Yizhang Lin and Sirui Hong and Duyi Pan and Yaying Fei and Guanghao Mei and Bangbang Liu and Tianqi Pang and Jacky Kwok and Ceyao Zhang and Bang Liu and Chenglin Wu},
year={2024},
eprint={2410.17238},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2410.17238},
}
```
32 changes: 1 addition & 31 deletions metagpt/ext/sela/runner/README.md
@@ -165,34 +165,4 @@ python run_experiment.py --exp_mode base --task titanic --num_experiments 10
To run additional baselines:

- Each baseline must produce `dev_predictions.csv` and `test_predictions.csv` with a `target` column.
- Use the `evaluate_score` function for evaluation.

---

## MLE-Bench

**Note:** MLE-Bench requires Python 3.11 or higher.

#### Setup

Clone the repository and install:

```bash
git clone https://github.com/openai/mle-bench.git
cd mle-bench
pip install -e .
```

Prepare the data:

```bash
mlebench prepare -c <competition-id> --data-dir <dataset-dir-save-path>
```

#### Run the MLE-Bench Experiment

Run the following command to execute the experiment:

```bash
python run_experiment.py --exp_mode mcts --custom_dataset_dir <dataset-dir-save-path/prepared/public> --rollouts 10 --from_scratch --role_timeout 3600
```
- Use the `evaluate_score` function for evaluation.
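
As a rough illustration of the output contract above, the sketch below trains a placeholder model and writes the two prediction files with a `target` column. Only the file names and the `target` column come from the README; the model, feature handling, and label column are assumptions, and the exact signature of `evaluate_score` should be checked in the SELA source before wiring it in.

```python
# Hypothetical baseline output step -- not taken from the SELA repo.
# Only `dev_predictions.csv`, `test_predictions.csv`, and the `target`
# column come from the README above; everything else is assumed.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def run_baseline(train_df, dev_df, test_df, label_col="label"):
    features = [c for c in train_df.columns if c != label_col]
    model = RandomForestClassifier(random_state=0)
    model.fit(train_df[features], train_df[label_col])

    # Each split gets a CSV whose prediction column is named `target`.
    pd.DataFrame({"target": model.predict(dev_df[features])}).to_csv(
        "dev_predictions.csv", index=False
    )
    pd.DataFrame({"target": model.predict(test_df[features])}).to_csv(
        "test_predictions.csv", index=False
    )
```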