Official implementation of "DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning" in ICML'24

guosyjlu/DS-Agent

DS-Agent

This is the official implementation of our work "DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning" (ICML 2024). [arXiv Version] [Download Benchmark (Google Drive)]

overview.png

Benchmark and Dataset

We select 30 representative data science tasks covering three data modalities and two fundamental ML task types. Download the datasets and the corresponding configuration files from [Google Drive] and unzip them into the development/benchmarks directory. We also collect human insight cases from Kaggle in development/data.zip; unzip that archive as well.
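
The unzip step can also be scripted. The following is a minimal sketch using Python's standard zipfile module, not part of the repository. The helper name extract_archive and the archive filename benchmarks.zip are hypothetical (the name of the Google Drive download is not given here), and extracting data.zip into development/ is an assumption about the intended layout.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and return the archive's member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if missing
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical filename for the Google Drive download; adjust to the real file.
    benchmarks_zip = Path("benchmarks.zip")
    if benchmarks_zip.exists():
        extract_archive(benchmarks_zip, "development/benchmarks")

    # Assumption: the human insight cases are meant to be extracted next to the archive.
    data_zip = Path("development/data.zip")
    if data_zip.exists():
        extract_archive(data_zip, "development")
```

If the downloaded archive already contains a top-level benchmarks folder, extract it into development instead so the files do not end up nested one level too deep.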

overview.png

Setup

This project is built on top of the MLAgentBench framework. First, install the MLAgentBench package:

cd development
pip install -e .

Then install the necessary libraries listed in requirements.txt:

pip install -r requirements.txt

DS-Agent mainly uses GPT-3.5 and GPT-4 for all experiments, so fill in your OpenAI API key in development/MLAgentBench/LLM.py and deployment/generate.py.
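
If you would rather not hard-code the key in those two files, one common alternative is to read it from an environment variable. The sketch below is an assumption, not how the repository is written: the helper load_openai_key is hypothetical, OPENAI_API_KEY is only a conventional variable name, and wiring the returned value into LLM.py and generate.py is left to you.

```python
import os


def load_openai_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running DS-Agent."
        )
    return key
```

Failing early with a clear message avoids starting a long run that only breaks on the first LLM call.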

Development Stage

Run DS-Agent for development tasks with the following command:

cd development/MLAgentBench
python runner.py --task feedbackv2 --llm-name gpt-3.5-turbo-16k --edit-script-llm-name gpt-3.5-turbo-16k

During execution, logs and intermediate solution files will be saved in logs/ and workspace/.
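
To run several development tasks in sequence, the command above can be wrapped in a small driver. This is a sketch under stated assumptions: build_runner_command and run_tasks are hypothetical helpers, only the flags shown in the command above are used, and feedbackv2 is the only task name taken from this README.

```python
import subprocess


def build_runner_command(task,
                         llm_name="gpt-3.5-turbo-16k",
                         edit_script_llm_name="gpt-3.5-turbo-16k"):
    """Build the runner.py invocation from the README as an argument list."""
    return [
        "python", "runner.py",
        "--task", task,
        "--llm-name", llm_name,
        "--edit-script-llm-name", edit_script_llm_name,
    ]


def run_tasks(tasks, cwd="development/MLAgentBench"):
    """Run each task one after another; stop at the first failing run."""
    for task in tasks:
        # check=True raises CalledProcessError if runner.py exits non-zero.
        subprocess.run(build_runner_command(task), cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the command instead of launching it; call run_tasks([...]) to execute.
    print(" ".join(build_runner_command("feedbackv2")))
```

Passing the command as a list rather than a single string avoids shell quoting problems. Any further task names should match the ones defined by the downloaded benchmark configuration files.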

Deployment Stage

Run DS-Agent for deployment tasks with the following commands:

cd deployment
bash code_generation.sh
bash code_evaluation.sh

For the open-source LLM used in this paper, mixtral-8x7b-Instruct-v0.1, we use the vLLM framework. First, start the LLM server with:

cd deployment
bash start_api.sh

Then run the shell scripts as before, replacing the --llm configuration with mixtral.

Cite

Please consider citing our paper if you find this work useful:


@InProceedings{DS-Agent,
  title = 	 {{DS}-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning},
  author =       {Guo, Siyuan and Deng, Cheng and Wen, Ying and Chen, Hechang and Chang, Yi and Wang, Jun},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {16813--16848},
  year = 	 {2024},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  publisher =    {PMLR}
}
