AIDE: the Machine Learning CodeGen Agent

AIDE is an LLM agent that generates solutions for machine learning tasks just from natural language descriptions of the task. In a benchmark composed of over 60 Kaggle data science competitions, AIDE demonstrated impressive performance, surpassing 50% of Kaggle participants on average (see our technical report for details). More specifically, AIDE has the following features:

Instruct with Natural Language: Describe your problem or additional requirements and expert insights, all in natural language.
Deliver Solution in Source Code: AIDE will generate Python scripts for the tested machine learning pipeline. Enjoy full transparency, reproducibility, and the freedom to further improve the source code!
Iterative Optimization: AIDE iteratively runs, debugs, evaluates, and improves the ML code, all by itself.
Visualization: We also provide tools to visualize the solution tree produced by AIDE for a better understanding of its experimentation process. This gives you insights not only about what works but also what doesn't.

How to use AIDE?

Setup

Make sure you have Python>=3.10 installed and run:

pip install -U aideml

Also install unzip to allow the agent to autonomously extract your data.

Set up your OpenAI (or Anthropic) API key:

export OPENAI_API_KEY=<your API key>
# or
export ANTHROPIC_API_KEY=<your API key>

Running AIDE via the command line

To run AIDE:

aide data_dir="<path to your data directory>" goal="<describe the agent's goal for your task>" eval="<(optional) describe the evaluation metric the agent should use>"

For example, to run AIDE on the example house price prediction task:

aide data_dir="example_tasks/house_prices" goal="Predict the sales price for each house" eval="Use the RMSE metric between the logarithm of the predicted and observed values."

Options:

data_dir (required): a directory containing all the data relevant for your task (.csv files, images, etc.).
goal: describe what you want the models to predict in your task, for example, "Build a time series forecasting model for bitcoin close price" or "Predict sales price for houses".
eval: the evaluation metric used to evaluate the ML models for the task (e.g., accuracy, F1, Root-Mean-Squared-Error, etc.)

Alternatively, you can provide the entire task description as a desc_str string, or write it in a plaintext file and pass its path as desc_file (example file).

aide data_dir="my_data_dir" desc_file="my_task_description.txt"

The result of the run will be stored in the logs directory.

logs/<experiment-id>/best_solution.py: Python code of the best solution according to the validation metric
logs/<experiment-id>/journal.json: a JSON file containing the metadata of the experiment runs, including all the code generated in intermediate steps, plans, evaluation results, etc.
logs/<experiment-id>/tree_plot.html: an interactive visualization of the solution tree, which records the experimentation process of finding and optimizing the ML code. Open it in your browser to explore the plan and code AIDE came up with at each step.

The workspaces directory will contain all the files and data that the agent generated.
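The exact schema of journal.json is not described here, so a cautious first step is to look only at its top-level structure. The sketch below does that with the standard library; the helper name summarize_json is hypothetical and not part of AIDE.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level structure of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text())
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        return {"<list>": len(data)}
    return {"<scalar>": type(data).__name__}


# Example (replace <experiment-id> with a directory name under logs/):
# print(summarize_json("logs/<experiment-id>/journal.json"))
```

Once the top-level keys are known, the intermediate code and evaluation results can be read from the file like any other JSON data.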

Advanced Usage

To further customize the behaviour of AIDE, the following options are useful:

agent.code.model=... to configure which model the agent should use for coding (default is gpt-4-turbo)
agent.steps=... to configure how many improvement iterations the agent should run (default is 20)
agent.search.num_drafts=... to configure the number of initial drafts the agent should generate (default is 5)

You can check the config.yaml file for more options.
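When launching AIDE from a script, the key=value arguments shown above can be assembled programmatically. The following is a minimal sketch, assuming the overrides are passed on the command line in the same key=value form as data_dir and goal; build_aide_command is a hypothetical helper, not part of AIDE.

```python
import shlex


def build_aide_command(data_dir, goal, eval_metric=None, overrides=None):
    """Assemble the argument list for the `aide` CLI (hypothetical helper)."""
    args = ["aide", f"data_dir={data_dir}", f"goal={goal}"]
    if eval_metric is not None:
        args.append(f"eval={eval_metric}")
    # Each override becomes one key=value argument, e.g. agent.steps=10.
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    return args


cmd = build_aide_command(
    "example_tasks/house_prices",
    "Predict the sales price for each house",
    overrides={"agent.steps": 10, "agent.search.num_drafts": 3},
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))

# To actually launch the run (requires aideml to be installed and an API key):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when the goal text contains spaces or quotes.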

Using AIDE in Python

Using AIDE within your Python script/project is easy. Follow the setup steps above, and then create an AIDE experiment like below and start running:

import aide
exp = aide.Experiment(
    data_dir="example_tasks/bitcoin_price",  # replace this with your own directory
    goal="Build a time series forecasting model for bitcoin close price.",  # replace with your own goal description
    eval="RMSLE"  # replace with your own evaluation metric
)

best_solution = exp.run(steps=10)

print(f"Best solution has validation metric: {best_solution.valid_metric}")
print(f"Best solution code: {best_solution.code}")

Development

To install AIDE for development, clone this repository and install it locally.

git clone https://github.com/WecoAI/aideml.git
cd aideml
pip install -e .

A contribution guide will be available soon.

Algorithm Description

AIDE's problem-solving approach is inspired by how human data scientists tackle challenges. It starts by generating a set of initial solution drafts and then iteratively refines and improves them based on performance feedback. This process is driven by a technique we call Solution Space Tree Search.

At its core, Solution Space Tree Search consists of three main components:

Solution Generator: This component proposes new solutions by either creating novel drafts or making changes to existing solutions, such as fixing bugs or introducing improvements.
Evaluator: The evaluator assesses the quality of each proposed solution by running it and comparing its performance against the objective. This is implemented by instructing the LLM to include statements that print the evaluation metric and by having another LLM parse the printed logs to extract the evaluation metric.
Base Solution Selector: The solution selector picks the most promising solution from the explored options to serve as the starting point for the next iteration of refinement.

By repeatedly applying these steps, AIDE navigates the vast space of possible solutions, progressively refining its approach toward a strong solution for the given data science problem.
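The loop formed by these three components can be illustrated on a toy problem. The sketch below is not AIDE's implementation: the "solution" is a single number rather than a script, the evaluator is a plain function rather than an LLM reading logs, the selector is purely greedy, and counting the initial drafts toward the step budget is an assumption. All names (Node, tree_search, draft, improve, evaluate) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One explored solution in the tree."""
    solution: float                  # stand-in for a generated ML script
    metric: float                    # validation metric; lower is better in this toy
    parent: Optional["Node"] = None  # None marks an initial draft


def tree_search(
    draft: Callable[[], float],
    improve: Callable[[float], float],
    evaluate: Callable[[float], float],
    num_drafts: int = 5,
    steps: int = 20,
) -> Tuple[Node, List[Node]]:
    """Greedy sketch: write drafts first, then keep refining the best node."""
    nodes: List[Node] = []
    for _ in range(steps):
        if len(nodes) < num_drafts:
            # Solution Generator: create a fresh draft.
            parent, candidate = None, draft()
        else:
            # Base Solution Selector: pick the best node seen so far.
            parent = min(nodes, key=lambda n: n.metric)
            # Solution Generator: propose a change to the selected solution.
            candidate = improve(parent.solution)
        # Evaluator: score the candidate and record it in the tree.
        nodes.append(Node(candidate, evaluate(candidate), parent))
    return min(nodes, key=lambda n: n.metric), nodes


# Toy problem: find x that minimizes (x - 3)^2.
rng = random.Random(0)
best, nodes = tree_search(
    draft=lambda: rng.uniform(-10, 10),
    improve=lambda x: x + rng.gauss(0, 1),
    evaluate=lambda x: (x - 3) ** 2,
)
print(f"explored {len(nodes)} nodes, best metric {best.metric:.4f}")
```

Keeping every node, including the ones that did not improve the metric, is what makes it possible to draw the full solution tree afterwards.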

Solution Gallery

Domain	Task	Top%	Solution Link	Competition Link
Urban Planning	Forecast city bikeshare system usage	5%	link	link
Physics	Predicting Critical Heat Flux	56%	link	link
Genomics	Classify bacteria species from genomic data	0%	link	link
Agriculture	Predict blueberry yield	58%	link	link
Healthcare	Predict disease prognosis	0%	link	link
Economics	Predict monthly microbusiness density in a given area	35%	link	link
Cryptography	Decrypt Shakespearean text	91%	link	link
Data Science Education	Predict passenger survival on Titanic	78%	link	link
Software Engineering	Predict defects in C programs given various attributes about the code	0%	link	link
Real Estate	Predict the final price of homes	5%	link	link
Real Estate	Predict house sale price	36%	link	link
Entertainment Analytics	Predict movie worldwide box office revenue	62%	link	link
Entertainment Analytics	Predict scoring probability in next 10 seconds of a Rocket League match	21%	link	link
Environmental Science	Predict air pollution levels	12%	link	link
Environmental Science	Classify forest categories using cartographic variables	55%	link	link
Computer Vision	Predict the probability of machine failure	32%	link	link
Computer Vision	Identify handwritten digits	14%	link	link
Manufacturing	Predict missing values in dataset	70%	link	link
Manufacturing	Predict product failures	48%	link	link
Manufacturing	Cluster control data into different control states	96%	link	link
Natural Language Processing	Classify toxic online comments	78%	link	link
Natural Language Processing	Predict passenger transport to an alternate dimension	59%	link	link
Natural Language Processing	Classify sentence sentiment	42%	link	link
Natural Language Processing	Predict whether a tweet is about a real disaster	48%	link	link
Business Analytics	Predict total sales for each product and store in the next month	87%	link	link
Business Analytics	Predict book sales for 2021	66%	link	link
Business Analytics	Predict insurance claim amount	80%	link	link
Business Analytics	Minimize penalty cost in scheduling families to Santa's workshop	100%	link	link
Business Analytics	Predict yearly sales for learning modules	26%	link	link
Business Analytics	Binary classification of manufacturing machine state	60%	link	link
Business Analytics	Forecast retail store sales	36%	link	link
Business Analytics	Predict reservation cancellation	54%	link	link
Finance	Predict the probability of an insurance claim	13%	link	link
Finance	Predict loan loss	0%	link	link
Finance	Predict a continuous target	42%	link	link
Finance	Predict customer churn	24%	link	link
Finance	Predict median house value	58%	link	link
Finance	Predict closing price movements for Nasdaq-listed stocks	99%	link	link
Finance	Predict taxi fare	100%	link	link
Finance	Predict insurance claim probability	62%	link	link
Biotech	Predict cat in dat	66%	link	link
Biotech	Predict the biological response of molecules	62%	link	link
Biotech	Predict medical conditions	92%	link	link
Biotech	Predict wine quality	61%	link	link
Biotech	Predict binary target without overfitting	98%	link	link
Biotech	Predict concrete strength	86%	link	link
Biotech	Predict crab age	46%	link	link
Biotech	Predict enzyme characteristics	10%	link	link
Biotech	Classify activity state from sensor data	51%	link	link
Biotech	Predict horse health outcomes	86%	link	link
Biotech	Predict the mohs hardness of a mineral	64%	link	link
Biotech	Predict cirrhosis patient outcomes	51%	link	link
Biotech	Predict obesity risk	62%	link	link
Biotech	Classify presence of feature in data	66%	link	link
Biotech	Predict patient's smoking status	40%	link	link
