AutoKaggle

This is the official repository for the paper "AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions".

Introduction

AutoKaggle is a framework that helps data scientists complete data science pipelines through a collaborative multi-agent system. It combines iterative development, comprehensive testing, and a machine learning tools library to automate Kaggle competitions while remaining highly customizable. Its key features are:

- Multi-agent Collaboration: Five specialized agents (Reader, Planner, Developer, Reviewer, and Summarizer) work together across six key competition phases.
- Iterative Development and Unit Testing: Robust code verification through debugging and comprehensive unit testing.
- ML Tools Library: Validated functions for data cleaning, feature engineering, and modeling.
- Comprehensive Reporting: Detailed documentation of the workflow and decision-making processes.
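The iterative development and unit-testing loop described above can be pictured with the conceptual sketch below. It is not AutoKaggle's implementation: `develop_with_tests`, the code-generator callable, and the unit-test callables are hypothetical stand-ins for the Developer agent's model calls and the framework's test suite, and `max_attempts` is an assumed retry budget.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a code generator takes the task description plus
# the latest failure feedback and returns source code; a unit test takes
# that code and returns (passed, message).
CodeGenerator = Callable[[str, str], str]
UnitTest = Callable[[str], Tuple[bool, str]]


def develop_with_tests(task: str, generate: CodeGenerator,
                       tests: List[UnitTest],
                       max_attempts: int = 3) -> Tuple[str, bool]:
    """Regenerate code until every unit test passes or attempts run out."""
    feedback = ""
    code = ""
    for _ in range(max_attempts):
        code = generate(task, feedback)
        failures = []
        for test in tests:
            passed, message = test(code)
            if not passed:
                failures.append(message)
        if not failures:
            return code, True
        # Feed the failure messages into the next generation round (debugging).
        feedback = "\n".join(failures)
    return code, False
```

Returning the last attempt together with a pass/fail flag lets a caller decide whether to escalate (for example, to a reviewing step) instead of silently accepting untested code.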

Quick Start with AutoKaggle

Set Environment

Clone the repository and enter it

git clone https://github.com/multimodal-art-projection/AutoKaggle.git
cd AutoKaggle

Create and activate a conda environment

conda create -n AutoKaggle python=3.11
conda activate AutoKaggle

Install dependencies

pip install -r requirements.txt

Configure the OpenAI API: create api_key.txt in the repository root, with your API key on the first line and the base URL on the second:

sk-xxx
https://api.openai.com/v1
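To check that api_key.txt is laid out as expected (key first, base URL second), you can read it back with a small helper. This is a sketch under that two-line assumption; `load_api_config` is a hypothetical name and is not necessarily how api_handler.py parses the file.

```python
from pathlib import Path
from typing import Tuple


def load_api_config(path: str = "api_key.txt") -> Tuple[str, str]:
    """Return (api_key, base_url) from a two-line api_key.txt (assumed layout)."""
    text = Path(path).read_text(encoding="utf-8")
    # Ignore blank lines so a trailing newline does not matter.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError(f"{path} must contain an API key line and a base URL line")
    return lines[0], lines[1]
```

The returned pair maps directly onto an OpenAI-compatible client, e.g. `OpenAI(api_key=api_key, base_url=base_url)` in the openai-python 1.x package.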

Data Preparation

AutoKaggle supports tabular datasets from Kaggle. Place the competition data in ./multi_agents/competition/ using the following structure:

competition/
├── train.csv
├── test.csv
├── sample_submission.csv
└── overview.txt                 # Competition overview and data description

overview.txt: Copy the Overview and Data sections from the Kaggle competition homepage into this file. The Reader agent reads it to summarize the competition's relevant information.
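Before launching a run, a quick pre-flight check can confirm that the four files from the layout above are present. This is a hypothetical helper (`missing_competition_files` is not part of AutoKaggle); it only checks that the files exist, not their contents.

```python
from pathlib import Path
from typing import List

# Files listed in the expected layout of ./multi_agents/competition/.
REQUIRED_FILES = ["train.csv", "test.csv", "sample_submission.csv", "overview.txt"]


def missing_competition_files(competition_dir: str = "./multi_agents/competition") -> List[str]:
    """Return the required files that are absent from competition_dir."""
    root = Path(competition_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_competition_files()` returns an empty list when the directory is complete, and otherwise names the files still to be added.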

Running AutoKaggle

To run AutoKaggle experiments, use the following command:

bash run_multi_agents.sh

Configuration Parameters

Competition Selection
- competitions: Target competitions, defined in the script
Experiment Control
- start_run, end_run: Range of experiment runs (default: 1-5)
- dest_dir_param: Name of the output directory (default: "all_tools")
Model Configuration
- Default: gpt-4o for the Planner and Developer, gpt-4o-mini for the other agents
- model: Sets the base model of the Planner
- To change the base model of the other agents, modify _create_agent in multi_agents/sop.py
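The default model assignment above can be summarized as a mapping from agent to base model. This is an illustrative reading of the listed defaults, not code from the repository: `default_agent_models` is hypothetical, and it assumes that `model` affects only the Planner while every other agent keeps the value set in _create_agent.

```python
from typing import Dict

AGENTS = ["Reader", "Planner", "Developer", "Reviewer", "Summarizer"]


def default_agent_models(model: str = "gpt-4o") -> Dict[str, str]:
    """Map each agent to its base model under the stated defaults.

    `model` sets the Planner; the Developer stays on gpt-4o and the
    remaining agents use gpt-4o-mini unless _create_agent is edited.
    """
    models = {agent: "gpt-4o-mini" for agent in AGENTS}
    models["Developer"] = "gpt-4o"
    models["Planner"] = model
    return models
```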

Output Structure

multi_agents/experiments_history/
└── <competition>/
    └── <model>/
        └── <dest_dir_param>/
            └── <run_number>/
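Combining the output structure with the experiment-control parameters, the directories produced by one configuration can be listed as below. This is a sketch, assuming end_run is inclusive (so the default 1-5 yields five runs); `run_output_dirs` and the competition name "titanic" are hypothetical examples.

```python
from pathlib import Path
from typing import List


def run_output_dirs(competition: str, model: str,
                    dest_dir_param: str = "all_tools",
                    start_run: int = 1, end_run: int = 5,
                    root: str = "multi_agents/experiments_history") -> List[Path]:
    """List per-run output directories, treating end_run as inclusive."""
    base = Path(root) / competition / model / dest_dir_param
    return [base / str(run) for run in range(start_run, end_run + 1)]
```

With the defaults, `run_output_dirs("titanic", "gpt-4o")[0]` points to multi_agents/experiments_history/titanic/gpt-4o/all_tools/1.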

Result

We evaluated AutoKaggle across 8 diverse Kaggle competitions, achieving:

- 85% validation submission rate
- 0.82 comprehensive score

Citation

[Citation information to be added]

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
