ECCO

This repository contains the source code for the EMNLP 2024 paper "ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?"

(Teaser figure)

Dataset

The dataset is available on Hugging Face at CodeEff/ECCO.

It consists of two subsets, edit and generate, each with three splits (train, val, and test).

Loading the dataset

from datasets import load_dataset

dataset = load_dataset('CodeEff/ECCO', 'edit')      # History-based editing setting
dataset = load_dataset('CodeEff/ECCO', 'generate')  # NL-instructed generation setting
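
To quickly verify the download, you can inspect the splits and fields with the standard datasets API (a minimal check; it does not assume any particular column names):

from datasets import load_dataset

dataset = load_dataset('CodeEff/ECCO', 'edit')
print(dataset)                        # shows the train/val/test splits and their sizes
print(dataset['train'].column_names)  # lists the fields available in each example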

Download the test cases

mkdir data && cd data
wget https://huggingface.co/datasets/CodeEff/ECCO/resolve/main/test_cases.zip
unzip test_cases.zip
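
After unzipping, a quick sanity check can confirm the files are in place (a minimal sketch; it assumes the archive extracts into a test_cases/ directory inside data/, matching the commands above):

from pathlib import Path

# Path is an assumption based on the unzip command above
test_case_dir = Path('data/test_cases')
num_files = sum(1 for p in test_case_dir.rglob('*') if p.is_file())
print(f'Found {num_files} test case files under {test_case_dir}')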

Experiments

Environment setup

conda env create -f environment.yml
conda activate ecco

Code structure

  1. evaluation contains the scripts for evaluating model-generated code against the Judge0 environment server hosted on AWS. Please see the instructions to set up the evaluation server.
    • edit_eval.py evaluates generated code on the metrics for the history-based editing setting
    • generate_eval.py evaluates generated code on the metrics for the NL-instructed generation setting
  2. experiments contains the scripts to run the modeling experiments.
    • model_classes.py contains the inference engine classes for each benchmarked model
    • inference.py is the entrypoint for running the experiments
    • prompt_formats.py and utils.py contain utilities for prompt building and execution feedback formatting

Starting up the evaluation server

Judge Setup

Set up the evaluation server by following the guide in the evaluation README.

Running experiments / Generating Code

Code generation experiments are run through the experiments/inference.py entrypoint. An example is provided below:

python experiments/inference.py --model deepseek \
   --temperature 0.4 --num_samples 1 --eval_mode "edit" 

Available --model choices are listed in the registry.

--eval_mode choices are ['edit', 'nl2code', 'self-refine', 'exec-refine', 'nl2code-self-refine', 'nl-exec-refine', 'nl2code-exec-refine', 'nl2code-nl-exec-refine'], covering the different experiments. Modes without the nl2code prefix correspond to the history-based editing setting, and modes with the prefix correspond to the NL-instructed generation setting.
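
To sweep several modes within one setting, the documented entrypoint can be wrapped in a small driver script (a sketch, not part of the repository; it reuses only the flags shown above, and the mode list follows the NL-instructed generation setting):

import subprocess

# Modes for the NL-instructed generation setting (from the list above)
nl2code_modes = ['nl2code', 'nl2code-self-refine', 'nl2code-exec-refine', 'nl2code-nl-exec-refine']

for mode in nl2code_modes:
    # Adjust --model, --temperature and --num_samples as needed
    subprocess.run([
        'python', 'experiments/inference.py',
        '--model', 'deepseek',
        '--temperature', '0.4',
        '--num_samples', '1',
        '--eval_mode', mode,
    ], check=True)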

Citation

@article{waghjale2024ecco,
  title={ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?},
  author={Waghjale, Siddhant and Veerendranath, Vishruth and Wang, Zora Zhiruo and Fried, Daniel},
  journal={arXiv preprint arXiv:2407.14044},
  year={2024}
}
