If you use this code in your research, please cite the following preprint:

```bibtex
@misc{zhang2024lemonlabelerrordetection,
      title={LEMoN: Label Error Detection using Multimodal Neighbors},
      author={Haoran Zhang and Aparna Balagopalan and Nassim Oufattole and Hyewon Jeong and Yan Wu and Jiacheng Zhu and Marzyeh Ghassemi},
      year={2024},
      eprint={2407.18941},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.18941},
}
```
Run the following commands to clone this repo and create the Conda environment:

```bash
git clone git@github.com:MLforHealth/LEMoN.git
cd LEMoN
conda env create -f environment.yml
conda activate lemon
```
CIFAR-10 and CIFAR-100 are downloaded automatically by the codebase. To preprocess the remaining datasets, follow the instructions in `DataSources.md`.
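If you prefer to fetch the CIFAR data ahead of time (e.g. on a cluster node without internet access at run time), the sketch below pre-downloads it with torchvision's built-in datasets; whether the codebase itself uses torchvision for this step is an assumption.

```python
# Hypothetical pre-download of CIFAR-10/100; the repo may cache the data
# differently, so treat this as a convenience sketch only.
from torchvision.datasets import CIFAR10, CIFAR100

root = "./data"  # assumed cache location; adjust to match your config
for train in (True, False):  # train and test splits
    CIFAR10(root=root, train=train, download=True)
    CIFAR100(root=root, train=train, download=True)
```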
To run a single evaluation, call `run_lemon.py` with the appropriate arguments, for example:

```bash
python -m run_lemon \
    --output_dir /output/dir \
    --dataset mscoco \
    --noise_type cat \
    --noise_level 0.4
```
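As a rough intuition for what this evaluation computes, LEMoN (per the paper title) flags label errors using multimodal neighbors. The sketch below is conceptual only, not the repository's implementation: the embedding model, neighbor count, distance, and aggregation are all assumptions.

```python
import numpy as np

def lemon_scores_sketch(img_emb, txt_emb, k=10):
    """Conceptual neighbor-based label-error score (higher = more suspect).

    img_emb, txt_emb: (n, d) L2-normalized image and caption embeddings,
    e.g. from a CLIP-style encoder (an assumption about the setup).
    """
    n = img_emb.shape[0]
    img_sim = img_emb @ img_emb.T  # cosine similarity between images
    scores = np.empty(n)
    for i in range(n):
        # k nearest image neighbors, skipping the point itself at rank 0
        nn = np.argsort(-img_sim[i])[1:k + 1]
        # a caption far from the captions of visually similar images
        # is a candidate label error
        scores[i] = 1.0 - (txt_emb[nn] @ txt_emb[i]).mean()
    return scores
```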
To reproduce the experiments in the paper, which involve training a grid of models using different hyperparameters, use `sweep.py` as follows:

```bash
python sweep.py launch \
    --experiment {experiment_name} \
    --output_dir {output_root} \
    --command_launcher {launcher}
```
where:
- `experiment_name` corresponds to experiments defined as classes in `experiments.py`
- `output_root` is a directory where experimental results will be stored
- `launcher` is a string corresponding to a launcher defined in `launchers.py` (i.e. `slurm` or `local`)
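For example, a concrete invocation built from names that appear elsewhere in this README (the output path is a placeholder to adjust for your setup):

```bash
python sweep.py launch \
    --experiment multimodal_knn_caption \
    --output_dir /output/root \
    --command_launcher local
```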
After the `multimodal_knn_caption` experiment has finished running, create Tables 2 and 3 by running `notebooks/agg_results.ipynb` and `notebooks/hparam_drop.ipynb`.
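If you prefer to execute the notebooks headlessly, `jupyter nbconvert` can run them in place; note that Jupyter is not listed among the dependencies above, so its availability in the environment is an assumption.

```bash
jupyter nbconvert --to notebook --execute --inplace notebooks/agg_results.ipynb
jupyter nbconvert --to notebook --execute --inplace notebooks/hparam_drop.ipynb
```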