LEMoN: Label Error Detection using Multimodal Neighbors

Paper

If you use this code in your research, please cite the following preprint:

@misc{zhang2024lemonlabelerrordetection,
      title={LEMoN: Label Error Detection using Multimodal Neighbors}, 
      author={Haoran Zhang and Aparna Balagopalan and Nassim Oufattole and Hyewon Jeong and Yan Wu and Jiacheng Zhu and Marzyeh Ghassemi},
      year={2024},
      eprint={2407.18941},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.18941}, 
}

To replicate the experiments in the paper:

Step 0: Environment and Prerequisites

Run the following commands to clone this repo and create the Conda environment:

git clone git@github.com:MLforHealth/LEMoN.git
cd LEMoN
conda env create -f environment.yml
conda activate lemon

Step 1: Preprocessing Data

CIFAR-10 and CIFAR-100 are downloaded automatically by the codebase. To preprocess the remaining datasets, follow the instructions in DataSources.md.

Step 2: Running Experiments

To run a single evaluation, call run_lemon.py with the appropriate arguments, for example:

python -m run_lemon \
    --output_dir /output/dir \
    --dataset mscoco \
    --noise_type cat \
    --noise_level 0.4
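
When several evaluations are needed, the same command can be assembled from Python. The following is a minimal sketch, not part of the repository: the helper name build_run_lemon_command is hypothetical, and it uses only the four flags shown above. It builds the argument list and prints it, so nothing runs until you pass the list to subprocess.run from the repository root.

```python
import shlex
import sys


def build_run_lemon_command(output_dir, dataset, noise_type, noise_level):
    """Return the argv list for one run_lemon evaluation.

    Hypothetical helper: it only mirrors the flags shown in the README
    example and does not know about any other run_lemon options.
    """
    return [
        sys.executable, "-m", "run_lemon",
        "--output_dir", str(output_dir),
        "--dataset", dataset,
        "--noise_type", noise_type,
        "--noise_level", str(noise_level),
    ]


cmd = build_run_lemon_command("/output/dir", "mscoco", "cat", 0.4)

# Print the command (without the interpreter path) for inspection.
print("python " + shlex.join(cmd[1:]))

# To actually launch it from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems, such as a stray space after a line-continuation backslash.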

To reproduce the paper's experiments, which train a grid of models over different hyperparameters, use sweep.py as follows:

python sweep.py launch \
    --experiment {experiment_name} \
    --output_dir {output_root} \
    --command_launcher {launcher}

where:

experiment_name is the name of an experiment defined as a class in experiments.py.
output_root is the directory where experimental results will be stored.
launcher is the name of a launcher defined in launchers.py (i.e. slurm or local).
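
To illustrate what "a grid of models" means here, the sketch below expands a small hyperparameter grid into one command per combination. It is an assumption-laden illustration of the general idea, not the logic in sweep.py or experiments.py: the function names expand_grid and to_command are hypothetical, and the grid values are examples only.

```python
import itertools


def expand_grid(grid):
    """Yield one dict per combination of the values in `grid`."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def to_command(params, output_root):
    """Format one parameter combination as a run_lemon command line.

    Hypothetical formatting: the real sweep may lay out output
    directories and arguments differently.
    """
    args = " ".join(f"--{k} {v}" for k, v in sorted(params.items()))
    return f"python -m run_lemon --output_dir {output_root} {args}"


# Example grid (illustrative values, not the grid used in the paper).
grid = {
    "dataset": ["mscoco"],
    "noise_type": ["cat"],
    "noise_level": [0.2, 0.4],
}

commands = [to_command(p, "/output/root") for p in expand_grid(grid)]
for command in commands:
    print(command)
```

A launcher then decides how each generated command is executed, for example one after another on the local machine or as jobs submitted to a cluster scheduler.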

Step 3: Aggregating Results

After the multimodal_knn_caption experiment has finished running, run notebooks/agg_results.ipynb and notebooks/hparam_drop.ipynb to create Tables 2 and 3.

