# Dropout Reduces Underfitting

Official PyTorch implementation of **Dropout Reduces Underfitting**, ICML 2023.

Zhuang Liu\*, Zhiqiu Xu\*, Joseph Jin, Zhiqiang Shen, Trevor Darrell (\* equal contribution)

Meta AI, UC Berkeley and MBZUAI
Figure: We propose early dropout and late dropout. Early dropout helps underfitting models fit the data better and achieve lower training loss. Late dropout helps improve the generalization performance of overfitting models.
Model weights are released as links in the results tables below.
### Results with basic recipe

All numbers are ImageNet-1K top-1 accuracy (%); s.d. = stochastic depth.
model | ViT-T | Mixer-S | Swin-F | ConvNeXt-F |
---|---|---|---|---|
no dropout | 73.9 | 71.0 | 74.3 | 76.1 |
standard dropout | 67.9 | 67.1 | 71.6 | - |
standard s.d. | 72.6 | 70.5 | 73.7 | 75.5 |
early dropout | 74.3 | 71.3 | 74.7 | - |
early s.d. | 74.4 | 71.7 | 75.2 | 76.3 |
### Results with improved recipe
model | ViT-T | Swin-F | ConvNeXt-F |
---|---|---|---|
no dropout | 76.3 | 76.1 | 77.5 |
standard dropout | 71.5 | 73.5 | - |
standard s.d. | 75.6 | 75.6 | 77.4 |
early dropout | 76.7 | 76.6 | - |
early s.d. | 76.7 | 76.6 | 77.7 |
### Results with basic recipe (late s.d.)
model | ViT-B | Mixer-B |
---|---|---|
standard s.d. | 81.6 | 78.0 |
late s.d. | 82.3 | 78.6 |
## Installation

Please check [INSTALL.md](INSTALL.md) for installation instructions.
## Training

We list commands for early dropout and early stochastic depth on ViT-T, and late stochastic depth on ViT-B.

- For training other models, change `--model` accordingly, e.g., to `vit_tiny`, `mixer_s32`, `convnext_femto`, `mixer_b16`, or `vit_base`.
- Our results were produced with 4 nodes, each with 8 GPUs; all commands keep the effective batch size (nodes × GPUs × `--batch_size` × `--update_freq`) at 4096. Below we give example commands for both multi-node and single-machine setups, and a sketch of how the schedule flags interact.
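The commands control when dropout is active via `--drop_mode` (`standard`, `early`, or `late`), `--drop_schedule` (`constant` or `linear`), and `--cutoff_epoch`. For intuition, here is a minimal sketch of the resulting per-epoch rate; the helper is hypothetical, not the repository's exact API, and the actual code may schedule the rate at a finer (per-iteration) granularity:

```python
def drop_rate(epoch: int, base_rate: float, mode: str,
              schedule: str, cutoff_epoch: int) -> float:
    """Illustrative per-epoch drop rate (hypothetical helper, not the repo's API)."""
    if mode == "standard":
        # plain dropout: the rate is fixed for the whole run
        return base_rate
    if mode == "early":
        # early dropout: active only before the cutoff, then switched off
        if epoch >= cutoff_epoch:
            return 0.0
        if schedule == "linear":
            # decay linearly from base_rate at epoch 0 to 0 at the cutoff
            return base_rate * (1.0 - epoch / cutoff_epoch)
        return base_rate  # "constant" during the early phase
    if mode == "late":
        # late dropout: off before the cutoff, constant rate afterwards
        return 0.0 if epoch < cutoff_epoch else base_rate
    raise ValueError(f"unknown drop mode: {mode}")
```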
**Early dropout**
multi-node
```bash
python run_with_submitit.py --nodes 4 --ngpus 8 \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 1 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
single-machine
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
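Mechanically, early dropout only requires updating the dropout probability as training progresses. A minimal sketch of applying a per-epoch rate to a model (illustrative only; it assumes the hypothetical `drop_rate` helper above, and the repository handles this internally):

```python
import torch.nn as nn

def set_dropout_rate(model: nn.Module, p: float) -> None:
    # Update every nn.Dropout module in place; nn.Dropout reads self.p
    # at forward time, so the change takes effect immediately.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

# e.g., at the start of each epoch:
# set_dropout_rate(model, drop_rate(epoch, 0.1, "early", "linear", 50))
```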
**Early stochastic depth**
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.5 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
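For reference, stochastic depth (controlled by `--drop_path`) randomly skips a residual branch per sample during training. A minimal sketch in the style of timm's `drop_path`, paraphrased rather than copied from this repository:

```python
import torch

def drop_path(x: torch.Tensor, drop_prob: float, training: bool) -> torch.Tensor:
    # Zero the residual branch for a random subset of samples, rescaling
    # the kept ones so the expected activation is unchanged.
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # one mask entry per sample
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    return x * mask / keep_prob

# typical use inside a residual block:
# x = x + drop_path(block(x), p, self.training)
```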
**Late stochastic depth**
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_base --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.4 --drop_mode late --drop_schedule constant --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
**Standard dropout / no dropout** (replace `$p` with 0.1 / 0.0)
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout $p --drop_mode standard \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
### Improved recipe

Our improved recipe extends training from 300 to 600 epochs, and reduces both `mixup` and `cutmix` to 0.3.
**Early dropout**
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 600 --mixup 0.3 --cutmix 0.3 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
**Early stochastic depth**
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 600 --mixup 0.3 --cutmix 0.3 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.5 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
## Evaluation

single-GPU
```bash
python main.py --model vit_tiny --eval true \
--resume /path/to/model \
--data_path /path/to/data
```
multi-GPU
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --eval true \
--resume /path/to/model \
--data_path /path/to/data
```
## Acknowledgement

This repository is built using the [timm](https://github.com/huggingface/pytorch-image-models) library and the [ConvNeXt](https://github.com/facebookresearch/ConvNeXt) codebase.
## License

This project is released under the CC-BY-NC 4.0 license. Please see the [LICENSE](LICENSE) file for more information.
## Citation

If you find this repository helpful, please consider citing:

```bibtex
@inproceedings{liu2023dropout,
  title={Dropout Reduces Underfitting},
  author={Liu, Zhuang and Xu, Zhiqiu and Jin, Joseph and Shen, Zhiqiang and Darrell, Trevor},
  booktitle={International Conference on Machine Learning},
  year={2023},
}
```