ViTs-Mixup-CIFAR100-Weights

Released by @Lupin1998 · 18 Jul 21:57

A collection of weights and logs for image classification experiments with modern Transformer architectures on CIFAR-100. These benchmarks are provided for the convenience of research on Mixup augmentations with Transformers, since most published benchmarks of Mixup variants with ViTs are based on ImageNet-1K. Please refer to our tech report for more details.

  • Since the original resolution of CIFAR-100 ($32\times 32$) is too small for ViTs, we resize the input images to $224\times 224$ for both training and testing while keeping the ViT architectures unchanged. This benchmark follows the DeiT training setup and trains each model for 200 or 600 epochs with a batch size of 100 on CIFAR-100. The basic learning rates of DeiT and Swin are $1e-3$ and $5e-4$, respectively, which we found to be the optimal setup in our experiments. We search for and report the $\alpha$ of $Beta(\alpha, \alpha)$ for all compared methods; a minimal training sketch is given after this list. View config files in mixups/vits.
  • The best top-1 accuracy within the last 10 training epochs is reported for ViT architectures. We release the trained models and logs in vits-mix-cifar100-weights.
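
To make the setup above concrete, here is a minimal sketch of a Mixup training step under the listed settings (images resized to $224\times 224$, batch size 100, $\lambda \sim Beta(\alpha, \alpha)$). This is an illustrative approximation, not the repo's implementation: `model` is a placeholder, and the transform omits the normalization and extra augmentations the DeiT recipe actually uses.

```python
# Minimal Mixup sketch for the CIFAR-100 setup above (illustrative only).
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

# Resize 32x32 CIFAR-100 images to 224x224 as described above; the real
# DeiT recipe adds normalization and stronger augmentations.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=100, shuffle=True)

def mixup_step(model, x, y, alpha=0.8):
    """One Mixup step: interpolate a shuffled batch and mix the two losses."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    logits = model(x_mix)
    return (lam * F.cross_entropy(logits, y)
            + (1.0 - lam) * F.cross_entropy(logits, y[perm]))
```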

ViTs' Mixup Benchmark on CIFAR-100 (top-1 accuracy, %)

| Method | $\alpha$ | DeiT-S(/16), 200 epochs | DeiT-S(/16), 600 epochs | Swin-T, 200 epochs | Swin-T, 600 epochs |
|---|---|---|---|---|---|
| Vanilla | - | 65.81 | 68.50 | 78.41 | 81.29 |
| MixUp | 0.8 | 69.98 | 76.35 | 76.78 | 83.67 |
| CutMix | 2 | 74.12 | 79.54 | 80.64 | 83.38 |
| DeiT | 0.8, 1 | 75.92 | 79.38 | 81.25 | 84.41 |
| SmoothMix | 0.2 | 67.54 | 80.25 | 66.69 | 81.18 |
| SaliencyMix | 0.2 | 69.78 | 76.60 | 80.40 | 82.58 |
| AttentiveMix+ | 2 | 75.98 | 80.33 | 81.13 | 83.69 |
| FMix* | 1 | 70.41 | 74.31 | 80.72 | 82.82 |
| GridMix | 1 | 68.86 | 74.96 | 78.54 | 80.79 |
| PuzzleMix | 2 | 73.60 | 81.01 | 80.44 | 84.74 |
| ResizeMix* | 1 | 68.45 | 71.95 | 80.16 | 82.36 |
| AlignMix | 1 | - | - | 78.91 | 83.34 |
| TransMix | 0.8, 1 | 76.17 | 79.33 | 81.33 | 84.45 |
| AutoMix | 2 | 76.24 | 80.91 | 82.67 | 84.70 |
| SAMix* | 2 | 77.94 | 82.49 | 82.62 | 84.85 |
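
For reference, the CutMix row above ($\alpha = 2$) corresponds to the standard CutMix formulation: a rectangular patch from a shuffled batch is pasted in, and the label loss is mixed in proportion to the pasted area. The sketch below assumes that standard formulation and reuses the placeholder `model` from the earlier sketch; it is not the repo's exact implementation.

```python
# Hedged sketch of standard CutMix (area-proportional label mixing).
import numpy as np
import torch
import torch.nn.functional as F

def cutmix_step(model, x, y, alpha=2.0):
    """One CutMix step: paste a random box from a shuffled batch."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x = x.clone()  # avoid mutating the caller's batch
    H, W = x.shape[2], x.shape[3]
    # Box whose area is a (1 - lam) fraction of the image.
    cut_h = int(H * np.sqrt(1.0 - lam))
    cut_w = int(W * np.sqrt(1.0 - lam))
    cy, cx = np.random.randint(H), np.random.randint(W)
    y1, y2 = np.clip(cy - cut_h // 2, 0, H), np.clip(cy + cut_h // 2, 0, H)
    x1, x2 = np.clip(cx - cut_w // 2, 0, W), np.clip(cx + cut_w // 2, 0, W)
    x[:, :, y1:y2, x1:x2] = x[perm, :, y1:y2, x1:x2]
    # Correct lam for clipping at the image border.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (H * W)
    logits = model(x)
    return (lam * F.cross_entropy(logits, y)
            + (1.0 - lam) * F.cross_entropy(logits, y[perm]))
```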