A collection of weights and logs for image classification experiments with modern Transformer architectures on CIFAR-100. These benchmarks are provided for the convenience of research on Mixup augmentations with Transformers, since most published benchmarks of Mixup variants with ViTs are based on ImageNet-1K. Please refer to our tech report for more details.
- Since the original resolution of CIFAR-100 is too small for ViTs, we resize the input images to $224\times 224$ (for both training and testing) while keeping the ViT architectures unmodified. This benchmark follows the DeiT setup and trains each model for 200 or 600 epochs with a batch size of 100 on CIFAR-100. The base learning rates of DeiT and Swin are $1e-3$ and $5e-4$, which is the optimal setup in our experiments. We search and report $\alpha$ in $Beta(\alpha, \alpha)$ for all compared methods; a minimal sketch of this sampling is given after this list. View config files in mixups/vits.
- The best top-1 accuracy in the last 10 training epochs is reported for ViT architectures. We release the trained models and logs in vits-mix-cifar100-weights.
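For illustration, below is a minimal PyTorch sketch of how a mixing ratio is drawn from $Beta(\alpha, \alpha)$ and applied to a batch in plain Mixup. It is not the code used in this repository; the function name `mixup_batch` and the example shapes are assumptions made for this snippet.

```python
# Minimal Mixup sketch (illustrative only, not the repository's implementation).
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(images, labels, num_classes, alpha=0.8):
    """Mix a batch of images and labels with lambda ~ Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(images.size(0), device=images.device)
    mixed_images = lam * images + (1.0 - lam) * images[index]
    one_hot = F.one_hot(labels, num_classes).float()
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot[index]
    return mixed_images, mixed_labels

# Hypothetical usage: CIFAR-100 images resized to 224x224, batch size 100 as in this benchmark.
x = torch.randn(100, 3, 224, 224)
y = torch.randint(0, 100, (100,))
mixed_x, soft_y = mixup_batch(x, y, num_classes=100, alpha=0.8)
```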
ViTs' Mixup Benchmark on CIFAR-100
Methods | $\alpha$ | DeiT-S(/16), 200 epochs | DeiT-S(/16), 600 epochs | Swin-T, 200 epochs | Swin-T, 600 epochs |
---|---|---|---|---|---|
Vanilla | - | 65.81 | 68.50 | 78.41 | 81.29 |
MixUp | 0.8 | 69.98 | 76.35 | 76.78 | 83.67 |
CutMix | 2 | 74.12 | 79.54 | 80.64 | 83.38 |
DeiT | 0.8,1 | 75.92 | 79.38 | 81.25 | 84.41 |
SmoothMix | 0.2 | 67.54 | 80.25 | 66.69 | 81.18 |
SaliencyMix | 0.2 | 69.78 | 76.60 | 80.40 | 82.58 |
AttentiveMix+ | 2 | 75.98 | 80.33 | 81.13 | 83.69 |
FMix* | 1 | 70.41 | 74.31 | 80.72 | 82.82 |
GridMix | 1 | 68.86 | 74.96 | 78.54 | 80.79 |
PuzzleMix | 2 | 73.60 | 81.01 | 80.44 | 84.74 |
ResizeMix* | 1 | 68.45 | 71.95 | 80.16 | 82.36 |
AlignMix | 1 | - | - | 78.91 | 83.34 |
TransMix | 0.8,1 | 76.17 | 79.33 | 81.33 | 84.45 |
AutoMix | 2 | 76.24 | 80.91 | 82.67 | 84.70 |
SAMix* | 2 | 77.94 | 82.49 | 82.62 | 84.85 |