# Finetuning Vision Transformers

Code for fine-tuning ViT models on various classification datasets.

## Available Datasets

| Dataset | `--data.dataset` |
| --- | --- |
| CIFAR-10 | `cifar10` |
| CIFAR-100 | `cifar100` |
| Oxford-IIIT Pet Dataset | `pets37` |
| Oxford Flowers-102 | `flowers102` |
| Food-101 | `food101` |
| STL-10 | `stl10` |
| Describable Textures Dataset | `dtd` |
| Stanford Cars | `cars` |
| FGVC Aircraft | `aircraft` |
| Image Folder | `custom` |
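
For the `custom` option, the loader presumably expects a standard torchvision-style `ImageFolder` layout (one subdirectory per class); the exact path argument and split names below are assumptions, not confirmed by this README, so check `python train.py --help` for the actual data arguments. A minimal sketch of such a layout:

```
my_dataset/            # hypothetical root directory (argument name not shown in this README)
├── train/             # training split
│   ├── class_a/       # one subdirectory per class; the folder name is the label
│   │   ├── img_001.jpg
│   │   └── ...
│   └── class_b/
│       └── ...
└── test/              # evaluation split
    ├── class_a/
    └── class_b/
```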

## Requirements

- Python 3.8+
- `pip install -r requirements.txt`

## Usage

### Training

- To fine-tune a ViT-B/16 model on CIFAR-100 run:

  ```bash
  python train.py --accelerator gpu --devices 1 --precision 16 --max_steps 5000 --model.lr 0.01 --model.warmup_steps 500 --val_check_interval 250 --data.batch_size 128 --data.dataset cifar100
  ```

- `config/` contains example configuration files which can be run with (see the config sketch after this list):

  ```bash
  python train.py --accelerator gpu --devices 1 --precision 16 --config path/to/config
  ```

- To get a list of all arguments run `python train.py --help`.
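
The README does not show a config file inline; as a rough illustration only, a YAML file mirroring the CIFAR-100 command above might look like the sketch below. The key names and nesting are assumptions inferred from the CLI flags (`--model.lr`, `--data.dataset`, etc.), so treat the files shipped in `config/` as the authoritative reference.

```yaml
# Hypothetical config sketch mirroring the CIFAR-100 example command above.
# Key names and nesting are assumptions; the files in config/ are authoritative.
max_steps: 5000
val_check_interval: 250
model:
  lr: 0.01
  warmup_steps: 500
data:
  dataset: cifar100
  batch_size: 128
```

Hardware-related flags (`--accelerator`, `--devices`, `--precision`) are passed on the command line in the examples above, so a config like this would be run with `python train.py --accelerator gpu --devices 1 --precision 16 --config path/to/config`.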

### Evaluate

To evaluate a trained model on its test set run:

```bash
python test.py --accelerator gpu --devices 1 --precision 16 --checkpoint path/to/checkpoint
```

- **Note:** Make sure the `--precision` argument is set to the same value as used during training.

## Results

All results are from fine-tuned ViT-B/16 models which were pretrained on ImageNet-21k.

| Dataset | Total Steps | Warm-up Steps | Learning Rate | Accuracy | Config |
| --- | --- | --- | --- | --- | --- |
| CIFAR-10 | 5000 | 500 | 0.01 | 99.00 | Link |
| CIFAR-100 | 5000 | 500 | 0.01 | 92.89 | Link |
| Oxford Flowers-102 | 1000 | 100 | 0.03 | 99.02 | Link |
| Oxford-IIIT Pets | 2000 | 200 | 0.01 | 93.68 | Link |
| Food-101 | 5000 | 500 | 0.03 | 90.67 | Link |