VisionTransformer for Tensorflow2

This is an implementation of "VisionTransformer and DeiT" on Keras and Tensorflow.

The implementation is based on papers[1, 2] and official implementations[3, 4].

Model

Model (Distillation Token is optionally.)
- ViT-Small
- ViT-Base
- ViT-Large
- ViT-Huge
Pre-trained weight
- X

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, https://arxiv.org/abs/2010.11929v2
Training data-efficient image transformers & distillation through attention, Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou, https://arxiv.org/abs/2012.12877v2
Vision Transformer and MLP-Mixer Architectures, google-research, https://github.com/google-research/vision_transformer
DeiT: Data-efficient Image Transformers, facebook-research, https://github.com/facebookresearch/deit

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
vit		vit
LICENSE		LICENSE
README.md		README.md
usage.ipynb		usage.ipynb