Implementation of Vision Transformer (ViT) in PyTorch. ViT was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
The ViT code in this repo is based on the Japanese book "Vision Transformer 入門" (An Introduction to Vision Transformer). I added code for dataset preparation and the training procedure on CIFAR10.
python run.py [-h] [-s SEED] FILE

positional arguments:
  FILE                  path to config file

options:
  -h, --help            show this help message and exit
  -s SEED, --seed SEED  seed for initializing training
python run.py examples/CIFAR10/config.ini
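A minimal sketch of an `argparse` entry point consistent with the help output above (the actual `run.py` may differ):

```python
import argparse

# Hypothetical sketch matching the CLI shown above.
parser = argparse.ArgumentParser()
parser.add_argument("FILE", help="path to config file")
parser.add_argument("-s", "--seed", type=int, default=None,
                    help="seed for initializing training")
args = parser.parse_args()
# args.FILE is the config path; args.seed optionally fixes the RNG seed.
```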
The following lists the available settings and their meanings. The parameter values are based on the ViT experiment conducted by GMO.
[dataset]
dir = ./datasets ; training data save directory
name = CIFAR10 ; dataset name; only CIFAR10 is supported
in_channels = 3 ; number of channels
image_size = 32 ; image size; 32x32
num_classes = 10 ; 10-class classification
[dataloader]
batch_size = 32
shuffle = true
[model]
patch_size = 4 ; use 4x4 px patches
embed_dim = 256 ; same meaning as dim=256 in `vit-pytorch`
num_blocks = 3 ; same meaning as depth=3 in `vit-pytorch`
heads = 4 ; number of attention heads
hidden_dim = 256 ; same meaning as mlp_dim=256 in `vit-pytorch`
dropout = 0.1 ; dropout ratio
[learning]
epochs = 20
learning_rate = 0.001
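As a rough illustration of how these settings could be consumed (the identifiers below are assumptions for illustration, not the repo's actual code): with `image_size = 32` and `patch_size = 4`, each image is split into (32 / 4)^2 = 64 patches.

```python
import configparser

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical sketch of reading config.ini; the repo's actual code may differ.
config = configparser.ConfigParser(inline_comment_prefixes=(";",))
config.read("examples/CIFAR10/config.ini")

ds_cfg = config["dataset"]
train_set = datasets.CIFAR10(
    root=ds_cfg["dir"],              # ./datasets
    train=True,
    download=True,
    transform=transforms.ToTensor(),
)

dl_cfg = config["dataloader"]
train_loader = DataLoader(
    train_set,
    batch_size=dl_cfg.getint("batch_size"),  # 32
    shuffle=dl_cfg.getboolean("shuffle"),    # true
)

# With image_size=32 and patch_size=4, the model sees
# (32 // 4) ** 2 = 64 patch tokens (plus a class token).
m_cfg = config["model"]
num_patches = (ds_cfg.getint("image_size") // m_cfg.getint("patch_size")) ** 2
```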
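Since the comments above map the settings to `vit-pytorch`, a roughly equivalent model in that library (shown only for comparison; this repo defines its own model) would be:

```python
from vit_pytorch import ViT

# Equivalent vit-pytorch configuration for comparison.
model = ViT(
    image_size=32,   # [dataset] image_size
    patch_size=4,    # [model] patch_size
    num_classes=10,  # [dataset] num_classes
    channels=3,      # [dataset] in_channels
    dim=256,         # [model] embed_dim
    depth=3,         # [model] num_blocks
    heads=4,         # [model] heads
    mlp_dim=256,     # [model] hidden_dim
    dropout=0.1,     # [model] dropout
)
```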
ViT achieves high accuracy when pre-trained on large image datasets (such as JFT-300M), so simply training from scratch on CIFAR10, as this code does, does not bring the cross-entropy loss down very far:
[2022-09-23 11:52:17] :vision_transformer.utils.logger: [INFO] loss: 2.0047439576718755
[2022-09-23 11:52:38] :vision_transformer.utils.logger: [INFO] loss: 1.8455862294370755
...
[2022-09-23 11:58:37] :vision_transformer.utils.logger: [INFO] loss: 1.2203882005268012
[2022-09-23 11:58:58] :vision_transformer.utils.logger: [INFO] loss: 1.2218489825915986
The same behavior was observed in the GMO experiment.
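For reference, here is a minimal sketch of a training loop that would produce per-epoch loss logs like the above (all names are illustrative assumptions; the logger name is taken from the log output, but the actual training code may differ):

```python
import logging

import torch
import torch.nn as nn

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("vision_transformer.utils.logger")

# Assumes `model` and `train_loader` from the sketches above.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # learning_rate

for epoch in range(20):  # epochs = 20
    running_loss, num_batches = 0.0, 0
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        num_batches += 1
    logger.info("loss: %s", running_loss / num_batches)
```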