This repo is my attempt to explore Google's Vision Transformer (ViT) model, using PyTorch.
The original paper, *An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale*, can be found [here](https://arxiv.org/abs/2010.11929).
- A Python >= 3.8 environment is recommended.
- To install the dependencies, run `pip install -r requirements.txt`.
- `modules.py`: Contains the implementation of the different modules/sub-networks used in the ViT model (a minimal sketch of one such module appears after this list).
- `main.py`: Contains the code to test the model against the official implementation using the timm module.
- `test.py`: Contains the code to test the model with a sample image using the COCO weights, after running `main.py`.
- `inspect.py`: Contains the code to inspect the model, layer by layer, after running `main.py`.
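For reference, the heart of a ViT implementation is a patch-embedding layer followed by standard transformer encoder blocks. Below is a minimal, self-contained PyTorch sketch of a patch-embedding module; the class name, argument names, and default hyperparameters are illustrative and need not match what `modules.py` actually contains.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to an embedding.

    Illustrative sketch only; modules.py may name and structure this differently.
    """

    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch
        # and applying a shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                  # (B, embed_dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
        return x


if __name__ == "__main__":
    dummy = torch.randn(1, 3, 224, 224)
    print(PatchEmbedding()(dummy).shape)  # torch.Size([1, 196, 768])
```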
- Run `python main.py` to load the official model's weights into the one we built. This will save the model weights in the `./data` folder (a rough sketch of this step appears after this list).
- Run `python test.py` to test the model with a sample image. This will print the top 5 predictions. I'm using a cat image; feel free to change it (see the inference sketch below).
- Run `python inspect.py` to inspect the model, layer by layer. This will print the output of each layer (see the hook-based sketch below).
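As a rough illustration of the `main.py` step, the sketch below loads the pretrained ViT-B/16 checkpoint through timm and saves its state dict under `./data`. The weight file name and the commented-out comparison against the custom model are assumptions; the real script may map parameter names explicitly.

```python
import os

import torch
import timm

# from modules import ViT  # hypothetical import of the model built in this repo

# Load the reference implementation with pretrained weights from timm.
official = timm.create_model("vit_base_patch16_224", pretrained=True)
official.eval()

# Hypothetical comparison against the custom model; in practice the state-dict
# keys usually need an explicit mapping before load_state_dict succeeds.
# custom = ViT()
# custom.load_state_dict(official.state_dict())
# custom.eval()
# x = torch.randn(1, 3, 224, 224)
# with torch.no_grad():
#     print(torch.allclose(official(x), custom(x), atol=1e-5))

# Save the weights so test.py and inspect.py can reuse them.
os.makedirs("data", exist_ok=True)
torch.save(official.state_dict(), "./data/vit_weights.pth")
```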
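A hedged sketch of the sample-image test: it reuses the weight file assumed above, resolves the checkpoint's preprocessing config through timm, and prints the top-5 class indices with their probabilities. The image path `cat.jpg` is a placeholder, and `test.py` may handle labels differently.

```python
import torch
import timm
from PIL import Image
from timm.data import resolve_data_config, create_transform

model = timm.create_model("vit_base_patch16_224", pretrained=False)
# Hypothetical weight file produced by the main.py step above.
model.load_state_dict(torch.load("./data/vit_weights.pth"))
model.eval()

# Build the preprocessing pipeline that matches this checkpoint's config.
transform = create_transform(**resolve_data_config({}, model=model))

# "cat.jpg" is a placeholder; point it at any image you like.
img = transform(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    probs = model(img).softmax(dim=-1)

top5 = probs.topk(5)
for prob, idx in zip(top5.values[0], top5.indices[0]):
    print(f"class {idx.item():4d}  p={prob.item():.4f}")
```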
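One way to get a layer-by-layer view is to register a forward hook on every leaf module, as sketched below; `inspect.py` may go about it differently, and the weight file name is again an assumption.

```python
import torch
import timm

model = timm.create_model("vit_base_patch16_224", pretrained=False)
# Hypothetical weight file produced by the main.py step above.
model.load_state_dict(torch.load("./data/vit_weights.pth"))
model.eval()


def report(name):
    # Print the layer's name, output shape, and output mean when it fires.
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor):
            print(f"{name:45s} {tuple(output.shape)} mean={output.mean().item():+.4f}")
    return hook


# Leaf modules (those without children) correspond to individual layers.
for name, module in model.named_modules():
    if len(list(module.children())) == 0:
        module.register_forward_hook(report(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
```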
- Implement the ViT model
- Test the model against the official implementation
- Test the model with a sample image
- Inspect the model, layer by layer
- Train the model on a custom dataset
- Add a demo to test the model on a custom image