Official implementation of Adaptive Attention Span in Computer Vision.
In this work we first attempt to replicate the results from Stand-Alone Self-Attention in Vision Models.
Next, we propose a novel method based on the Adaptive Attention Span for learning the kernel size of a local self-attention layer. We compare this with local attention kernels as well as convolution kernels on CIFAR100. Our code for Adaptive Attention Span in 2D is originally inspired by FAIR's implementation, and our code for self-attention in convolutions is loosely based on this repo by leaderj1001.
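For intuition, below is a minimal sketch of the adaptive-span soft mask of Sukhbaatar et al. (2019) that we extend to 2D; the class and variable names are illustrative, not taken from this repo. `R` is the ramp length set by the `--R` flag, and each mask owns a learnable span:

```python
import torch
import torch.nn as nn

class AdaptiveSpanMask(nn.Module):
    """Soft mask that ramps attention weights from 1 to 0 over R positions
    beyond a learnable span (illustrative sketch, not the repo's code)."""

    def __init__(self, max_span: int, R: int, init_span: float = 0.0):
        super().__init__()
        self.R = R
        self.max_span = max_span
        # one learnable span here; in practice each attention head learns its own
        self.span = nn.Parameter(torch.tensor(init_span))

    def forward(self, attn: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        # dist: distance of each key position from the query (spatial in 2D)
        z = self.span.clamp(0, self.max_span)
        mask = ((self.R + z - dist) / self.R).clamp(0, 1)
        attn = attn * mask
        # renormalize so the masked weights still sum to 1
        return attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-8)
```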
- Clone this repository
- Install the requirements:
```
pip install -r requirements.txt
```
Execution notes:
- Our adaptive-span implementation takes 3, 6 and 11 hours for the small, medium and large models respectively on 2 P100 GPUs for 100 epochs on CIFAR100.
- Some important flags:
  - To run on GPU, pass `--cuda True`; otherwise omit the option.
  - Pass `--smallest_version True` to run the small model, `--small_version True` to run the medium model, and neither flag to run the large model. A description of each of the small, medium and large models is given in Appendix A.3 of our paper.
  - For details on the other flags, see config.py, which has a description of each; a hypothetical sketch of how the boolean flags could be parsed follows this list.
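For illustration only, a boolean flag of the form `--cuda True` can be declared with a small string-to-bool converter; the actual definitions live in config.py and may differ:

```python
import argparse

def str2bool(v: str) -> bool:
    # lets "--cuda True" / "--cuda False" behave as expected on the CLI
    return v.lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser()
parser.add_argument("--cuda", type=str2bool, default=False, help="run on GPU")
parser.add_argument("--smallest_version", type=str2bool, default=False,
                    help="use the small model")
parser.add_argument("--small_version", type=str2bool, default=False,
                    help="use the medium model (neither size flag = large model)")
args = parser.parse_args()
```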
Best performing medium adaptive attention span model on CIFAR100:
```
python main.py --all_attention True --eta_min 0 --warmup_epochs 10 \
    --lr 0.05 --batch-size 50 --small_version True --cuda True \
    --num-workers 2 --xpid best_adaptive_medium --groups 4 \
    --attention_kernel 5 --epochs 100 --dataset CIFAR100 --weight-decay 0.0005 \
    --adaptive_span True --R 2 --span_penalty 0.01
```
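The `--span_penalty` flag weights a regularizer on the learned spans, in the spirit of the L1 term in the original adaptive-span paper; a hedged sketch of what such a term might look like (function and variable names are ours, not the repo's):

```python
import torch

def span_l1_penalty(spans, weight: float) -> torch.Tensor:
    # spans: iterable of learnable span parameters, one per attention head;
    # penalizing their mean pushes each head toward the smallest useful span
    return weight * torch.stack([s.clamp(min=0) for s in spans]).mean()

# during training (sketch):
# loss = task_loss + span_l1_penalty(spans, args.span_penalty)
```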
Best performing medium local attention model on CIFAR100:
```
python main.py --all_attention True --eta_min 0 --warmup_epochs 10 \
    --lr 0.05 --batch-size 50 --small_version True --cuda True \
    --num-workers 2 --xpid best_local_medium --groups 4 \
    --attention_kernel 5 --epochs 100 --dataset CIFAR100 --weight-decay 0.0005
```
Best performing medium CNN model on CIFAR100:
```
python main.py --eta_min 0 --warmup_epochs 10 --lr 0.2 --batch-size 50 \
    --small_version True --cuda True --num-workers 2 --T_max 100 --xpid best_cnn_medium \
    --dataset CIFAR100 --force_cosine_annealing True --weight-decay 0.0001
```
If you find this repository useful, please cite it as:
```
@misc{parker2020adaptive,
    title={Adaptive Attention Span in Computer Vision},
    author={Jerrod Parker and Shakti Kumar and Joe Roussy},
    year={2020},
    eprint={2004.08708},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```