Official PyTorch implementation of GLNet, from the following paper:
Revisiting the Integration of Convolution and Attention for Vision Backbone. NeurIPS 2024.
Lei Zhu, Xinjiang Wang, Wayne Zhang, and Rynson Lau
This repository is designed to run experiments in an elegant and highly automated manner. The key features supporting this goal are:
- Integration of hydra configuration system for efficient experiment management. See here.
- Advanced support for slurm clusters. See slurm_wrapper.py.
- Auto resume functionality to finish experiments in preemptive clusters. See here.
- Integration of timm benchmark tools to get FLOPs and Throughputs automatically. See here.
- Integration of TensorBoard for training visualization. See here.
- Integration of tools for IO optimization. See here.
Besides, you can find configurable visualization scripts tailored for the GLNet family in visualization/.
Set up the environment and prepare the data according to the installation guide, then execute the command below to evaluate our released checkpoints.
```bash
# It should finally report something like:
# * Acc@1 84.982 Acc@5 97.282 loss 0.774
# Accuracy of the model on 50000 test images: 84.98200%
python main_cx2.py \
    data_path=./data/in1k input_size=224 batch_size=128 dist_eval=true \
    eval=true model='glnet_16g' load_release=true
# If you are on a slurm cluster, append the overrides below to the command,
# and make sure configs/slurm/{CLUSTER_ID}.yaml is created:
# +slurm=${CLUSTER_ID} slurm.nodes=2 slurm.ngpus=8
```
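For reference, the sketch below shows the same evaluation submitted through the slurm wrapper; it simply uncomments the overrides above, and `my_cluster` is a hypothetical name that must match a config file under `configs/slurm/`:

```bash
# Sketch: the evaluation above, submitted via the slurm wrapper.
# CLUSTER_ID is hypothetical; configs/slurm/${CLUSTER_ID}.yaml must exist.
CLUSTER_ID=my_cluster
python main_cx2.py \
    data_path=./data/in1k input_size=224 batch_size=128 dist_eval=true \
    eval=true model='glnet_16g' load_release=true \
    +slurm=${CLUSTER_ID} slurm.nodes=2 slurm.ngpus=8
```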
- You don't need to download checkpoints manually; the script will download them automatically to `~/.cache/torch/hub/checkpoints/[MODEL_NAME].pth`.
- With the regular training recipe (`main_cx2.py`), the available models are `glnet_4g`, `glnet_9g`, `glnet_16g`, `glnet_stl`, and `glnet_stl_paramslot`.
- For token labeling models, use `main_tklb.py`; the available models are `glnet_4g_tklb` and `glnet_9g_tklb`. These models achieve better IN1k classification performance (see Tab. 3 in our paper).
- For training, remove the `eval=true` flag. To reproduce the reported results, make sure the hyperparameters are set as in the paper (Tab. 8 & Tab. 9 in the Appendix).
- Generally, you can leave most hyperparameters at their defaults, with only `drop_path=xx lr=2e-3 clip_grad=5.0` set manually.
- The `drop_path` for `glnet_4g`/`glnet_9g`/`glnet_16g` is 0.15/0.3/0.4. For the stl (Swin-Tiny-Layout) models (`glnet_stl`/`glnet_stl_paramslot`) and the token labeling models (`glnet_4g_tklb`/`glnet_9g_tklb`), we use 0.1.
- We use a global batch size of 2048, computed as `batch_size * gpus_per_node * num_nodes * update_freq` (e.g., 128 × 8 GPUs × 2 nodes × 1 = 2048). If you do not have enough memory, try setting `update_freq` larger than 1 for gradient accumulation (we did not try it, though). A training command sketch follows this list.
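Below is a hypothetical training launch assembled from the flags documented above, not a verbatim recipe from the repo; the values follow the notes for `glnet_9g` on 2 nodes × 8 GPUs:

```bash
# Hypothetical sketch: training glnet_9g with the hyperparameters noted above.
# Global batch size = batch_size * gpus_per_node * num_nodes * update_freq
#                   = 128 * 8 * 2 * 1 = 2048.
python main_cx2.py \
    data_path=./data/in1k input_size=224 batch_size=128 update_freq=1 \
    model='glnet_9g' drop_path=0.3 lr=2e-3 clip_grad=5.0 \
    +slurm=${CLUSTER_ID} slurm.nodes=2 slurm.ngpus=8
```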
| name | acc@1 | #params | FLOPs | log |
|---|---|---|---|---|
| glnet_4g | 83.7 | 27 M | 4.5 G | log |
| glnet_9g | 84.5 | 60 M | 9.7 G | log |
| glnet_16g | 85.0 | 106 M | 16.7 G | log |
| glnet_4g_tklb | 84.4 | 27 M | 4.5 G | log |
| glnet_9g_tklb | 85.3 | 61 M | 9.7 G | log |
| glnet_stl | 82.5 | 30 M | 4.4 G | log |
| glnet_stl_paramslot | 82.1 | 30 M | 4.4 G | log |
- camera-ready paper link
- IN1k standard training code, logs, and checkpoints
- IN1k token-labeling code, logs, and checkpoint
- visualization scripts
- arXiv link
- semantic segmentation and object detection code
- a guide for running on multiple GPUs without Slurm
  - this should be simple with `torchrun` (see the sketch after this list). Contributions are welcome.
- add legacy models (e.g., BiFormer)
- support Ascend NPU
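As a starting point for the `torchrun` item above, here is an untested sketch of a single-node, multi-GPU launch; whether `main_cx2.py` reads torchrun's environment variables out of the box is an assumption, so treat this as a template rather than a supported entry point:

```bash
# Untested sketch: 8 GPUs on one node, no Slurm. Assumes main_cx2.py picks up
# distributed settings from the environment variables torchrun sets.
# update_freq=2 would keep the global batch size at 128 * 8 * 2 = 2048 via
# gradient accumulation (which the authors note they did not try).
torchrun --nproc_per_node=8 main_cx2.py \
    data_path=./data/in1k input_size=224 batch_size=128 update_freq=2 \
    model='glnet_4g' drop_path=0.15 lr=2e-3 clip_grad=5.0
```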
This repository is built on the timm library and the BiFormer and ConvNeXt repositories.
This project is released under the MIT license. Please take a look at the LICENSE file for more information.
If you find this repository helpful, please consider citing:
```bibtex
@article{zhu2024glnet,
  title={Revisiting the Integration of Convolution and Attention for Vision Backbone},
  author={Zhu, Lei and Wang, Xinjiang and Zhang, Wayne and Lau, Rynson},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}
```