rayleizhu/GLMix

[NeurIPS 2024] Official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone".
Official PyTorch implementation of GLNet, from the following paper:

Revisiting the Integration of Convolution and Attention for Vision Backbone. NeurIPS 2024.
Lei Zhu, Xinjiang Wang, Wayne Zhang, and Rynson Lau


Features of This Repository

This repository is designed to run experiments in a highly automated manner. Key features include:

  • Integration of the Hydra configuration system for efficient experiment management. See here.
  • Advanced support for slurm clusters. See slurm_wrapper.py.
  • Auto-resume functionality to finish experiments on preemptible clusters. See here.
  • Integration of timm benchmark tools to measure FLOPs and throughput automatically. See here.
  • Integration of TensorBoard for training visualization. See here.
  • Integration of tools for IO optimization. See here.

In addition, configurable visualization scripts tailored to the GLNet family are available in visualization/.

Quick Start

Set up the environment and prepare the data according to the installation guide, then run the command below to evaluate our released checkpoints.

# It should finally report something like:
# * Acc@1 84.982 Acc@5 97.282 loss 0.774
# Accuracy of the model on 50000 test images: 84.98200%
python main_cx2.py \
    data_path=./data/in1k input_size=224 batch_size=128 dist_eval=true \
    eval=true model='glnet_16g' load_release=true
    # If you are on a slurm cluster, append: +slurm=${CLUSTER_ID} slurm.nodes=2 slurm.ngpus=8
    # Make sure configs/slurm/{CLUSTER_ID}.yaml is created first (a sketch follows below).
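The schema of that slurm config is defined by slurm_wrapper.py; we have not verified the exact field names. Below is a purely hypothetical sketch, where only nodes and ngpus are taken from the command above and every other field is an assumption:

# configs/slurm/my_cluster.yaml -- hypothetical sketch, not the verified schema
nodes: 2          # overridable on the command line as slurm.nodes=...
ngpus: 8          # overridable as slurm.ngpus=...
partition: gpu    # assumption: your cluster's partition name
# Check slurm_wrapper.py for the fields that are actually consumed.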

Detailed Guide

  • You don't need to download checkpoints manually; the script will download them automatically to ~/.cache/torch/hub/checkpoints/[MODEL_NAME].pth.
  • With the regular training recipe (main_cx2.py), the available models are glnet_4g, glnet_9g, glnet_16g, glnet_stl, and glnet_stl_paramslot.
  • For token-labeling models, use main_tklb.py; the available models are glnet_4g_tklb and glnet_9g_tklb. These models achieve better IN1k classification performance (see Tab. 3 in our paper).
  • For training, remove the eval=true flag. To reproduce the reported results, make sure the hyperparameters match those in the paper's appendix (Tab. 8 & Tab. 9); a sketch of a full training command follows this list.
    • Generally, you can leave most hyperparameters at their defaults and set only drop_path=xx lr=2e-3 clip_grad=5.0 manually.
    • The drop_path rate is 0.15/0.3/0.4 for glnet_4g/glnet_9g/glnet_16g. For the STL (Swin-Tiny-Layout) models (glnet_stl/glnet_stl_paramslot) and the token-labeling models (glnet_4g_tklb/glnet_9g_tklb), we use 0.1.
    • We use a global batch size of 2048, computed as batch_size * gpus_per_node * num_nodes * update_freq (e.g., 128 * 8 * 2 * 1 = 2048). If you do not have enough memory, set update_freq larger than 1 to use gradient accumulation (we did not try this, though).
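For illustration, a training command assembled from the flags above might look like the following. This is an unverified sketch: the flag names (drop_path, lr, clip_grad, batch_size, update_freq) are taken from this guide, but check the paper's appendix and the hydra configs for the full recipe.

# hypothetical training run for glnet_4g on 2 nodes x 8 GPUs
# global batch size = 128 * 8 * 2 * 1 = 2048
python main_cx2.py \
    data_path=./data/in1k input_size=224 batch_size=128 \
    model='glnet_4g' drop_path=0.15 lr=2e-3 clip_grad=5.0 update_freq=1
    # on a slurm cluster, append: +slurm=${CLUSTER_ID} slurm.nodes=2 slurm.ngpus=8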

Model Card

name                 acc@1  #params  FLOPs   log
glnet_4g             83.7   27 M     4.5 G   log
glnet_9g             84.5   60 M     9.7 G   log
glnet_16g            85.0   106 M    16.7 G  log
glnet_4g_tklb        84.4   27 M     4.5 G   log
glnet_9g_tklb        85.3   61 M     9.7 G   log
glnet_stl            82.5   30 M     4.4 G   log
glnet_stl_paramslot  82.1   30 M     4.4 G   log

TODOs

  • camera-ready paper link
  • IN1k standard training code, logs, and checkpoints
  • IN1k token-labeling code, logs, and checkpoint
  • visualization scripts
  • arXiv link
  • semantic segmentation and object detection code
  • a guide for running on multiple GPUs without Slurm
    • this should be simple with torchrun (see the hedged sketch after this list); contributions are welcome.
  • add legacy models (e.g., BiFormer)
  • support Ascend NPU
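On the multi-GPU-without-Slurm item above, something like the sketch below may already work. This is an untested assumption: it presumes main_cx2.py picks up the RANK/WORLD_SIZE/LOCAL_RANK environment variables that torchrun sets, as timm- and ConvNeXt-derived training scripts typically do.

# hypothetical single-node, 8-GPU evaluation without slurm (untested)
torchrun --nproc_per_node=8 main_cx2.py \
    data_path=./data/in1k input_size=224 batch_size=128 dist_eval=true \
    eval=true model='glnet_16g' load_release=true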

Acknowledgement

This repository is built on the timm library and the BiFormer and ConvNeXt repositories.

License

This project is released under the MIT license. See the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@article{zhu2024glnet,
  title={Revisiting the Integration of Convolution and Attention for Vision Backbone},
  author={Zhu, Lei and Wang, Xinjiang and Zhang, Wayne and Lau, Rynson},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}
