QKFormer: Hierarchical Spiking Transformer using Q-K Attention (NeurIPS 2024)

QKFormer achieves a top-1 accuracy of 85.65% on ImageNet-1K, the first time a directly trained SNN has exceeded 85% accuracy on ImageNet-1K.

News

[2024.10.10] Updated code and trained models.

[2024.09.25] Accepted as a spotlight at NeurIPS 2024.

Abstract

Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for low energy consumption and high performance. However, there remains a substantial gap in performance between SNNs and Artificial Neural Networks (ANNs). To narrow this gap, we have developed QKFormer, a directly trained spiking transformer with the following features: i) Linear complexity and high energy efficiency: the novel spike-form Q-K attention module efficiently models token or channel attention through binary vectors and enables the construction of larger models. ii) Multi-scale spiking representation, achieved by a hierarchical structure with different numbers of tokens across blocks. iii) Spiking Patch Embedding with Deformed Shortcut (SPEDS), which enhances spiking information transmission and integration, thus improving overall performance. It is shown that QKFormer achieves significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1k, substantially outperforming Spikformer by 10.84%.
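To make the "binary vectors" and "linear complexity" points concrete, here is a minimal sketch of one reading of Q-K token attention. It is not the repository's module: the function names, the Heaviside step used in place of a spiking neuron, and the threshold value are all assumptions for illustration, and it leaves out everything a trained model would need (learned projections, normalization, neuron dynamics over time steps).

```python
from typing import List

Spikes = List[List[int]]  # N tokens x D channels, entries in {0, 1}


def heaviside(value: float, threshold: float) -> int:
    """Assumed stand-in for a spiking neuron: fire (1) when the input reaches the threshold."""
    return 1 if value >= threshold else 0


def qk_token_attention(q: Spikes, k: Spikes, threshold: float = 1.0) -> Spikes:
    """Mask the rows (tokens) of K with a binary attention vector derived from Q."""
    # One pass over Q: sum each token's spikes across channels, then threshold.
    # The result is a length-N binary vector, so the cost grows linearly with N.
    token_mask = [heaviside(sum(row), threshold) for row in q]
    # Binary mask times binary K is an element-wise AND; no N x N score matrix is formed.
    return [[m & x for x in row] for m, row in zip(token_mask, k)]


q = [[0, 0, 0], [1, 0, 1], [0, 1, 0]]
k = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
out = qk_token_attention(q, k, threshold=2.0)
print(out)  # [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
```

Only the second token of Q accumulates enough spikes to pass the threshold, so only the second row of K survives. Under the same assumptions, summing over tokens instead of channels would give the channel-attention counterpart.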

Main results on ImageNet-1K

Model	Type	Architecture	Resolution	Time steps (T)	Params	Top-1 Acc (%)	Download
ViT	ANN	ViT-B/16	384x384	-	85.9M	77.9	-
DeiT	ANN	DeiT-B	384x384	-	86.0M	83.1	-
Swin Transformer	ANN	Swin Transformer-B	384x384	-	88.0M	84.5	-
SEW-ResNet	SNN	SEW-ResNet-152	224x224	4	60.19M	69.26	-
Spikformer	SNN	Spikformer-8-768	224x224	4	66.34M	74.81	-
Spikingformer	SNN	Spikingformer-8-768	224x224	4	66.34M	75.85	-
QKFormer	SNN	HST-10-384	224x224	4	16.47M	78.80	link
QKFormer	SNN	HST-10-512	224x224	4	29.08M	82.04	link
QKFormer	SNN	HST-10-768	224x224	4	64.96M	84.22	link
QKFormer	SNN	HST-10-768	288x288	4	64.96M	85.25	link
QKFormer	SNN	HST-10-768	384x384	4	64.96M	85.65	link

All download passwords: abcd
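When comparing rows of the ImageNet-1K table, it helps to keep the input resolution in view. The short sketch below copies three rows from the table and computes accuracy gaps in percentage points; the dictionary layout and the `gain` helper are illustrative names, not part of the repository.

```python
# (architecture, resolution) -> (parameters in millions, top-1 accuracy in %),
# values copied from the ImageNet-1K results table.
results = {
    ("Spikformer-8-768", 224): (66.34, 74.81),
    ("HST-10-768", 224): (64.96, 84.22),
    ("HST-10-768", 384): (64.96, 85.65),
}


def gain(model, baseline):
    """Top-1 accuracy difference in percentage points, rounded to two decimals."""
    return round(results[model][1] - results[baseline][1], 2)


print(gain(("HST-10-768", 384), ("Spikformer-8-768", 224)))  # 10.84
print(gain(("HST-10-768", 224), ("Spikformer-8-768", 224)))  # 9.41
```

The 10.84-point figure compares QKFormer at 384x384 with Spikformer at 224x224; at a matched 224x224 resolution the gap in the table is 9.41 points.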

Requirements

timm==0.6.12
cupy==11.4.0
torch==1.12.1
spikingjelly==0.0.0.0.12
pyyaml
tensorboard
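A quick way to see whether an environment matches the pins above is to query installed distribution metadata. This is a sketch using only the standard library; the `PINNED` table and the `check` helper are illustrative names, and the distribution names are assumed to match the requirement names.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list; None means no version is pinned.
PINNED = {
    "timm": "0.6.12",
    "cupy": "11.4.0",
    "torch": "1.12.1",
    "spikingjelly": "0.0.0.0.12",
    "pyyaml": None,
    "tensorboard": None,
}


def check(name, wanted):
    """Return 'ok', 'missing', or 'mismatch (found X)' for one distribution."""
    try:
        found = version(name)
    except PackageNotFoundError:
        return "missing"
    # Ignore a local build suffix such as '+cu113' when comparing versions.
    if wanted is None or found.split("+")[0] == wanted:
        return "ok"
    return f"mismatch (found {found})"


for name, wanted in PINNED.items():
    print(f"{name:<14}{wanted or 'any':<12}{check(name, wanted)}")
```

CuPy wheels are often published under CUDA-specific distribution names (for example `cupy-cuda11x`), so a "missing" result for `cupy` may only mean it was installed under a different name.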

Data preparation: arrange ImageNet in the following folder structure. You can extract ImageNet with this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
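Before launching a long training run, it can be worth confirming that the extracted data matches the layout above. The following is a minimal sketch using `pathlib`; the function names are illustrative and not part of the repository, and it only checks folder names and counts `.JPEG` files.

```python
from pathlib import Path


def summarize_split(split_dir):
    """Return (number of class folders, number of .JPEG files) under one split."""
    class_dirs = [d for d in Path(split_dir).iterdir() if d.is_dir()]
    n_images = sum(1 for d in class_dirs for f in d.iterdir() if f.suffix == ".JPEG")
    return len(class_dirs), n_images


def check_imagenet(root):
    """Check that train/ and val/ exist and contain the same class folder names."""
    root = Path(root)
    train, val = root / "train", root / "val"
    for split in (train, val):
        if not split.is_dir():
            raise FileNotFoundError(f"missing split folder: {split}")
    train_classes = {d.name for d in train.iterdir() if d.is_dir()}
    val_classes = {d.name for d in val.iterdir() if d.is_dir()}
    if train_classes != val_classes:
        # Show a few of the folder names that appear in only one split.
        raise ValueError(f"class folders differ: {sorted(train_classes ^ val_classes)[:5]}")
    return {"train": summarize_split(train), "val": summarize_split(val)}


# Example (path is an assumption): print(check_imagenet("imagenet"))
```

For a complete ImageNet-1K copy, each split should report 1000 class folders, with 1,281,167 training images and 50,000 validation images.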

Train & Test

Training on ImageNet

cd imagenet
python -m torch.distributed.launch --nproc_per_node=8 train.py

Testing ImageNet Val data

Download the trained model first, then:

cd imagenet
python test.py
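The results table reports top-1 accuracy. The internals of test.py are not shown here, so the sketch below only illustrates what that metric means: the share of samples whose highest-scoring class equals the ground-truth label. The function name and the toy scores are assumptions for illustration.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += predicted == label
    return 100.0 * correct / len(labels)


# Four samples, three classes; the last sample is misclassified.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.4, 0.5, 0.1]]
labels = [1, 0, 2, 0]
print(top1_accuracy(scores, labels))  # 75.0
```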

Training on CIFAR10

Set the hyper-parameters in cifar10.yml, then run:

cd cifar10
python train.py

Training on CIFAR100

Set the hyper-parameters in cifar100.yml, then run:

cd cifar100
python train.py

Training on DVS128 Gesture

cd dvs128-gesture
python train.py

Training on CIFAR10-DVS

cd cifar10-dvs
python train.py

Reference

If you find this repo useful, please consider citing:

@article{zhou2024qkformer,
  title={QKFormer: Hierarchical Spiking Transformer using QK Attention},
  author={Zhou, Chenlin and Zhang, Han and Zhou, Zhaokun and Yu, Liutao and Huang, Liwei and Fan, Xiaopeng and Yuan, Li and Ma, Zhengyu and Zhou, Huihui and Tian, Yonghong},
  journal={arXiv preprint arXiv:2403.16552},
  year={2024}
}

Acknowledgement & Contact Information

Related projects: spikformer, spikingformer, spikingjelly.

For help or issues using this repository, please submit a GitHub issue.

For other communications related to this repository, please contact zhouchl@pcl.ac.cn or zhouchenlin19@mails.ucas.ac.cn.
