Implementation of the AAAI 2022 paper: Go Wider Instead of Deeper (arXiv)
You can run our code as follows:
```bash
$ cd WideNet_Code
$ bash submit_tpu.sh
```
We trained our model on Google Cloud TPU v3. You can follow Google Cloud's documentation to set up the environment. For GPU users, our code should also work after a few small modifications. You can also reimplement our code directly in JAX or PyTorch. The implementation is simple (a minimal sketch follows the list below):
- Implement one MoE layer; this is supported by ViT-MoE in JAX or DeepSpeed MoE in PyTorch. You can certainly use another MoE implementation.
- Share the weights of the MoE layers and attention layers across transformer blocks.
- Do not share the weights of LayerNorm; each block keeps its own LayerNorm parameters.
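For reference, here is a minimal, hedged PyTorch sketch of this recipe (not our released TPU code): a single attention module and a single top-1-routed MoE feed-forward module are reused across all blocks, while every block has its own LayerNorms. The class names (`SimpleMoE`, `WideNetStyleEncoder`) and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class SimpleMoE(nn.Module):
    """Minimal top-1 routed mixture-of-experts feed-forward layer (illustrative only)."""
    def __init__(self, dim, hidden_dim, num_experts):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (tokens, dim); route each token to its top-1 expert
        probs = self.router(x).softmax(dim=-1)
        top_prob, top_idx = probs.max(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out


class WideNetStyleEncoder(nn.Module):
    """Reuses ONE attention module and ONE MoE module across all `depth` blocks,
    but gives every block its own (unshared) LayerNorms."""
    def __init__(self, dim=384, heads=6, hidden_dim=1536, num_experts=4, depth=12):
        super().__init__()
        # Shared parameters: a single set, applied `depth` times
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.moe = SimpleMoE(dim, hidden_dim, num_experts)
        # Unshared parameters: one pair of LayerNorms per block
        self.norm1 = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])
        self.norm2 = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])

    def forward(self, x):  # x: (batch, seq, dim)
        b, s, d = x.shape
        for ln1, ln2 in zip(self.norm1, self.norm2):
            h = ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            h = ln2(x).reshape(b * s, d)
            x = x + self.moe(h).reshape(b, s, d)
        return x


model = WideNetStyleEncoder()
tokens = torch.randn(2, 16, 384)
print(model(tokens).shape)  # torch.Size([2, 16, 384])
```

In practice you would replace `SimpleMoE` with ViT-MoE (JAX) or DeepSpeed MoE (PyTorch) and add the usual routing/load-balancing losses; the sketch only illustrates the parameter-sharing pattern described in the list above.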
If you have any questions, please feel free to ping Fuzhao.
If you use WideNet, please cite our paper. Here is an example BibTeX entry:
```bibtex
@inproceedings{xue2022go,
  title={Go wider instead of deeper},
  author={Xue, Fuzhao and Shi, Ziji and Wei, Futao and Lou, Yuxuan and Liu, Yong and You, Yang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={8},
  pages={8779--8787},
  year={2022}
}
```