WideNet_Code

Implementation of the AAAI 2022 paper "Go Wider Instead of Deeper" (arXiv).

You can run our code as follows:

$ cd WideNet_Code
$ bash submit_tpu.sh

We trained our model on Google Cloud TPU v3. You can follow Google Cloud's documentation to set up the environment. For GPU users, our code should also work with some small modifications. You can also reimplement our method directly in JAX or PyTorch. The implementation is simple:

  1. Implement one MoE layer; this is supported by ViT-MoE in JAX or DeepSpeed MoE in PyTorch. You can certainly use another MoE implementation.
  2. Share the weights of the MoE and attention layers across depth.
  3. Do not share the weights of the LayerNorm layers.
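The three steps above can be sketched in PyTorch as follows. This is a minimal illustrative sketch, not the repository's TPU code: the tiny top-1 MoE router here is a hypothetical stand-in for ViT-MoE or DeepSpeed MoE, and all class and parameter names (TinyMoE, WideNetSketch, etc.) are our own, not from the paper. The key point it demonstrates is that one attention module and one MoE module are reused at every depth, while each depth gets its own pair of LayerNorms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    """Minimal top-1 mixture-of-experts FFN (illustrative stand-in for
    ViT-MoE / DeepSpeed MoE; a real implementation adds capacity limits
    and a load-balancing loss)."""

    def __init__(self, dim, num_experts=4, hidden=64):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, tokens, dim); route each token to its top-1 expert.
        scores = F.softmax(self.gate(x), dim=-1)  # (B, T, E)
        top = scores.argmax(dim=-1)               # (B, T)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                # Scale each routed token's output by its gate score.
                out[mask] = expert(x[mask]) * scores[..., i][mask].unsqueeze(-1)
        return out


class WideNetSketch(nn.Module):
    """One transformer block reused across `depth` iterations:
    attention and MoE weights are shared, LayerNorms are not."""

    def __init__(self, dim=32, depth=4, heads=4):
        super().__init__()
        # Steps 1-2: a single MoE layer and a single attention layer,
        # shared across all depths.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.moe = TinyMoE(dim)
        # Step 3: an independent pair of LayerNorms per depth.
        self.norm1 = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))
        self.norm2 = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))

    def forward(self, x):
        for n1, n2 in zip(self.norm1, self.norm2):
            h = n1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.moe(n2(x))
        return x
```

Because only the (cheap) LayerNorm parameters grow with depth, parameter count stays close to that of a single block while the MoE experts widen the model.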

If you have any questions, please feel free to ping Fuzhao.

Citing WideNet

If you use WideNet, please cite our paper. Here is an example BibTeX entry:

@inproceedings{xue2022go,
  title={Go wider instead of deeper},
  author={Xue, Fuzhao and Shi, Ziji and Wei, Futao and Lou, Yuxuan and Liu, Yong and You, Yang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={8},
  pages={8779--8787},
  year={2022}
}
