Implementation of the AAAI 2022 paper: Go Wider Instead of Deeper (arXiv)
You can run our code as follows:
```bash
$ cd WideNet_Code
$ bash submit_tpu.sh
```
We trained our model on Google Cloud TPU v3. You can follow Google Cloud's documentation to set up the environment. For GPU users, our code should also work after a few small modifications. You can also reimplement our code directly in JAX or PyTorch. The implementation is simple (a minimal sketch follows the list below):
- Implement one MoE layer; this is supported by ViT-MoE in JAX or DeepSpeed MoE in PyTorch. You can certainly use another MoE implementation.
- Share the weights of the MoE layers and attention layers across transformer blocks.
- Do not share the weights of LayerNorm; each block keeps its own LayerNorm parameters.
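For reference, here is a minimal, hedged PyTorch sketch of this recipe (not our released TPU code): a single attention module and a single top-1-routed MoE feed-forward module are reused across all blocks, while every block has its own LayerNorms. The class names (`SimpleMoE`, `WideNetStyleEncoder`) and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class SimpleMoE(nn.Module):
    """Minimal top-1 routed mixture-of-experts feed-forward layer (illustrative only)."""
    def __init__(self, dim, hidden_dim, num_experts):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (tokens, dim); route each token to its top-1 expert
        probs = self.router(x).softmax(dim=-1)
        top_prob, top_idx = probs.max(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out


class WideNetStyleEncoder(nn.Module):
    """Reuses ONE attention module and ONE MoE module across all `depth` blocks,
    but gives every block its own (unshared) LayerNorms."""
    def __init__(self, dim=384, heads=6, hidden_dim=1536, num_experts=4, depth=12):
        super().__init__()
        # Shared parameters: a single set, applied `depth` times
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.moe = SimpleMoE(dim, hidden_dim, num_experts)
        # Unshared parameters: one pair of LayerNorms per block
        self.norm1 = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])
        self.norm2 = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])

    def forward(self, x):  # x: (batch, seq, dim)
        b, s, d = x.shape
        for ln1, ln2 in zip(self.norm1, self.norm2):
            h = ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            h = ln2(x).reshape(b * s, d)
            x = x + self.moe(h).reshape(b, s, d)
        return x


model = WideNetStyleEncoder()
tokens = torch.randn(2, 16, 384)
print(model(tokens).shape)  # torch.Size([2, 16, 384])
```

In practice you would replace `SimpleMoE` with ViT-MoE (JAX) or DeepSpeed MoE (PyTorch) and add the usual routing/load-balancing losses; the sketch only illustrates the parameter-sharing pattern described in the list above.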
If you have any questions, please feel free to ping Fuzhao.
If you use WideNet, please cite our paper. Here is an example BibTeX entry:
```bibtex
@inproceedings{xue2022go,
  title={Go wider instead of deeper},
  author={Xue, Fuzhao and Shi, Ziji and Wei, Futao and Lou, Yuxuan and Liu, Yong and You, Yang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={8},
  pages={8779--8787},
  year={2022}
}
```