# Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts

This repository contains the code for the Mixture-of-Supernets (MoS) work.

| Folder/File | Experiments |
| --- | --- |
| `mos-mt/mos-mt/` | Machine Translation |
| `mos-bert/mos-bert/` | BERT Pretraining |

## Citation

If you use this code, please cite:

@inproceedings{jawahar2024mos,
  title={Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts},
  author={Ganesh Jawahar and Haichuan Yang and Yunyang Xiong and Zechun Liu and Dilin Wang and Fei Sun and Meng Li and Aasish Pappu and Barlas Oguz and Muhammad Abdul-Mageed and Laks V. S. Lakshmanan and Raghuraman Krishnamoorthi and Vikas Chandra},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2024},
  year={2024}
}

## License

This repository is GPL-licensed.