# Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts

This repository contains the code for the Mixture-of-Supernets (MoS) work.

| Folder/File | Experiments |
| --- | --- |
| `mos-mt/mos-mt/` | Machine Translation |
| `mos-bert/mos-bert/` | BERT Pretraining |

## Citation

If you use this code, please cite:

@inproceedings{jawahar2024mos,
  title={Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts},
  author={Ganesh Jawahar and Haichuan Yang and Yunyang Xiong and Zechun Liu and Dilin Wang and Fei Sun and Meng Li and Aasish Pappu and Barlas Oguz and Muhammad Abdul-Mageed and Laks V. S. Lakshmanan and Raghuraman Krishnamoorthi and Vikas Chandra},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2024},
  year={2024}
}

## License

This repository is GPL-licensed.