- Ansor: Generating High-Performance Tensor Programs for Deep Learning [arxiv]
- Lianmin Zheng, UC Berkeley; Chengfan Jia, Minmin Sun, and Zhao Wu, Alibaba Inc.; Cody Hao Yu, Amazon Web Services, Inc; Ameer Haj-Ali, UC Berkeley; Yida Wang, Amazon Web Services, Inc; Jun Yang, Alibaba Inc.; Danyang Zhuo, Duke University and UC Berkeley; Koushik Sen, Joseph Gonzalez, and Ion Stoica, UC Berkeley
- A Tensor Compiler Approach for One-size-fits-all ML Prediction Serving [arxiv] [code]
- Supun Nakandala, University of California San Diego; Karla Saur, Microsoft; Gyeong-In Yu, Seoul National University; Konstantinos Karanasos and Carlo Curino, Microsoft; Markus Weimer, Microsoft Research; Matteo Interlandi, Microsoft
- Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
- Lingxiao Ma, Peking University and Microsoft Research; Zhiqiang Xie, ShanghaiTech University and Microsoft Research; Zhi Yang, Peking University; Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, and Lidong Zhou, Microsoft Research
- Serving DNNs like Clockwork: Performance Predictability from the Bottom Up [arxiv]
- Arpan Gujarati, Max Planck Institute for Software Systems; Reza Karimi, Emory University; Safya Alzayat and Antoine Kaufmann, Max Planck Institute for Software Systems; Ymir Vigfusson, Emory University; Jonathan Mace, Max Planck Institute for Software Systems
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
- Zhihao Bai and Zhen Zhang, Johns Hopkins University; Yibo Zhu, ByteDance Inc.; Xin Jin, Johns Hopkins University
- HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees [Code]
- Hanyu Zhao, Peking University; Zhenhua Han, The University of Hong Kong; Zhi Yang, Peking University; Quanlu Zhang, Fan Yang, Lidong Zhou, and Mao Yang, Microsoft Research; Francis C.M. Lau, The University of Hong Kong; Yuqi Wang, Yifan Xiong, and Bin Wang, Microsoft
- Retiarii: A Deep Learning Exploratory-Training Framework
- Quanlu Zhang, Zhenhua Han, Fan Yang, Yuge Zhang, Zhe Liu, Mao Yang, and Lidong Zhou, Microsoft Research Asia
- Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads [arxiv]
- Deepak Narayanan, Keshav Santhanam, and Fiodar Kazhamiaka, Stanford University; Amar Phanishayee, Microsoft Research; Matei Zaharia, Stanford University
- KungFu: Making Training in Distributed Machine Learning Adaptive
- Luo Mai, Guo Li, Marcel Wagenlander, Konstantinos Fertakis, Andrei-Octavian Brabete, and Peter Pietzuch, Imperial College London
- A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters
- Yimin Jiang, Tsinghua University; Yibo Zhu, ByteDance Inc.; Chang Lan, Google; Bairen Yi, ByteDance Inc.; Yong Cui, Tsinghua University; Chuanxiong Guo, ByteDance Inc.
- AntMan: Dynamic Scaling on GPU Cluster for Deep Learning
- Wencong Xiao, Shiru Ren, Yong Li, Yang zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia, Alibaba Group