Homepage: https://acmsocc.org/2024/index.html
Paper list: https://acmsocc.org/2024/schedule.html
- LLM inference
  - Queue Management for SLO-Oriented Large Language Model Serving [Paper]
    - UIUC & IBM Research
- LLM training
  - Distributed Training of Large Language Models on AWS Trainium [Paper]
    - AWS
- MoE inference
  - MoEsaic: Shared Mixture of Experts [Paper]
    - IBM Research
- GPU spatial sharing
  - KACE: Kernel-Aware Colocation for Efficient GPU Spatial Sharing [Paper]
    - Stony Brook University
- GPU checkpoint/restore
  - On-demand and Parallel Checkpoint/Restore for GPU Applications [Paper]
    - SJTU IPADS & Shanghai Artificial Intelligence Research Institute
    - System: gCROP (GPU Checkpoint/Restore made On-demand and Parallel)
- Scheduling for deep learning training workloads
  - Hops: Fine-grained heterogeneous sensing, efficient and fair Deep Learning cluster scheduling system [Paper]
    - Anhui University & Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
- Generative Adversarial Networks (GANs)
  - ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks [Paper]
    - NUS