SoCC 2024

Meta Info

Homepage: https://acmsocc.org/2024/index.html

Paper list: https://acmsocc.org/2024/schedule.html

Papers

Large Language Models (LLMs)

  • LLM inference
    • Queue Management for SLO-Oriented Large Language Model Serving [Paper]
      • UIUC & IBM Research
  • LLM training
    • Distributed Training of Large Language Models on AWS Trainium [Paper]
      • AWS

Mixture of Experts (MoEs)

  • MoE inference
    • MoEsaic: Shared Mixture of Experts [Paper]
      • IBM Research

GPU Sharing

  • KACE: Kernel-Aware Colocation for Efficient GPU Spatial Sharing [Paper]
    • Stony Brook University

Serverless Computing

  • On-demand and Parallel Checkpoint/Restore for GPU Applications [Paper]
    • SJTU IPADS & Shanghai Artificial Intelligence Research Institute
    • gCROP: GPU Checkpoint/Restore made On-demand and Parallel

Resource Scheduler

  • Scheduler for deep learning training workloads
    • Hops: Fine-grained heterogeneous sensing, efficient and fair Deep Learning cluster scheduling system [Paper]
      • Anhui University & Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

Distributed Training

  • Generative Adversarial Networks (GANs)
    • ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks [Paper]
      • NUS