Commit 53e74fd

GITBOOK-201: Organize the papers of SC '24

mental2008 authored and gitbook-bot committed Dec 17, 2024
1 parent 0f1282d commit 53e74fd

Showing 4 changed files with 104 additions and 9 deletions.

2 changes: 1 addition & 1 deletion README.md

@@ -18,7 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E

## Changelogs

- * 12/2024: Briefly organize the papers of [EuroSys 2025](reading-notes/conference/eurosys-2025.md) (only Spring cycle); organize the papers of [SoCC 2024](reading-notes/conference/socc-2024.md).
+ * 12/2024: Briefly organize the papers of [EuroSys 2025](reading-notes/conference/eurosys-2025.md) (only Spring cycle); organize the papers of [SoCC 2024](reading-notes/conference/socc-2024.md), [SC 2024](reading-notes/conference/sc-2024.md).
* 09/2024: Organize the papers of [SOSP 2024](reading-notes/conference/sosp-2024.md).
* 08/2024: Organize the papers of [VLDB 2024](reading-notes/conference/vldb-2024.md); update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md); create new paper lists of [diffusion models](paper-list/artificial-intelligence/diffusion-models.md), [language models](paper-list/artificial-intelligence/language-models.md), and [deep learning recommendation models](paper-list/artificial-intelligence/dlrm.md).
* 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [systems for LLMs](paper-list/systems-for-ml/llm.md), [systems for DLRMs](paper-list/systems-for-ml/dlrm.md), and [resource scheduler](paper-list/systems-for-ml/resource-scheduler.md).

2 changes: 1 addition & 1 deletion reading-notes/conference/README.md

@@ -13,7 +13,7 @@
| Conference | When | Where | Remarks |
| :-----------------------------: | :----------------: | ------------------------------------------------------ | :-------------------------------------------: |
| [SoCC 2024](socc-2024.md) | Nov 22-24, 2024 | Seattle, Washington, USA | 🧐 |
- | [SC 2024](sc-2024.md) | Nov 17-22, 2024 | Atlanta, GA, USA | WIP |
+ | [SC 2024](sc-2024.md) | Nov 17-22, 2024 | Atlanta, GA, USA | 🧐 |
| [SOSP 2024](sosp-2024.md) | Nov 4-6, 2024 | Hilton Austin, Texas, USA | 🧐 |
| [VLDB 2024](vldb-2024.md) | Aug 26-30, 2024 | Guangzhou, China | 🧐 |
| [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | 🧐 |

7 changes: 3 additions & 4 deletions reading-notes/conference/eurosys-2025.md

@@ -19,7 +19,7 @@ Paper list: [https://2025.eurosys.org/accepted-papers.html](https://2025.eurosys
* CUHK-Shenzhen & UChicago & Stanford
* T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
* USTC & MSRA
- * RLHF
+ * LLM fine-tuning
* HybridFlow: A Flexible and Efficient RLHF Framework
* HKU & ByteDance


@@ -54,6 +54,5 @@ Paper list: [https://2025.eurosys.org/accepted-papers.html](https://2025.eurosys

## Acronyms

- RLHF: Reinforcement Learning from Human Feedback
-
- ML: Machine Learning
+ * RLHF: Reinforcement Learning from Human Feedback
+ * ML: Machine Learning

102 changes: 99 additions & 3 deletions reading-notes/conference/sc-2024.md

@@ -8,8 +8,104 @@ Paper list: [https://dl.acm.org/doi/proceedings/10.5555/3703596](https://dl.acm.

## Papers

### AI Infrastructure

* Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00089)] \[[HAI Platform Code](https://github.com/HFAiLab/hai-platform)]
* DeepSeek AI
* Includes network co-design, HFReduce (a collective communication library), HaiScale (optimized parallelism methods), the 3FS distributed file system, and the HAI Platform (task scheduling, fault tolerance).

### Large Language Models (LLMs)

* LLM inference
* PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00046)] \[[Code](https://github.com/AutonomicPerfectionist/PipeInfer)]
* Iowa State University & TU Darmstadt
* _Continuous Asynchronous Speculation_: run single-token inference simultaneously with several speculative runs.
* _Early Inference Cancellation_: skip the computation of invalidated runs. (A toy sketch of this control flow follows this list.)
* LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00022)] \[[Benchmark](https://github.com/fmperf-project/fmperf)] \[[Code](https://github.com/IBM/LLM-performance-prediction)]
* IBM Research
* Learn a predictive model to recommend the most cost-effective hardware for a previously unseen LLM.
* LLM fine-tuning
* Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00081)] \[[Code](https://github.com/HPHEX/LongExposure)]
* MSRA & THU
* LLM for anomaly detection
* Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00098)] \[[Code](https://github.com/PoSeiDon-Workflows/LLM_AD)] \[[Benchmark](https://github.com/PoSeiDon-Workflows/FlowBench)]
* Argonne National Laboratory & USC & Oak Ridge National Laboratory
* Investigated two approaches: (1) supervised fine-tuning (pre-trained LLMs are fine-tuned on labeled data for sentence classification to identify anomalies); (2) in-context learning (prompts containing task descriptions and examples guide LLMs in few-shot anomaly detection without fine-tuning).
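
A toy Python illustration of the PipeInfer control flow described above (continuous asynchronous speculation plus early cancellation). The `draft_propose`/`target_accepts` stand-ins are hypothetical, not the paper's implementation:

```python
import concurrent.futures as cf
import random

random.seed(0)
VOCAB = list(range(100))

def draft_propose(prefix, k=4):
    # Stand-in for a small draft model: propose k continuation tokens.
    return [random.choice(VOCAB) for _ in range(k)]

def target_accepts(prefix, token):
    # Stand-in for target-model verification of a single token.
    return random.random() < 0.7

def generate(n_tokens=16):
    out = []
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        spec = pool.submit(draft_propose, tuple(out))   # speculation runs ahead
        while len(out) < n_tokens:
            proposal = spec.result()
            # Continuous speculation: immediately speculate past the
            # optimistic prefix while the target verifies this proposal.
            spec = pool.submit(draft_propose, tuple(out) + tuple(proposal))
            for tok in proposal:
                if target_accepts(out, tok):
                    out.append(tok)
                else:
                    # Early cancellation (best effort): the in-flight
                    # speculation was built on a rejected prefix.
                    spec.cancel()
                    out.append(random.choice(VOCAB))    # target's own token
                    spec = pool.submit(draft_propose, tuple(out))
                    break
    return out

print(generate())
```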

### Mixture-of-Experts (MoEs)

* APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00096)] \[[Code](https://github.com/Atopos-309/APTMoE)]
* SYSU

### Deep Learning Recommendation Models (DLRMs)

* Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00055)] \[[Code](https://doi.org/10.5281/zenodo.13324403)]
* Wuhan University & NVIDIA & University of Macau
* **EcoRec**: eliminate redundancy in TT (Tensor-Train) operations; micro-batching with sorted indices to reduce memory. (A toy TT-embedding sketch follows this list.)
* Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00095)] \[[Code](https://zenodo.org/records/13119689)]
* Indiana University, Bloomington & Meta & University of Rochester & ICT, CAS
* In-depth analysis of embedding data features; employ error-bounded lossy compression to reduce the communication data size.
* Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00100)] \[[Code](https://github.com/luckyq/ADSC-24)]
* UC Merced & SK Hynix
* **TECO**: Tensor-CXL-Offload
* Introduce a cache coherence interconnect based on CXL to build a cache coherence domain between CPU memory and accelerator memory; offload tensors to CPU memory to save accelerator memory.
* RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules \[[Paper](https://dl.acm.org/doi/pdf/10.1109/SC41406.2024.00047)] \[[Code](https://github.com/PanZaifeng/RecFlex)]
* RUC & Microsoft & UCSD
* Create fused kernels with distinct schedules for _different_ feature fields.
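
For background on the EcoRec entry above, a minimal numpy sketch of a TT-factorized embedding table, the structure whose redundant operations the paper optimizes. The shapes and rank are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

v1, v2 = 256, 256          # vocabulary factored as V = v1 * v2 = 65536
d1, d2 = 8, 16             # embedding dim factored as D = d1 * d2 = 128
rank = 4                   # TT rank: trades compression for fidelity

rng = np.random.default_rng(0)
core1 = rng.normal(size=(v1, d1, rank))   # first TT core
core2 = rng.normal(size=(rank, v2, d2))   # second TT core

def tt_embedding(i):
    """Materialize row i of the implicit (v1*v2, d1*d2) table."""
    i1, i2 = divmod(i, v2)
    # (d1, rank) @ (rank, d2) -> (d1, d2), flattened to a D-dim vector
    return (core1[i1] @ core2[:, i2]).reshape(-1)

dense_params = v1 * v2 * d1 * d2
tt_params = core1.size + core2.size
print(tt_embedding(12345).shape)                    # (128,)
print(f"compression: {dense_params / tt_params:.0f}x")  # ~341x
```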

### Graph Transformer

* TorchGT: A Holistic System for Large-Scale Graph Transformer Training \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00083)] \[[Code](https://github.com/zxmeng98/torchgt)]
* NTU & Shanghai AI Lab & ZJU & SenseTime

### Reinforcement Learning (RL)

* Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00045)] \[[Code](https://github.com/IntelliSys-Lab/Stellaris-SC24)]
* Stevens Institute of Technology & NEU & Stony Brook University & Missouri University of Science and Technology
* Introduce a generic asynchronous learning paradigm.
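
A minimal sketch of the generic staleness-aware idea (down-weighting gradients computed against old policy versions); the decay rule here is an assumption for illustration, not Stellaris's actual update:

```python
import numpy as np

def staleness_weight(current_version: int, grad_version: int) -> float:
    staleness = current_version - grad_version
    return 1.0 / (1.0 + staleness)   # assumed decay; the paper may differ

params = np.zeros(4)
version = 0
lr = 0.1

# (gradient, policy version it was computed against), arriving asynchronously
inbox = [(np.ones(4), 0), (np.ones(4), 0), (2 * np.ones(4), 1)]

for grad, grad_version in inbox:
    w = staleness_weight(version, grad_version)
    params -= lr * w * grad          # stale gradients contribute less
    version += 1

print(params)
```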

### Job Scheduling

* PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00032)] \[[Code](https://github.com/hal-uw/blox-pal)]
* UW-Madison
* Characterize which applications are more likely to suffer from performance variability; balance performance variability with locality to ensure jobs are spread across as few nodes as possible.
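
A toy sketch of the trade-off the PAL entry describes: rank nodes by measured variability, then pack the job onto as few of the good nodes as possible. The scoring function is an assumed illustration, not PAL's actual policy:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_gpus: int
    variability: float   # e.g., measured slowdown spread on this node

def place(job_gpus: int, nodes: list[Node], alpha: float = 1.0):
    """Greedy: prefer low-variability nodes; within that, prefer nodes
    that can host more of the job (fewer nodes -> better locality)."""
    ranked = sorted(nodes, key=lambda n: (alpha * n.variability, -n.free_gpus))
    plan, need = [], job_gpus
    for n in ranked:
        if need == 0:
            break
        take = min(n.free_gpus, need)
        if take:
            plan.append((n.name, take))
            need -= take
    return plan if need == 0 else None

nodes = [Node("n0", 4, 0.02), Node("n1", 8, 0.15), Node("n2", 8, 0.03)]
print(place(6, nodes))   # [('n0', 4), ('n2', 2)]
```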

### Distributed Training

* Optimizing Distributed ML Communication with Fused Computation-Collective Operations \[[Paper](https://dl.acm.org/doi/pdf/10.1109/SC41406.2024.00094)]
* AMD
* Develop three prototype fused operators (embedding + All-to-All, GEMV + AllReduce, and GEMM + All-to-All) to address the communication overheads in DLRM, Transformer, and MoE model architectures.
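
The paper fuses computation and collectives at the kernel level; a coarse software analogue (an assumption for illustration, not the paper's method) overlaps chunked GEMV compute with asynchronous all-reduce via `torch.distributed`:

```python
import torch
import torch.distributed as dist

def gemv_allreduce_overlapped(A: torch.Tensor, x: torch.Tensor, n_chunks: int = 4):
    """y = allreduce(A @ x), overlapping per-chunk compute and comms.
    Assumes dist.init_process_group(...) was already called."""
    handles, outs = [], []
    for A_chunk in torch.chunk(A, n_chunks, dim=0):
        y = A_chunk @ x                        # partial GEMV for this chunk
        h = dist.all_reduce(y, async_op=True)  # reduce while next chunk computes
        handles.append(h)
        outs.append(y)
    for h in handles:
        h.wait()                               # drain outstanding collectives
    return torch.cat(outs)
```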

### Serverless Computing

* SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00044)] \[[Code](https://github.com/blinkbear/smiless-ad)]
* SIAT, CAS & UMacau
* Integrate adaptive pre-warming windows; built on top of OpenFaaS.
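
A minimal sketch of the arithmetic behind such pre-warming windows: start warming a container just early enough that it is ready when the next invocation is predicted to arrive. The EMA predictor and parameter names are assumptions, not SMIless's design:

```python
def prewarm_start(now: float, predicted_next_invocation: float,
                  cold_start_s: float, safety_margin_s: float = 0.5) -> float:
    """Return the wall-clock time at which to begin pre-warming."""
    return max(now, predicted_next_invocation - cold_start_s - safety_margin_s)

# Toy usage: an EMA over observed inter-arrival gaps predicts the next call.
alpha, ema_gap, last = 0.3, 10.0, 100.0
for arrival in (111.0, 120.5, 131.2):
    ema_gap = alpha * (arrival - last) + (1 - alpha) * ema_gap
    last = arrival
print(prewarm_start(now=131.2,
                    predicted_next_invocation=last + ema_gap,
                    cold_start_s=3.0))
```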

### GPU Sharing

* ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00048)] \[[Code](https://github.com/MunQ-Lee/ParvaGPU_SC24)]
* Chung-Ang University & Electronics and Telecommunications Research Institute & Virginia Tech
* Integrate MIG and MPS to enhance GPU utilization.

### Performance Analysis

* GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00063)] \[[Code](https://zenodo.org/records/10975567)]
* Beihang University
* Employ _static analysis_ to identify the performance-critical parameters of kernel functions; segment the program execution with external library calls and asynchronous kernel operations; construct a state transfer graph and estimate the workload of each program segment.

### Interconnects

* Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00039)] \[[Benchmark](https://zenodo.org/records/13312325)]
* Sapienza University of Rome & University of Trento & Vrije Universiteit Amsterdam & ETH & CINECA & University of Antwerp & HPE & NVIDIA
* Characterize three supercomputers: Alps, Leonardo, and LUMI.

## Acronyms

* LLM: Large Language Model
* MoE: Mixture-of-Experts
* DLRM: Deep Learning Recommendation Model
* PEFT: Parameter-Efficient Fine-Tuning
* MIG: Multi-Instance GPU
* MPS: Multi-Process Service
* CXL: Compute Express Link
