diff --git a/README.md b/README.md
index 2d4bdbd..51d8139 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E
 
 ## Changelogs
 
-* 12/2024: Briefly organize the papers of [EuroSys 2025](reading-notes/conference/eurosys-2025.md) (only Spring cycle); organize the papers of [SoCC 2024](reading-notes/conference/socc-2024.md).
+* 12/2024: Briefly organize the papers of [EuroSys 2025](reading-notes/conference/eurosys-2025.md) (only Spring cycle); organize the papers of [SoCC 2024](reading-notes/conference/socc-2024.md), [SC 2024](reading-notes/conference/sc-2024.md).
 * 09/2024: Organize the papers of [SOSP 2024](reading-notes/conference/sosp-2024.md).
 * 08/2024: Organize the papers of [VLDB 2024](reading-notes/conference/vldb-2024.md); update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md); create new paper lists of [diffusion models](paper-list/artificial-intelligence/diffusion-models.md), [language models](paper-list/artificial-intelligence/language-models.md), and [deep learning recommendation models](paper-list/artificial-intelligence/dlrm.md).
 * 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [systems for LLMs](paper-list/systems-for-ml/llm.md), [systems for DLRMs](paper-list/systems-for-ml/dlrm.md), and [resource scheduler](paper-list/systems-for-ml/resource-scheduler.md).
diff --git a/reading-notes/conference/README.md b/reading-notes/conference/README.md
index f03aed4..5dfd3e5 100644
--- a/reading-notes/conference/README.md
+++ b/reading-notes/conference/README.md
@@ -13,7 +13,7 @@
 | Conference | When | Where | Remarks |
 | :-----------------------------: | :----------------: | ------------------------------------------------------ | :-------------------------------------------: |
 | [SoCC 2024](socc-2024.md) | Nov 22-24, 2024 | Seattle, Washington, USA | 🧐 |
-| [SC 2024](sc-2024.md) | Nov 17-22, 2024 | Atlanta, GA, USA | WIP |
+| [SC 2024](sc-2024.md) | Nov 17-22, 2024 | Atlanta, GA, USA | 🧐 |
 | [SOSP 2024](sosp-2024.md) | Nov 4-6, 2024 | Hilton Austin, Texas, USA | 🧐 |
 | [VLDB 2024](vldb-2024.md) | Aug 26-30, 2024 | Guangzhou, China | 🧐 |
 | [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | 🧐 |
diff --git a/reading-notes/conference/eurosys-2025.md b/reading-notes/conference/eurosys-2025.md
index 2c27c05..9def3f3 100644
--- a/reading-notes/conference/eurosys-2025.md
+++ b/reading-notes/conference/eurosys-2025.md
@@ -19,7 +19,7 @@ Paper list: [https://2025.eurosys.org/accepted-papers.html](https://2025.eurosys
     * CUHK-Shenzhen & UChicago & Stanford
   * T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
     * USTC & MSRA
-* RLHF
+* LLM fine-tuning
   * HybridFlow: A Flexible and Efficient RLHF Framework
     * HKU & ByteDance
 
@@ -54,6 +54,5 @@ Paper list: [https://2025.eurosys.org/accepted-papers.html](https://2025.eurosys
 
 ## Acronyms
 
-RLHF: Reinforcement Learning from Human Feedback
-
-ML: Machine Learning
+* RLHF: Reinforcement Learning from Human Feedback
+* ML: Machine Learning
diff --git a/reading-notes/conference/sc-2024.md b/reading-notes/conference/sc-2024.md
index 4959b84..c693af1 100644
--- a/reading-notes/conference/sc-2024.md
+++ b/reading-notes/conference/sc-2024.md
@@ -8,8 +8,104 @@ Paper list: [https://dl.acm.org/doi/proceedings/10.5555/3703596](https://dl.acm.
 
 ## Papers
 
+### AI Infrastructure
+
+* Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00089)] \[[HAI Platform Code](https://github.com/HFAiLab/hai-platform)]
+  * DeepSeek AI
+  * Include network co-design, HFReduce (a collective communication library), HaiScale (optimized parallelism methods), the 3FS distributed file system, and the HAI Platform (task scheduling, fault tolerance).
+
+### Large Language Models (LLMs)
+
+* LLM inference
+  * PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00046)] \[[Code](https://github.com/AutonomicPerfectionist/PipeInfer)]
+    * Iowa State University & TU Darmstadt
+    * _Continuous Asynchronous Speculation_: run single-token inference simultaneously with several speculative runs.
+    * _Early Inference Cancellation_: skip the computation of invalidated runs (see the toy sketch below).
+  * LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00022)] \[[Benchmark](https://github.com/fmperf-project/fmperf)] \[[Code](https://github.com/IBM/LLM-performance-prediction)]
+    * IBM Research
+    * Learn a predictive model to recommend the most cost-effective hardware for a previously unseen LLM.
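+
+PipeInfer's speculate/verify/cancel loop can be condensed into a toy, fully synchronous sketch. Everything below (`draft_model`, `target_model`, `generate`) is our own illustration, not the paper's code; PipeInfer's contribution is running speculation and verification asynchronously across pipeline stages, which this sketch does not model.
+
+```python
+# Toy speculative decoding with early cancellation (illustration only).
+def target_model(prefix):
+    # "Exact" model: usually increments the last token, occasionally jumps.
+    last = prefix[-1]
+    return (last + 1) % 100 if last % 7 != 0 else (last + 3) % 100
+
+def draft_model(prefix):
+    # Cheap draft model: always guesses an increment.
+    return (prefix[-1] + 1) % 100
+
+def generate(prefix, n_tokens, speculate=4):
+    out = list(prefix)
+    while len(out) - len(prefix) < n_tokens:
+        # Speculate: let the cheap model run several tokens ahead.
+        ctx, guesses = list(out), []
+        for _ in range(speculate):
+            guesses.append(draft_model(ctx))
+            ctx.append(guesses[-1])
+        # Verify each guess with the exact model; on a mismatch, cancel
+        # (skip) the remaining speculative tokens, which are now invalid.
+        for guess in guesses:
+            token = target_model(out)
+            out.append(token)   # the exact token is always kept
+            if token != guess:
+                break           # early cancellation
+    return out[len(prefix):][:n_tokens]
+
+print(generate([0], 20))
+```
+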
+* LLM fine-tuning
+  * Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00081)] \[[Code](https://github.com/HPHEX/LongExposure)]
+    * MSRA & THU
+* LLM for anomaly detection
+  * Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00098)] \[[Code](https://github.com/PoSeiDon-Workflows/LLM_AD)] \[[Benchmark](https://github.com/PoSeiDon-Workflows/FlowBench)]
+    * Argonne National Laboratory & USC & Oak Ridge National Laboratory
+    * Investigate two approaches: (1) supervised fine-tuning, where pre-trained LLMs are fine-tuned on labeled data for sentence classification to identify anomalies; (2) in-context learning, where prompts containing task descriptions and examples guide LLMs in few-shot anomaly detection without fine-tuning.
+
+### Mixture-of-Experts (MoEs)
+
+* APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00096)] \[[Code](https://github.com/Atopos-309/APTMoE)]
+  * SYSU
+
 ### Deep Learning Recommendation Models (DLRMs)
 
-* Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00055)]
-  * Wuhan University & NVIDIA & University of Macau
-  * **EcoRec**
+* Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00055)] \[[Code](https://doi.org/10.5281/zenodo.13324403)]
+  * WHU & NVIDIA & UMacau
+  * **EcoRec**: eliminate redundancy in TT (Tensor-Train) operations; use micro-batching with sorted indices to reduce memory (a generic TT-lookup sketch follows this section).
+* Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00095)] \[[Code](https://zenodo.org/records/13119689)]
+  * Indiana University Bloomington & Meta & University of Rochester & ICT, CAS
+  * In-depth analysis of embedding data features; employ error-bounded lossy compression to reduce the communication data size (a generic error-bounded quantizer sketch also follows this section).
+* Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00100)] \[[Code](https://github.com/luckyq/ADSC-24)]
+  * UC Merced & SK Hynix
+  * **TECO**: Tensor-CXL-Offload
+  * Introduce a CXL-based cache-coherent interconnect that places CPU memory and accelerator memory in a single coherence domain; offload tensors to CPU memory to save accelerator memory.
+* RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules \[[Paper](https://dl.acm.org/doi/pdf/10.1109/SC41406.2024.00047)] \[[Code](https://github.com/PanZaifeng/RecFlex)]
+  * RUC & Microsoft & UCSD
+  * Create fused kernels with distinct schedules for _different_ feature fields.
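+
+For context on what EcoRec optimizes: a TT-decomposed embedding table never materializes the full `vocab x dim` matrix; each row is reconstructed on the fly from small TT cores. Below is a minimal numpy sketch of a single-row lookup. The shapes, the `tt_embedding_row` helper, and the toy sizes are our own illustration, not EcoRec's implementation (which batches and deduplicates these contractions).
+
+```python
+import numpy as np
+
+def tt_embedding_row(cores, index, vocab_shape):
+    """Reconstruct one embedding row from TT cores.
+
+    cores[k] has shape (r_k, vocab_shape[k], d_k, r_{k+1}), with r_0 = r_K = 1.
+    """
+    # Decompose the flat vocabulary index into one index per core (mixed radix).
+    idx = []
+    for v in reversed(vocab_shape):
+        idx.append(index % v)
+        index //= v
+    idx.reverse()
+
+    # Contract the selected core slices left to right.
+    out = cores[0][:, idx[0], :, :]          # (1, d_0, r_1)
+    for k in range(1, len(cores)):
+        slice_k = cores[k][:, idx[k], :, :]  # (r_k, d_k, r_{k+1})
+        out = np.einsum('aeb,bfc->aefc', out, slice_k)
+        out = out.reshape(1, -1, out.shape[-1])
+    return out.reshape(-1)                   # (d_0 * d_1 * ...,)
+
+# A 512-row, 64-dim table stored as 256 floats instead of 512 * 64 = 32768.
+rng = np.random.default_rng(0)
+vocab_shape, dims, ranks = (8, 8, 8), (4, 4, 4), (1, 2, 2, 1)
+cores = [rng.standard_normal((ranks[k], vocab_shape[k], dims[k], ranks[k + 1]))
+         for k in range(3)]
+row = tt_embedding_row(cores, index=137, vocab_shape=vocab_shape)  # shape (64,)
+```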
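+
+And for the dual-level lossy compression entry: the building block is an error-bounded quantizer, a compressor that guarantees `|x - decompress(compress(x))| <= err` elementwise. A generic sketch of that primitive follows (our own; the paper's dual-level, embedding-aware scheme is more elaborate).
+
+```python
+import numpy as np
+
+def compress(x, err):
+    # Uniform scalar quantization with absolute error bound `err`;
+    # the integer codes are small and highly compressible.
+    return np.round(x / (2 * err)).astype(np.int32)
+
+def decompress(codes, err):
+    return codes * (2 * err)
+
+x = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
+x_hat = decompress(compress(x, err=1e-2), err=1e-2)
+assert np.abs(x - x_hat).max() <= 1e-2 + 1e-7  # bound holds up to rounding
+```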
+
+### Graph Transformer
+
+* TorchGT: A Holistic System for Large-Scale Graph Transformer Training \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00083)] \[[Code](https://github.com/zxmeng98/torchgt)]
+  * NTU & Shanghai AI Lab & ZJU & SenseTime
+
+### Reinforcement Learning (RL)
+
+* Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00045)] \[[Code](https://github.com/IntelliSys-Lab/Stellaris-SC24)]
+  * Stevens Institute of Technology & NEU & Stony Brook University & Missouri University of Science and Technology
+  * Introduce a generic asynchronous learning paradigm.
+
+### Job Scheduling
+
+* PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00032)] \[[Code](https://github.com/hal-uw/blox-pal)]
+  * UW-Madison
+  * Characterize which applications are more likely to suffer from performance variability; balance performance variability against locality so that jobs are still spread across as few nodes as possible.
+
+### Distributed Training
+
+* Optimizing Distributed ML Communication with Fused Computation-Collective Operations \[[Paper](https://dl.acm.org/doi/pdf/10.1109/SC41406.2024.00094)]
+  * AMD
+  * Develop three prototype fused operators (embedding + All-to-All, GEMV + AllReduce, and GEMM + All-to-All) to address the communication overheads in DLRM, Transformer, and MoE architectures (a toy illustration of the GEMV + AllReduce pattern follows the Acronyms section).
+
+### Serverless Computing
+
+* SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00044)] \[[Code](https://github.com/blinkbear/smiless-ad)]
+  * SIAT, CAS & UMacau
+  * Integrate adaptive pre-warming windows; built on top of OpenFaaS.
+
+### GPU Sharing
+
+* ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00048)] \[[Code](https://github.com/MunQ-Lee/ParvaGPU_SC24)]
+  * Chung-Ang University & Electronics and Telecommunications Research Institute & Virginia Tech
+  * Integrate MIG and MPS to enhance GPU utilization.
+
+### Performance Analysis
+
+* GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00063)] \[[Code](https://zenodo.org/records/10975567)]
+  * Beihang University
+  * Employ _static analysis_ to identify the performance-critical parameters of kernel functions; segment the program execution with external library calls and asynchronous kernel operations; construct a state-transfer graph and estimate the workload of each program segment.
+
+### Interconnects
+
+* Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00039)] \[[Benchmark](https://zenodo.org/records/13312325)]
+  * Sapienza University of Rome & University of Trento & Vrije Universiteit Amsterdam & ETH & CINECA & University of Antwerp & HPE & NVIDIA
+  * Characterize three supercomputers: Alps, Leonardo, and LUMI.
+
+## Acronyms
+
+* LLM: Large Language Model
+* MoE: Mixture-of-Experts
+* DLRM: Deep Learning Recommendation Model
+* PEFT: Parameter-Efficient Fine-Tuning
+* MIG: Multi-Instance GPU
+* MPS: Multi-Process Service
+* CXL: Compute Express Link
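+
+A note on the fused computation-collective entry above: the unfused baseline it accelerates is the standard tensor-parallel pattern in which each device computes a partial GEMV on its shard of the weight matrix and an AllReduce sums the partials. The single-process numpy illustration below shows why the two steps compose (our own sketch; the paper's contribution is fusing them so communication overlaps computation).
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+d_in, d_out, n_dev = 8, 4, 4
+W = rng.standard_normal((d_out, d_in))
+x = rng.standard_normal(d_in)
+
+# Column-parallel GEMV: "device" k holds a slice of W's columns and the
+# matching slice of x, producing a full-length partial output.
+shard = d_in // n_dev
+partials = [W[:, k * shard:(k + 1) * shard] @ x[k * shard:(k + 1) * shard]
+            for k in range(n_dev)]
+
+# AllReduce (sum) makes the complete result available everywhere.
+y = np.sum(partials, axis=0)
+assert np.allclose(y, W @ x)
+```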