Commit 53e74fd

GITBOOK-201: Organize the papers of SC '24

mental2008 authored and gitbook-bot committed Dec 17, 2024
1 parent 0f1282d commit 53e74fd

Showing 4 changed files with 104 additions and 9 deletions.

2 changes: 1 addition & 1 deletion README.md

@@ -18,7 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E

## Changelogs

- * 12/2024: Briefly organize the papers of [EuroSys 2025](reading-notes/conference/eurosys-2025.md) (only Spring cycle); organize the papers of [SoCC 2024](reading-notes/conference/socc-2024.md).
+ * 12/2024: Briefly organize the papers of [EuroSys 2025](reading-notes/conference/eurosys-2025.md) (only Spring cycle); organize the papers of [SoCC 2024](reading-notes/conference/socc-2024.md), [SC 2024](reading-notes/conference/sc-2024.md).
* 09/2024: Organize the papers of [SOSP 2024](reading-notes/conference/sosp-2024.md).
* 08/2024: Organize the papers of [VLDB 2024](reading-notes/conference/vldb-2024.md); update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md); create new paper lists of [diffusion models](paper-list/artificial-intelligence/diffusion-models.md), [language models](paper-list/artificial-intelligence/language-models.md), and [deep learning recommendation models](paper-list/artificial-intelligence/dlrm.md).
* 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [systems for LLMs](paper-list/systems-for-ml/llm.md), [systems for DLRMs](paper-list/systems-for-ml/dlrm.md), and [resource scheduler](paper-list/systems-for-ml/resource-scheduler.md).

2 changes: 1 addition & 1 deletion reading-notes/conference/README.md

@@ -13,7 +13,7 @@
| Conference | When | Where | Remarks |
| :-----------------------------: | :----------------: | ------------------------------------------------------ | :-------------------------------------------: |
| [SoCC 2024](socc-2024.md) | Nov 22-24, 2024 | Seattle, Washington, USA | 🧐 |
- | [SC 2024](sc-2024.md) | Nov 17-22, 2024 | Atlanta, GA, USA | WIP |
+ | [SC 2024](sc-2024.md) | Nov 17-22, 2024 | Atlanta, GA, USA | 🧐 |
| [SOSP 2024](sosp-2024.md) | Nov 4-6, 2024 | Hilton Austin, Texas, USA | 🧐 |
| [VLDB 2024](vldb-2024.md) | Aug 26-30, 2024 | Guangzhou, China | 🧐 |
| [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | 🧐 |

7 changes: 3 additions & 4 deletions reading-notes/conference/eurosys-2025.md

@@ -19,7 +19,7 @@ Paper list: [https://2025.eurosys.org/accepted-papers.html](https://2025.eurosys
* CUHK-Shenzhen & UChicago & Stanford
* T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
* USTC & MSRA
- * RLHF
+ * LLM fine-tuning
* HybridFlow: A Flexible and Efficient RLHF Framework
* HKU & ByteDance


@@ -54,6 +54,5 @@ Paper list: [https://2025.eurosys.org/accepted-papers.html](https://2025.eurosys

## Acronyms

- RLHF: Reinforcement Learning from Human Feedback
-
- ML: Machine Learning
+ * RLHF: Reinforcement Learning from Human Feedback
+ * ML: Machine Learning

102 changes: 99 additions & 3 deletions reading-notes/conference/sc-2024.md

@@ -8,8 +8,104 @@ Paper list: [https://dl.acm.org/doi/proceedings/10.5555/3703596](https://dl.acm.

## Papers

### AI Infrastructure

* Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00089)] \[[HAI Platform Code](https://github.com/HFAiLab/hai-platform)]
* DeepSeek AI
* Includes network co-design, HFReduce (a collective communication library), HaiScale (optimized parallelism methods), the 3FS distributed file system, and the HAI Platform (task scheduling, fault tolerance).

### Large Language Models (LLMs)

* LLM inference
* PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00046)] \[[Code](https://github.com/AutonomicPerfectionist/PipeInfer)]
* Iowa State University & TU Darmstadt
* _Continuous Asynchronous Speculation_: run single-token inference simultaneously with several speculative runs.
* _Early Inference Cancellation_: skip the computation of invalidated runs. (A toy sketch of this control flow follows this list.)
* LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00022)] \[[Benchmark](https://github.com/fmperf-project/fmperf)] \[[Code](https://github.com/IBM/LLM-performance-prediction)]
* IBM Research
* Learn a predictive model to recommend the most cost-effective hardware for a previously unseen LLM.
* LLM fine-tuning
* Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00081)] \[[Code](https://github.com/HPHEX/LongExposure)]
* MSRA & THU
* LLM for anomaly detection
* Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00098)] \[[Code](https://github.com/PoSeiDon-Workflows/LLM_AD)] \[[Benchmark](https://github.com/PoSeiDon-Workflows/FlowBench)]
* Argonne National Laboratory & USC & Oak Ridge National Laboratory
* Investigated two approaches: (1) supervised fine-tuning (pre-trained LLMs are fine-tuned on labeled data for sentence classification to identify anomalies); (2) in-context learning (prompts containing task descriptions and examples guide LLMs in few-shot anomaly detection without fine-tuning).
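
A toy Python illustration of the PipeInfer control flow described above (continuous asynchronous speculation plus early cancellation). The `draft_propose`/`target_accepts` stand-ins are hypothetical, not the paper's implementation:

```python
import concurrent.futures as cf
import random

random.seed(0)
VOCAB = list(range(100))

def draft_propose(prefix, k=4):
    # Stand-in for a small draft model: propose k continuation tokens.
    return [random.choice(VOCAB) for _ in range(k)]

def target_accepts(prefix, token):
    # Stand-in for target-model verification of a single token.
    return random.random() < 0.7

def generate(n_tokens=16):
    out = []
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        spec = pool.submit(draft_propose, tuple(out))   # speculation runs ahead
        while len(out) < n_tokens:
            proposal = spec.result()
            # Continuous speculation: immediately speculate past the
            # optimistic prefix while the target verifies this proposal.
            spec = pool.submit(draft_propose, tuple(out) + tuple(proposal))
            for tok in proposal:
                if target_accepts(out, tok):
                    out.append(tok)
                else:
                    # Early cancellation (best effort): the in-flight
                    # speculation was built on a rejected prefix.
                    spec.cancel()
                    out.append(random.choice(VOCAB))    # target's own token
                    spec = pool.submit(draft_propose, tuple(out))
                    break
    return out

print(generate())
```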

### Mixture-of-Experts (MoEs)

* APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00096)] \[[Code](https://github.com/Atopos-309/APTMoE)]
* SYSU

### Deep Learning Recommendation Models (DLRMs)

* Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00055)] \[[Code](https://doi.org/10.5281/zenodo.13324403)]
* Wuhan University & NVIDIA & University of Macau
* **EcoRec**: eliminate redundancy in TT (Tensor-Train) operations; micro-batching with sorted indices to reduce memory. (A toy TT-embedding sketch follows this list.)
* Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00095)] \[[Code](https://zenodo.org/records/13119689)]
* Indiana University, Bloomington & Meta & University of Rochester & ICT, CAS
* In-depth analysis of embedding data features; employ error-bounded lossy compression to reduce the communication data size.
* Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00100)] \[[Code](https://github.com/luckyq/ADSC-24)]
* UC Merced & SK Hynix
* **TECO**: Tensor-CXL-Offload
* Introduce a cache coherence interconnect based on CXL to build a cache coherence domain between CPU memory and accelerator memory; offload tensors to CPU memory to save accelerator memory.
* RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules \[[Paper](https://dl.acm.org/doi/pdf/10.1109/SC41406.2024.00047)] \[[Code](https://github.com/PanZaifeng/RecFlex)]
* RUC & Microsoft & UCSD
* Create fused kernels with distinct schedules for _different_ feature fields.
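
For background on the EcoRec entry above, a minimal numpy sketch of a TT-factorized embedding table, the structure whose redundant operations the paper optimizes. The shapes and rank are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

v1, v2 = 256, 256          # vocabulary factored as V = v1 * v2 = 65536
d1, d2 = 8, 16             # embedding dim factored as D = d1 * d2 = 128
rank = 4                   # TT rank: trades compression for fidelity

rng = np.random.default_rng(0)
core1 = rng.normal(size=(v1, d1, rank))   # first TT core
core2 = rng.normal(size=(rank, v2, d2))   # second TT core

def tt_embedding(i):
    """Materialize row i of the implicit (v1*v2, d1*d2) table."""
    i1, i2 = divmod(i, v2)
    # (d1, rank) @ (rank, d2) -> (d1, d2), flattened to a D-dim vector
    return (core1[i1] @ core2[:, i2]).reshape(-1)

dense_params = v1 * v2 * d1 * d2
tt_params = core1.size + core2.size
print(tt_embedding(12345).shape)                    # (128,)
print(f"compression: {dense_params / tt_params:.0f}x")  # ~341x
```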

### Graph Transformer

* TorchGT: A Holistic System for Large-Scale Graph Transformer Training \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00083)] \[[Code](https://github.com/zxmeng98/torchgt)]
* NTU & Shanghai AI Lab & ZJU & SenseTime

### Reinforcement Learning (RL)

* Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00045)] \[[Code](https://github.com/IntelliSys-Lab/Stellaris-SC24)]
* Stevens Institute of Technology & NEU & Stony Brook University & Missouri University of Science and Technology
* Introduce a generic asynchronous learning paradigm.
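
A minimal sketch of the generic staleness-aware idea (down-weighting gradients computed against old policy versions); the decay rule here is an assumption for illustration, not Stellaris's actual update:

```python
import numpy as np

def staleness_weight(current_version: int, grad_version: int) -> float:
    staleness = current_version - grad_version
    return 1.0 / (1.0 + staleness)   # assumed decay; the paper may differ

params = np.zeros(4)
version = 0
lr = 0.1

# (gradient, policy version it was computed against), arriving asynchronously
inbox = [(np.ones(4), 0), (np.ones(4), 0), (2 * np.ones(4), 1)]

for grad, grad_version in inbox:
    w = staleness_weight(version, grad_version)
    params -= lr * w * grad          # stale gradients contribute less
    version += 1

print(params)
```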

### Job Scheduling

* PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00032)] \[[Code](https://github.com/hal-uw/blox-pal)]
* UW-Madison
* Characterize which applications are more likely to suffer from performance variability; balance performance variability with locality to ensure jobs are spread across as few nodes as possible.
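
A toy sketch of the trade-off the PAL entry describes: rank nodes by measured variability, then pack the job onto as few of the good nodes as possible. The scoring function is an assumed illustration, not PAL's actual policy:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_gpus: int
    variability: float   # e.g., measured slowdown spread on this node

def place(job_gpus: int, nodes: list[Node], alpha: float = 1.0):
    """Greedy: prefer low-variability nodes; within that, prefer nodes
    that can host more of the job (fewer nodes -> better locality)."""
    ranked = sorted(nodes, key=lambda n: (alpha * n.variability, -n.free_gpus))
    plan, need = [], job_gpus
    for n in ranked:
        if need == 0:
            break
        take = min(n.free_gpus, need)
        if take:
            plan.append((n.name, take))
            need -= take
    return plan if need == 0 else None

nodes = [Node("n0", 4, 0.02), Node("n1", 8, 0.15), Node("n2", 8, 0.03)]
print(place(6, nodes))   # [('n0', 4), ('n2', 2)]
```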

### Distributed Training

* Optimizing Distributed ML Communication with Fused Computation-Collective Operations \[[Paper](https://dl.acm.org/doi/pdf/10.1109/SC41406.2024.00094)]
* AMD
* Develop three prototype fused operators (embedding + All-to-All, GEMV + AllReduce, and GEMM + All-to-All) to address the communication overheads in DLRM, Transformer, and MoE model architectures.
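
The paper fuses computation and collectives at the kernel level; a coarse software analogue (an assumption for illustration, not the paper's method) overlaps chunked GEMV compute with asynchronous all-reduce via `torch.distributed`:

```python
import torch
import torch.distributed as dist

def gemv_allreduce_overlapped(A: torch.Tensor, x: torch.Tensor, n_chunks: int = 4):
    """y = allreduce(A @ x), overlapping per-chunk compute and comms.
    Assumes dist.init_process_group(...) was already called."""
    handles, outs = [], []
    for A_chunk in torch.chunk(A, n_chunks, dim=0):
        y = A_chunk @ x                        # partial GEMV for this chunk
        h = dist.all_reduce(y, async_op=True)  # reduce while next chunk computes
        handles.append(h)
        outs.append(y)
    for h in handles:
        h.wait()                               # drain outstanding collectives
    return torch.cat(outs)
```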

### Serverless Computing

* SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00044)] \[[Code](https://github.com/blinkbear/smiless-ad)]
* SIAT, CAS & UMacau
* Integrate adaptive pre-warming windows; built on top of OpenFaaS.
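
A minimal sketch of the arithmetic behind such pre-warming windows: start warming a container just early enough that it is ready when the next invocation is predicted to arrive. The EMA predictor and parameter names are assumptions, not SMIless's design:

```python
def prewarm_start(now: float, predicted_next_invocation: float,
                  cold_start_s: float, safety_margin_s: float = 0.5) -> float:
    """Return the wall-clock time at which to begin pre-warming."""
    return max(now, predicted_next_invocation - cold_start_s - safety_margin_s)

# Toy usage: an EMA over observed inter-arrival gaps predicts the next call.
alpha, ema_gap, last = 0.3, 10.0, 100.0
for arrival in (111.0, 120.5, 131.2):
    ema_gap = alpha * (arrival - last) + (1 - alpha) * ema_gap
    last = arrival
print(prewarm_start(now=131.2,
                    predicted_next_invocation=last + ema_gap,
                    cold_start_s=3.0))
```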

### GPU Sharing

* ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00048)] \[[Code](https://github.com/MunQ-Lee/ParvaGPU_SC24)]
* Chung-Ang University & Electronics and Telecommunications Research Institute & Virginia Tech
* Integrate MIG and MPS to enhance GPU utilization.

### Performance Analysis

* GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00063)] \[[Code](https://zenodo.org/records/10975567)]
* Beihang University
* Employ _static analysis_ to identify the performance-critical parameters of kernel functions; segment the program execution with external library calls and asynchronous kernel operations; construct a state transfer graph and estimate the workload of each program segment.

### Interconnects

* Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects \[[Paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00039)] \[[Benchmark](https://zenodo.org/records/13312325)]
* Sapienza University of Rome & University of Trento & Vrije Universiteit Amsterdam & ETH & CINECA & University of Antwerp & HPE & NVIDIA
* Characterize three supercomputers: Alps, Leonardo, and LUMI.

## Acronyms

* LLM: Large Language Model
* MoE: Mixture-of-Experts
* DLRM: Deep Learning Recommendation Model
* PEFT: Parameter-Efficient Fine-Tuning
* MIG: Multi-Instance GPU
* MPS: Multi-Process Service
* CXL: Compute Express Link
