From d32256dbc2c53b337555a042b3c95244ca419ca8 Mon Sep 17 00:00:00 2001
From: Lingyun Yang
Date: Sat, 21 Sep 2024 09:14:41 +0000
Subject: [PATCH] GITBOOK-193: Organize SOSP 24 papers

---
 README.md                             |  1 +
 SUMMARY.md                            |  1 +
 reading-notes/conference/README.md    |  4 +-
 reading-notes/conference/sosp-2024.md | 59 +++++++++++++++++++++++++++
 4 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 reading-notes/conference/sosp-2024.md

diff --git a/README.md b/README.md
index f46f672..8862763 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E
 
 ## Changelogs
 
+* 09/2024: Organize the papers of [SOSP 2024](reading-notes/conference/sosp-2024.md).
 * 08/2024: Organize the papers of [VLDB 2024](reading-notes/conference/vldb-2024.md); update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md); create new paper lists of [diffusion models](paper-list/artificial-intelligence/diffusion-models.md), [language models](paper-list/artificial-intelligence/language-models.md), and [deep learning recommendation models](paper-list/artificial-intelligence/dlrm.md).
 * 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [systems for LLMs](paper-list/systems-for-ml/llm.md), [systems for DLRMs](paper-list/systems-for-ml/dlrm.md), and [resource scheduler](paper-list/systems-for-ml/resource-scheduler.md).
diff --git a/SUMMARY.md b/SUMMARY.md
index 6ec97c8..d002094 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -40,6 +40,7 @@
 ## Reading Notes
 
 * [Conference](reading-notes/conference/README.md)
+  * [SOSP 2024](reading-notes/conference/sosp-2024.md)
   * [VLDB 2024](reading-notes/conference/vldb-2024.md)
   * [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md)
   * [ICML 2024](reading-notes/conference/icml-2024.md)
diff --git a/reading-notes/conference/README.md b/reading-notes/conference/README.md
index e64564f..400407d 100644
--- a/reading-notes/conference/README.md
+++ b/reading-notes/conference/README.md
@@ -6,8 +6,8 @@
 | :-----------------------------: | :----------------: | ------------------------------------------------------ | :-------------------------------------------: |
 | SoCC 2024 | Nov 22-24, 2024 | Seattle, Washington, USA | **Upcoming** |
 | SC 2024 | Nov 17-22, 2024 | Atlanta, GA, USA | **Upcoming** |
-| SOSP 2024 | Nov 4-6, 2024 | Hilton Austin, Texas, USA | **Upcoming** |
-| [VLDB 2024](vldb-2024.md) | Aug 26-30, 2024 | Guangzhou, China | **Upcoming** |
+| [SOSP 2024](sosp-2024.md) | Nov 4-6, 2024 | Hilton Austin, Texas, USA | **Upcoming** |
+| [VLDB 2024](vldb-2024.md) | Aug 26-30, 2024 | Guangzhou, China | 🧐 |
 | [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | 🧐 |
 | [ICML 2024](icml-2024.md) | Jul 21-27, 2024 | Messe Wien Exhibition Congress Center, Vienna, Austria | |
 | [ATC 2024](atc-2024.md) | Jul 10-12, 2024 | Santa Clara, CA, USA | 🧐; co-located with [OSDI 2024](osdi-2024.md) |
diff --git a/reading-notes/conference/sosp-2024.md b/reading-notes/conference/sosp-2024.md
new file mode 100644
index 0000000..f480053
--- /dev/null
+++ b/reading-notes/conference/sosp-2024.md
@@ -0,0 +1,59 @@
+# SOSP 2024
+
+## Meta Info
+
+Homepage: [https://sigops.org/s/conferences/sosp/2024/](https://sigops.org/s/conferences/sosp/2024/)
+
+## Papers
+
+### Large Language Models (LLMs)
+
+* LLM Training
+  * Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
+    * PKU
+  * Perseus: Removing Energy Bloat from Large Model Training \[[arXiv](https://arxiv.org/abs/2312.06902)]
+    * UMich
+    * Use a graph-cut-based algorithm to obtain the iteration time-energy Pareto frontier; schedule energy consumption over time accordingly (a toy Pareto-frontier filter appears after the ML Serving section below).
+* LLM Inference
+  * LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism \[[arXiv](https://arxiv.org/abs/2404.09526)]
+    * PKU
+    * ESP: Elastic Sequence Parallelism
+    * Elastically adjust the degree of parallelism in real time; reduce key-value cache migration overhead and overlap partial decoding communication with computation; reduce key-value cache fragmentation across instances.
+  * PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU \[[arXiv](https://arxiv.org/abs/2312.12456)]
+    * SJTU IPADS
+
+### ML Serving
+
+* Improving DNN Inference Throughput using Practical, Per-Input Compute Adaptation
+  * GaTech & Princeton
+* Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving \[[arXiv](https://arxiv.org/abs/2312.05385)]
+  * Princeton & GaTech
+  * Automatically apply and manage early exits in ML models, so that certain inputs can exit with results at intermediate layers (see the sketch below).
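+
+A minimal early-exit sketch (my own toy example, not Apparate's implementation; the model shape, exit placement, and confidence threshold below are made up):
+
+```python
+# Toy early-exit classifier (batch size 1): if the first exit head is
+# confident enough, return its prediction and skip the remaining layers.
+# Apparate adds and manages such exits automatically; this sketch does not.
+import torch
+import torch.nn as nn
+
+class EarlyExitNet(nn.Module):
+    def __init__(self, threshold: float = 0.9):
+        super().__init__()
+        self.block1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
+        self.exit1 = nn.Linear(256, 10)   # auxiliary exit head ("ramp")
+        self.block2 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
+        self.exit2 = nn.Linear(256, 10)   # final exit
+        self.threshold = threshold
+
+    def forward(self, x):
+        h = self.block1(x)
+        logits1 = self.exit1(h)
+        if logits1.softmax(dim=-1).max() >= self.threshold:
+            return logits1                # exit early, skip block2
+        return self.exit2(self.block2(h))
+
+model = EarlyExitNet()
+out = model(torch.randn(1, 784))          # may return from either exit
+```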
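+
+For the Perseus note above: a toy Pareto-frontier filter over candidate (iteration time, energy) plans. The numbers and the plan-generation step (which Perseus derives via graph cuts) are made up here:
+
+```python
+# Keep only the plans that no other plan beats on both time and energy.
+plans = [(1.00, 500.0), (1.05, 430.0), (1.05, 470.0), (1.20, 400.0)]
+
+def pareto_frontier(points):
+    return [p for p in points
+            if not any((q[0] <= p[0] and q[1] < p[1]) or
+                       (q[0] < p[0] and q[1] <= p[1]) for q in points)]
+
+print(pareto_frontier(plans))  # [(1.0, 500.0), (1.05, 430.0), (1.2, 400.0)]
+```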
+
+### Distributed Training
+
+* SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures \[[arXiv](https://arxiv.org/abs/2405.14009)]
+  * Stanford
+  * Dynamically re-route the work of a failed server to its data-parallel peers; execute the re-routed work within bubbles of the original pipeline schedule.
+* Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections \[[arXiv](https://arxiv.org/abs/2312.05181)]
+  * ICL
+  * **Tenplex**: a state management library.
+  * Enable jobs to change their parallelism dynamically (a toy re-sharding sketch appears at the end of these notes).
+  * PTC: Parallelizable Tensor Collection
+    * Dataset state
+    * Model state
+  * Execute PTC transformations in parallel with minimum data movement between workers.
+
+### ML Compilation
+
+* Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor \[[arXiv](https://arxiv.org/abs/2408.04808)]
+  * UIUC & MSRA
+  * **T10**, the first DL compiler to exploit the inter-core communication bandwidth and distributed on-chip memory on AI chips (i.e., the Graphcore IPU).
+* SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference
+  * IISc
+
+### Serverless Computing
+
+* Dirigent: Lightweight Serverless Orchestration \[[arXiv](https://arxiv.org/abs/2404.16393)]
+  * ETH
+  * Simplify the state management of existing orchestration systems (e.g., Kubernetes); eliminate persistent state updates; run monolithic control and data planes to minimize internal communication overheads.
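+
+For the Tenplex note above: a toy re-partitioning of named state tensors when the data-parallel degree changes. PTC's actual API and its minimal-data-movement planning are not shown; all names here are made up:
+
+```python
+# Split each state tensor along dim 0 into one shard per worker, then
+# re-shard when the worker count changes. A real system would move only
+# the slices whose owner changes instead of gathering everything.
+import torch
+
+def partition(state, num_workers):
+    shards = [{} for _ in range(num_workers)]
+    for name, tensor in state.items():
+        for rank, piece in enumerate(torch.chunk(tensor, num_workers, dim=0)):
+            shards[rank][name] = piece
+    return shards
+
+def repartition(shards, new_num_workers):
+    full = {name: torch.cat([s[name] for s in shards], dim=0)
+            for name in shards[0]}
+    return partition(full, new_num_workers)
+
+state = {"w": torch.arange(8.0).reshape(8, 1)}
+shards4 = partition(state, 4)      # data-parallel degree 4
+shards2 = repartition(shards4, 2)  # job rescaled to degree 2
+```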