From d32256dbc2c53b337555a042b3c95244ca419ca8 Mon Sep 17 00:00:00 2001
From: Lingyun Yang
Date: Sat, 21 Sep 2024 09:14:41 +0000
Subject: [PATCH] GITBOOK-193: Organize SOSP 24 papers

---
 README.md                             |  1 +
 SUMMARY.md                            |  1 +
 reading-notes/conference/README.md    |  4 +-
 reading-notes/conference/sosp-2024.md | 59 +++++++++++++++++++++++++++
 4 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 reading-notes/conference/sosp-2024.md

diff --git a/README.md b/README.md
index f46f672..8862763 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E
 
 ## Changelogs
 
+* 09/2024: Organize the papers of [SOSP 2024](reading-notes/conference/sosp-2024.md).
 * 08/2024: Organize the papers of [VLDB 2024](reading-notes/conference/vldb-2024.md); update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md); create new paper lists of [diffusion models](paper-list/artificial-intelligence/diffusion-models.md), [language models](paper-list/artificial-intelligence/language-models.md), and [deep learning recommendation models](paper-list/artificial-intelligence/dlrm.md).
 * 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [systems for LLMs](paper-list/systems-for-ml/llm.md), [systems for DLRMs](paper-list/systems-for-ml/dlrm.md), and [resource scheduler](paper-list/systems-for-ml/resource-scheduler.md).
diff --git a/SUMMARY.md b/SUMMARY.md
index 6ec97c8..d002094 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -40,6 +40,7 @@
 ## Reading Notes
 
 * [Conference](reading-notes/conference/README.md)
+  * [SOSP 2024](reading-notes/conference/sosp-2024.md)
   * [VLDB 2024](reading-notes/conference/vldb-2024.md)
   * [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md)
   * [ICML 2024](reading-notes/conference/icml-2024.md)
diff --git a/reading-notes/conference/README.md b/reading-notes/conference/README.md
index e64564f..400407d 100644
--- a/reading-notes/conference/README.md
+++ b/reading-notes/conference/README.md
@@ -6,8 +6,8 @@
 | :-----------------------------: | :----------------: | ------------------------------------------------------ | :-------------------------------------------: |
 | SoCC 2024 | Nov 22-24, 2024 | Seattle, Washington, USA | **Upcoming** |
 | SC 2024 | Nov 17-22, 2024 | Atlanta, GA, USA | **Upcoming** |
-| SOSP 2024 | Nov 4-6, 2024 | Hilton Austin, Texas, USA | **Upcoming** |
-| [VLDB 2024](vldb-2024.md) | Aug 26-30, 2024 | Guangzhou, China | **Upcoming** |
+| [SOSP 2024](sosp-2024.md) | Nov 4-6, 2024 | Hilton Austin, Texas, USA | **Upcoming** |
+| [VLDB 2024](vldb-2024.md) | Aug 26-30, 2024 | Guangzhou, China | 🧐 |
 | [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | 🧐 |
 | [ICML 2024](icml-2024.md) | Jul 21-27, 2024 | Messe Wien Exhibition Congress Center, Vienna, Austria | |
 | [ATC 2024](atc-2024.md) | Jul 10-12, 2024 | Santa Clara, CA, USA | 🧐; co-located with [OSDI 2024](osdi-2024.md) |
diff --git a/reading-notes/conference/sosp-2024.md b/reading-notes/conference/sosp-2024.md
new file mode 100644
index 0000000..f480053
--- /dev/null
+++ b/reading-notes/conference/sosp-2024.md
@@ -0,0 +1,59 @@
+# SOSP 2024
+
+## Meta Info
+
+Homepage: [https://sigops.org/s/conferences/sosp/2024/](https://sigops.org/s/conferences/sosp/2024/)
+
+## Papers
+
+### Large Language Models (LLMs)
+
+* LLM Training
+  * Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
+    * PKU
+  * Perseus: Removing Energy Bloat from Large Model Training \[[arXiv](https://arxiv.org/abs/2312.06902)]
+    * UMich
+    * Use a graph-cut-based algorithm to obtain the iteration time-energy Pareto frontier; schedule energy consumption over time accordingly (a toy Pareto-frontier filter appears after the ML Serving section below).
+* LLM Inference
+  * LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism \[[arXiv](https://arxiv.org/abs/2404.09526)]
+    * PKU
+    * ESP: Elastic Sequence Parallelism
+    * Elastically adjust the degree of parallelism in real time; reduce key-value cache migration overhead and overlap partial decoding communication with computation; reduce key-value cache fragmentation across instances.
+  * PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU \[[arXiv](https://arxiv.org/abs/2312.12456)]
+    * SJTU IPADS
+
+### ML Serving
+
+* Improving DNN Inference Throughput using Practical, Per-Input Compute Adaptation
+  * GaTech & Princeton
+* Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving \[[arXiv](https://arxiv.org/abs/2312.05385)]
+  * Princeton & GaTech
+  * Automatically apply and manage early exits in ML models, so that certain inputs can exit with results at intermediate layers (see the sketch below).
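+
+A minimal early-exit sketch (my own toy example, not Apparate's implementation; the model shape, exit placement, and confidence threshold below are made up):
+
+```python
+# Toy early-exit classifier (batch size 1): if the first exit head is
+# confident enough, return its prediction and skip the remaining layers.
+# Apparate adds and manages such exits automatically; this sketch does not.
+import torch
+import torch.nn as nn
+
+class EarlyExitNet(nn.Module):
+    def __init__(self, threshold: float = 0.9):
+        super().__init__()
+        self.block1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
+        self.exit1 = nn.Linear(256, 10)   # auxiliary exit head ("ramp")
+        self.block2 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
+        self.exit2 = nn.Linear(256, 10)   # final exit
+        self.threshold = threshold
+
+    def forward(self, x):
+        h = self.block1(x)
+        logits1 = self.exit1(h)
+        if logits1.softmax(dim=-1).max() >= self.threshold:
+            return logits1                # exit early, skip block2
+        return self.exit2(self.block2(h))
+
+model = EarlyExitNet()
+out = model(torch.randn(1, 784))          # may return from either exit
+```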
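+
+For the Perseus note above: a toy Pareto-frontier filter over candidate (iteration time, energy) plans. The numbers and the plan-generation step (which Perseus derives via graph cuts) are made up here:
+
+```python
+# Keep only the plans that no other plan beats on both time and energy.
+plans = [(1.00, 500.0), (1.05, 430.0), (1.05, 470.0), (1.20, 400.0)]
+
+def pareto_frontier(points):
+    return [p for p in points
+            if not any((q[0] <= p[0] and q[1] < p[1]) or
+                       (q[0] < p[0] and q[1] <= p[1]) for q in points)]
+
+print(pareto_frontier(plans))  # [(1.0, 500.0), (1.05, 430.0), (1.2, 400.0)]
+```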
+
+### Distributed Training
+
+* SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures \[[arXiv](https://arxiv.org/abs/2405.14009)]
+  * Stanford
+  * Dynamically re-route the work of a failed server to its data-parallel peers; execute the re-routed work within bubbles of the original pipeline schedule.
+* Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections \[[arXiv](https://arxiv.org/abs/2312.05181)]
+  * ICL
+  * **Tenplex**: a state management library.
+  * Enable jobs to change their parallelism dynamically (a toy re-sharding sketch appears at the end of these notes).
+  * PTC: Parallelizable Tensor Collection
+    * Dataset state
+    * Model state
+  * Execute PTC transformations in parallel with minimum data movement between workers.
+
+### ML Compilation
+
+* Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor \[[arXiv](https://arxiv.org/abs/2408.04808)]
+  * UIUC & MSRA
+  * **T10**, the first DL compiler to exploit the inter-core communication bandwidth and distributed on-chip memory on AI chips (i.e., the Graphcore IPU).
+* SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference
+  * IISc
+
+### Serverless Computing
+
+* Dirigent: Lightweight Serverless Orchestration \[[arXiv](https://arxiv.org/abs/2404.16393)]
+  * ETH
+  * Simplify the state management of existing orchestration systems (e.g., Kubernetes); eliminate persistent state updates; run monolithic control and data planes to minimize internal communication overheads.
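+
+For the Tenplex note above: a toy re-partitioning of named state tensors when the data-parallel degree changes. PTC's actual API and its minimal-data-movement planning are not shown; all names here are made up:
+
+```python
+# Split each state tensor along dim 0 into one shard per worker, then
+# re-shard when the worker count changes. A real system would move only
+# the slices whose owner changes instead of gathering everything.
+import torch
+
+def partition(state, num_workers):
+    shards = [{} for _ in range(num_workers)]
+    for name, tensor in state.items():
+        for rank, piece in enumerate(torch.chunk(tensor, num_workers, dim=0)):
+            shards[rank][name] = piece
+    return shards
+
+def repartition(shards, new_num_workers):
+    full = {name: torch.cat([s[name] for s in shards], dim=0)
+            for name in shards[0]}
+    return partition(full, new_num_workers)
+
+state = {"w": torch.arange(8.0).reshape(8, 1)}
+shards4 = partition(state, 4)      # data-parallel degree 4
+shards2 = repartition(shards4, 2)  # job rescaled to degree 2
+```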