GITBOOK-175: Update the notes of OSDI '24 papers

mental2008 · Jul 17, 2024 · c555e54
1 parent e8e5cd8
Showing 1 changed file with 26 additions and 0 deletions.
diff --git a/reading-notes/conference/osdi-2024.md b/reading-notes/conference/osdi-2024.md
@@ -12,8 +12,14 @@ Paper list: [https://www.usenix.org/conference/osdi24/technical-sessions](https:
 
 * Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve \[[Paper](https://www.usenix.org/conference/osdi24/presentation/agrawal)] \[[Code](https://github.com/microsoft/sarathi-serve)]
   * MSR India & GaTech
+  * **Sarathi-Serve**
+    * Chunked prefills: split a prefill request into _near equal-sized chunks_.
+    * Stall-free scheduling: add new requests to a running batch _without pausing ongoing decodes_. This allows large batch sizes (higher throughput) while minimizing the latency impact of batching.
 * ServerlessLLM: Low-Latency Serverless Inference for Large Language Models \[[Paper](https://www.usenix.org/conference/osdi24/presentation/fu)] \[[Code](https://github.com/ServerlessLLM/ServerlessLLM)]
   * Edinburgh
+  * Multi-tier checkpoint loading: exploit the storage hierarchy of GPU servers (e.g., DRAM and SSD) to load checkpoints quickly.
+  * Live migration of LLM inference: the source server migrates only the tokens, and the destination server recomputes the KV cache from them.
+  * Use cost models to estimate the time to load a checkpoint from each tier of the storage hierarchy and the time to migrate an LLM inference to another server; pick the server that minimizes model startup latency.
 * InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management \[[Paper](https://www.usenix.org/conference/osdi24/presentation/lee)]
   * Seoul National University
 * Llumnix: Dynamic Scheduling for Large Language Model Serving \[[Paper](https://www.usenix.org/conference/osdi24/presentation/sun-biao)] \[[Code](https://github.com/AlibabaPAI/llumnix)]
@@ -26,6 +32,11 @@ Paper list: [https://www.usenix.org/conference/osdi24/technical-sessions](https:
   * SJTU & MSRA
 * Fairness in Serving Large Language Models \[[Paper](https://www.usenix.org/conference/osdi24/presentation/sheng)] \[[Code](https://github.com/Ying1123/VTC-artifact)]
   * UC Berkeley
+  * This is the _first_ work to discuss the _fair serving_ of LLMs.
+  * Propose a fair-serving algorithm called Virtual Token Counter (VTC).
+    * Track the service received by each client.
+    * Prioritize the clients that have received the least service.
+    * Only change the dispatch order; never reject a request that fits in the batch.
 
 ### Resource Allocation
 
@@ -115,3 +126,18 @@ Paper list: [https://www.usenix.org/conference/osdi24/technical-sessions](https:
 * Anvil: Verifying Liveness of Cluster Management Controllers \[[Paper](https://www.usenix.org/conference/osdi24/presentation/sun-xudong)] \[[Code](https://github.com/vmware-research/verifiable-controllers)]
   * UIUC & UW-Madison & VMware Research & Feldera
   * **Best Paper Award**
+
+## References
+
+* Notes from SJTU IPADS (in Chinese)
+  * [OSDI 2024 Paper Reviews, Day 1 Session 1: Memory Management (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/707983034)
+  * [OSDI 2024 Paper Reviews, Day 1 Session 2: Low-Latency LLM Serving (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/707990822)
+  * [OSDI 2024 Paper Reviews, Day 1 Session 3: Distributed Systems (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/707998884)
+  * [OSDI 2024 Paper Reviews, Day 2 Session 4: Deep Learning (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/708002201)
+  * [OSDI 2024 Paper Reviews, Day 2 Session 5: Operating Systems (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/708003676)
+  * [OSDI 2024 Paper Reviews, Day 2 Session 6: Cloud Computing (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/708034284)
+  * [OSDI 2024 Paper Reviews, Day 2 Session 7: Formal Verification (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/708035509)
+  * [OSDI 2024 Paper Reviews, Day 3 Session 8: Cloud Security (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/708036283)
+  * [OSDI 2024 Paper Reviews, Day 3 Session 9: Data Management (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/708037149)
+  * [OSDI 2024 Paper Reviews, Day 3 Session 10: Analysis of Correctness (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/708037498)
+  * [OSDI 2024 Paper Reviews, Day 3 Session 11: ML Scheduling (IPADS-SYS, Zhihu)](https://zhuanlan.zhihu.com/p/708038262)
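The Sarathi-Serve notes (chunked prefills plus stall-free scheduling) can be illustrated with a minimal sketch of one scheduling iteration. It assumes a fixed per-iteration token budget. `Request`, `build_batch`, and the FIFO order for prefills are hypothetical names and choices for illustration, not Sarathi-Serve's actual code.

The key idea is that ongoing decodes are admitted first, so they never stall. Prefill work then only fills the leftover budget. Because that leftover is roughly the same every iteration, long prompts end up split into near equal-sized chunks.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request state: prompt length and how much of it is prefilled."""
    req_id: str
    prompt_len: int
    prefilled: int = 0

    @property
    def remaining_prefill(self) -> int:
        return self.prompt_len - self.prefilled


def build_batch(decoding: list[Request], waiting: deque[Request],
                token_budget: int) -> list[tuple[str, str, int]]:
    """Build one iteration's batch as (req_id, phase, num_tokens) entries.

    Assumption: a fixed token budget per iteration caps the total work.
    """
    batch = []
    budget = token_budget
    # 1. Every ongoing decode contributes exactly one token and is never paused.
    for req in decoding:
        batch.append((req.req_id, "decode", 1))
        budget -= 1
    # 2. Fill whatever budget is left with prefill chunks, oldest request first.
    while waiting and budget > 0:
        req = waiting[0]
        chunk = min(req.remaining_prefill, budget)
        batch.append((req.req_id, "prefill", chunk))
        req.prefilled += chunk
        budget -= chunk
        if req.remaining_prefill == 0:
            # Prefill finished; a real system would move it to the decode set.
            waiting.popleft()
    return batch


decodes = [Request("a", 10, 10), Request("b", 5, 5)]
waiting = deque([Request("c", 700), Request("d", 300)])
print(build_batch(decodes, waiting, 512))
print(build_batch(decodes, waiting, 512))
```

In the first iteration, request `c` receives a 510-token chunk alongside the two decodes. In the second, it finishes with 190 tokens and `d` is prefilled in the remaining space.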
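The ServerlessLLM note on cost-model-based server selection can be sketched as follows. This is a simplified version under loudly stated assumptions:

* Loading time is modeled as queueing delay plus checkpoint size over the bandwidth of the fastest tier holding the checkpoint.
* Migration time is assumed linear in the number of tokens the destination must recompute.
* The two costs are simply added when the chosen server first has to migrate a running inference away.

The bandwidths, the coefficients, and the names `Server`, `pick_server`, and `startup_latency` are all hypothetical and not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical read bandwidths (GB/s) per storage tier; a real system would profile these.
TIER_BANDWIDTH_GBPS = {"dram": 20.0, "ssd": 5.0, "remote": 1.0}


@dataclass
class Server:
    name: str
    fastest_tier: str                  # fastest tier on this server holding the checkpoint
    queue_delay_s: float = 0.0         # wait for loads already queued on this server
    busy_tokens: Optional[int] = None  # tokens of a running inference to migrate away, if any


def load_time(server: Server, model_gb: float) -> float:
    """Queueing delay plus checkpoint size divided by the tier's bandwidth."""
    return server.queue_delay_s + model_gb / TIER_BANDWIDTH_GBPS[server.fastest_tier]


def migration_time(num_tokens: int, per_token_s: float = 0.0005, fixed_s: float = 0.1) -> float:
    """Assumed linear cost: the destination recomputes the KV cache from the migrated tokens."""
    return fixed_s + per_token_s * num_tokens


def startup_latency(server: Server, model_gb: float) -> float:
    latency = load_time(server, model_gb)
    if server.busy_tokens is not None:
        # Simplification: migrate the running inference away first, then load.
        latency += migration_time(server.busy_tokens)
    return latency


def pick_server(servers: list[Server], model_gb: float) -> Server:
    """Pick the server with the lowest estimated model startup latency."""
    return min(servers, key=lambda s: startup_latency(s, model_gb))


servers = [Server("A", "dram", busy_tokens=2000), Server("B", "ssd"), Server("C", "remote")]
print(pick_server(servers, 14.0).name)
```

With a 14 GB model, server `A` wins even though it must first migrate a 2000-token inference. The estimate is 0.7 s + 1.1 s, versus 2.8 s from SSD on `B`. With a much longer running inference on `A`, the migration cost would outweigh its DRAM locality and `B` would be chosen instead.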
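The VTC notes (track service per client, serve the least-served client first, and only reorder dispatch) can be illustrated with a simplified scheduler. Assumptions:

* Service is counted as weighted input and output tokens, with hypothetical weights `w_in` and `w_out`.
* A client returning from idle has its counter lifted to the least-served backlogged client's counter, so it cannot bank credit while idle.
* Dispatch stops at the first request that does not fit.

The class and method names are hypothetical; this is not the artifact's code.

```python
from collections import defaultdict, deque


class VirtualTokenCounter:
    """Simplified VTC-style scheduler: dispatch for the least-served client first."""

    def __init__(self, w_in: float = 1.0, w_out: float = 2.0):
        self.w_in = w_in    # hypothetical weight per input (prompt) token
        self.w_out = w_out  # hypothetical weight per generated token
        self.counter = defaultdict(float)  # weighted tokens served per client
        self.queues = defaultdict(deque)   # pending (req_id, input_tokens) per client

    def arrive(self, client: str, req_id: str, input_tokens: int) -> None:
        if not self.queues[client]:
            # Counter lift: an idle client restarts no lower than the
            # least-served backlogged client, so idling does not bank credit.
            backlogged = [self.counter[c] for c, q in self.queues.items() if q]
            if backlogged:
                self.counter[client] = max(self.counter[client], min(backlogged))
        self.queues[client].append((req_id, input_tokens))

    def dispatch(self, free_tokens: int) -> list[str]:
        """Admit requests least-served-first while they fit; nothing is ever rejected."""
        dispatched = []
        while True:
            backlogged = [c for c, q in self.queues.items() if q]
            if not backlogged:
                break
            client = min(backlogged, key=lambda c: self.counter[c])
            req_id, input_tokens = self.queues[client][0]
            if input_tokens > free_tokens:
                break  # simplification: wait for space instead of skipping ahead
            self.queues[client].popleft()
            self.counter[client] += self.w_in * input_tokens
            free_tokens -= input_tokens
            dispatched.append(req_id)
        return dispatched

    def on_tokens_generated(self, client: str, n: int) -> None:
        """Charge the client for n newly generated output tokens."""
        self.counter[client] += self.w_out * n


vtc = VirtualTokenCounter()
vtc.arrive("A", "a1", 100)
vtc.arrive("A", "a2", 100)
vtc.arrive("B", "b1", 100)
print(vtc.dispatch(150))  # a1 is admitted; b1 does not fit yet
print(vtc.dispatch(300))  # b1 goes first because B has received less service
```

Requests that do not fit stay queued rather than being dropped. This matches the note that VTC only manipulates the dispatch order.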