GITBOOK-175: Update the notes of OSDI '24 papers
mental2008 authored and gitbook-bot committed Jul 17, 2024
1 parent e8e5cd8 commit c555e54
Showing 1 changed file with 26 additions and 0 deletions.
reading-notes/conference/osdi-2024.md
@@ -12,8 +12,14 @@ Paper list: [https://www.usenix.org/conference/osdi24/technical-sessions](https:

* Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve \[[Paper](https://www.usenix.org/conference/osdi24/presentation/agrawal)] \[[Code](https://github.com/microsoft/sarathi-serve)]
* MSR India & GaTech
* **Sarathi-Serve**
* Chunked prefills: split a prefill request into _near equal-sized chunks_; build stall-free schedules that add new requests to a batch _without pausing ongoing decodes_.
* Stall-free scheduling: improve throughput with large batch sizes; minimize the effect of batching on latency.
* ServerlessLLM: Low-Latency Serverless Inference for Large Language Models \[[Paper](https://www.usenix.org/conference/osdi24/presentation/fu)] \[[Code](https://github.com/ServerlessLLM/ServerlessLLM)]
* Edinburgh
* Multi-tier checkpoint loading.
* Live migration of LLM inference: the source server sends only the tokens, and the destination server recomputes the KV cache from them.
* Use cost models to estimate the time to load a checkpoint from each tier of the storage hierarchy and the time to migrate an ongoing LLM inference to another server; pick the server that minimizes model startup latency.
* InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management \[[Paper](https://www.usenix.org/conference/osdi24/presentation/lee)]
* Seoul National University
* Llumnix: Dynamic Scheduling for Large Language Model Serving \[[Paper](https://www.usenix.org/conference/osdi24/presentation/sun-biao)] \[[Code](https://github.com/AlibabaPAI/llumnix)]
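Sarathi-Serve's chunked prefills and stall-free scheduling, noted above, can be illustrated with a minimal sketch. It is not Sarathi-Serve's code: the `Request` class, the `build_batch` function, and the fixed per-iteration `token_budget` are assumptions made for illustration. Each iteration first gives every ongoing decode its one token, then spends the leftover budget on prefill chunks, so a long prompt is spread over several iterations instead of stalling the decodes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(eq=False)  # compare requests by identity
class Request:
    # Hypothetical per-request state, not Sarathi-Serve's actual data structure.
    rid: str
    prompt_len: int
    prefilled: int = 0  # prompt tokens already processed

    @property
    def in_decode(self) -> bool:
        return self.prefilled >= self.prompt_len


def build_batch(running: List[Request], waiting: List[Request],
                token_budget: int) -> List[Tuple[str, str, int]]:
    """Form one iteration's batch as (request id, phase, num_tokens) entries."""
    # 1. Every ongoing decode gets its one token, so decodes are never paused.
    batch = [(r.rid, "decode", 1) for r in running if r.in_decode]
    budget = max(token_budget - len(batch), 0)
    # 2. Spend the leftover budget on prefill chunks: first requests whose
    #    prefill is already in progress, then newly admitted waiting requests.
    pending = [r for r in running if not r.in_decode] + list(waiting)
    for r in pending:
        if budget == 0:
            break
        chunk = min(r.prompt_len - r.prefilled, budget)
        batch.append((r.rid, "prefill", chunk))
        r.prefilled += chunk
        budget -= chunk
        if r in waiting:  # admit a new request into the running set
            waiting.remove(r)
            running.append(r)
    return batch


if __name__ == "__main__":
    running = [Request("a", prompt_len=4, prefilled=4)]  # already decoding
    waiting = [Request("b", prompt_len=10)]               # long prompt
    for step in range(4):
        print(step, build_batch(running, waiting, token_budget=5))
    # 0 [('a', 'decode', 1), ('b', 'prefill', 4)]
    # 1 [('a', 'decode', 1), ('b', 'prefill', 4)]
    # 2 [('a', 'decode', 1), ('b', 'prefill', 2)]
    # 3 [('a', 'decode', 1), ('b', 'decode', 1)]
```

In the usage example, request `a` keeps decoding in every iteration while the 10-token prompt of `b` is prefilled in chunks of 4, 4, and 2 tokens.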
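ServerlessLLM's cost-model-based server selection, noted above, can be sketched as follows under simplifying assumptions. The tier bandwidths, the per-token recomputation cost, and the names `Server`, `estimated_startup`, and `pick_server` are hypothetical, and so is the assumption that migrating a running inference away simply adds to the startup delay on that server. The sketch only shows the shape of the decision: estimate load time from whichever tier holds the checkpoint, add migration time if the server is busy, and take the minimum.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical read bandwidths (GB/s) per storage tier; illustrative numbers only.
TIER_BANDWIDTH_GBPS: Dict[str, float] = {"dram": 20.0, "ssd": 5.0, "remote": 1.0}


@dataclass
class Server:
    name: str
    checkpoint_tier: str                  # fastest tier that holds the checkpoint
    running_tokens: Optional[int] = None  # tokens of an inference to migrate away first


def load_time(model_gb: float, tier: str) -> float:
    """Estimated seconds to load the checkpoint from the given tier."""
    return model_gb / TIER_BANDWIDTH_GBPS[tier]


def migration_time(tokens: int, recompute_s_per_token: float) -> float:
    """Only tokens are shipped, so the cost is modeled as KV-cache recomputation."""
    return tokens * recompute_s_per_token


def estimated_startup(server: Server, model_gb: float,
                      recompute_s_per_token: float) -> float:
    t = load_time(model_gb, server.checkpoint_tier)
    if server.running_tokens is not None:
        # Simplification: the migration is assumed to delay loading on this server.
        t += migration_time(server.running_tokens, recompute_s_per_token)
    return t


def pick_server(servers: List[Server], model_gb: float,
                recompute_s_per_token: float) -> Tuple[str, float]:
    """Return the server with the lowest estimated startup latency."""
    best = min(servers,
               key=lambda s: estimated_startup(s, model_gb, recompute_s_per_token))
    return best.name, estimated_startup(best, model_gb, recompute_s_per_token)
```

With a 40 GB model and 0.001 s of recomputation per token, an idle server loading from SSD costs 8 s, while a DRAM-holding server busy with a 2,000-token inference costs 2 + 2 = 4 s, so the busy server wins; if that inference had 10,000 tokens, its cost would rise to 12 s and the SSD server would be chosen instead.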
@@ -26,6 +32,11 @@ Paper list: [https://www.usenix.org/conference/osdi24/technical-sessions](https:
* SJTU & MSRA
* Fairness in Serving Large Language Models \[[Paper](https://www.usenix.org/conference/osdi24/presentation/sheng)] \[[Code](https://github.com/Ying1123/VTC-artifact)]
* UC Berkeley
* This is the _first_ work to discuss the _fair serving_ of LLMs.
* Propose a fair-serving algorithm called Virtual Token Counter (VTC).
* Track the service received by each client.
* Prioritize the clients that have received the least service.
* Only change the dispatch order; never reject a request that can fit in the batch.
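The VTC idea above (count the service each client has received and dispatch from the least-served client first) can be sketched as a small dispatcher. This is a simplified illustration, not the paper's algorithm or artifact: the class and method names, the `free_tokens` capacity model, and the weights `W_INPUT` and `W_OUTPUT` are hypothetical choices. Requests are only reordered: one that does not fit stays queued for a later batch rather than being rejected.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

# Hypothetical service weights: generated tokens are charged more than prompt tokens.
W_INPUT = 1.0
W_OUTPUT = 2.0


class VirtualTokenCounter:
    """Simplified VTC-style dispatcher (a sketch, not the paper's artifact)."""

    def __init__(self) -> None:
        self.counter: Dict[str, float] = defaultdict(float)  # service received per client
        self.queues: Dict[str, Deque[Tuple[str, int]]] = defaultdict(deque)

    def submit(self, client: str, req_id: str, prompt_tokens: int) -> None:
        self.queues[client].append((req_id, prompt_tokens))

    def dispatch(self, free_tokens: int) -> List[str]:
        """Fill free batch capacity, always serving the least-served client first."""
        picked: List[str] = []
        while True:
            active = [c for c, q in self.queues.items() if q]
            if not active:
                break
            client = min(active, key=lambda c: self.counter[c])
            req_id, prompt = self.queues[client][0]
            if prompt > free_tokens:
                break  # does not fit yet: it stays queued, it is never rejected
            self.queues[client].popleft()
            free_tokens -= prompt
            self.counter[client] += W_INPUT * prompt
            picked.append(req_id)
        return picked

    def on_tokens_generated(self, client: str, n: int) -> None:
        """Charge the client for decode tokens as they are produced."""
        self.counter[client] += W_OUTPUT * n
```

For example, if `alice` has already been charged for 50 generated tokens while `bob` has received less service, `bob`'s next request is dispatched ahead of `alice`'s, even though `alice` submitted first.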

### Resource Allocation

@@ -115,3 +126,18 @@ Paper list: [https://www.usenix.org/conference/osdi24/technical-sessions](https:
* Anvil: Verifying Liveness of Cluster Management Controllers \[[Paper](https://www.usenix.org/conference/osdi24/presentation/sun-xudong)] \[[Code](https://github.com/vmware-research/verifiable-controllers)]
* UIUC & UW-Madison & VMware Research & Feldera
* **Best Paper Award**

## References

* Notes from SJTU IPADS (in Chinese)
* [OSDI 2024 Paper Reviews, Day 1 Session 1: Memory Management - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/707983034)
* [OSDI 2024 Paper Reviews, Day 1 Session 2: Low-Latency LLM Serving - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/707990822)
* [OSDI 2024 Paper Reviews, Day 1 Session 3: Distributed Systems - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/707998884)
* [OSDI 2024 Paper Reviews, Day 2 Session 4: Deep Learning - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/708002201)
* [OSDI 2024 Paper Reviews, Day 2 Session 5: Operating Systems - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/708003676)
* [OSDI 2024 Paper Reviews, Day 2 Session 6: Cloud Computing - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/708034284)
* [OSDI 2024 Paper Reviews, Day 2 Session 7: Formal Verification - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/708035509)
* [OSDI 2024 Paper Reviews, Day 3 Session 8: Cloud Security - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/708036283)
* [OSDI 2024 Paper Reviews, Day 3 Session 9: Data Management - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/708037149)
* [OSDI 2024 Paper Reviews, Day 3 Session 10: Analysis of Correctness - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/708037498)
* [OSDI 2024 Paper Reviews, Day 3 Session 11: ML Scheduling - article by IPADS-SYS - Zhihu](https://zhuanlan.zhihu.com/p/708038262)
