GITBOOK-193: Organize SOSP 24 papers
# SOSP 2024

## Meta Info

Homepage: [https://sigops.org/s/conferences/sosp/2024/](https://sigops.org/s/conferences/sosp/2024/)

## Papers

### Large Language Models (LLMs)

* LLM Training
  * Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
    * PKU
  * Perseus: Removing Energy Bloat from Large Model Training \[[arXiv](https://arxiv.org/abs/2312.06902)]
    * UMich
    * Use a graph-cut-based algorithm to obtain the "iteration time-energy" Pareto frontier; schedule energy consumption across time.
* LLM Inference
  * LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism \[[arXiv](https://arxiv.org/abs/2404.09526)]
    * PKU
    * ESP: Elastic Sequence Parallelism.
    * Elastically adjust the degree of parallelism in real time; reduce key-value cache migration overhead and overlap partial decoding communication with computation; reduce key-value cache fragmentation across instances.
  * PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU \[[arXiv](https://arxiv.org/abs/2312.12456)]
    * SJTU IPADS
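
A minimal sketch of the Pareto-frontier idea behind Perseus (not its actual graph-cut algorithm): given candidate execution plans measured as (iteration time, energy) pairs, keep only the plans that no other plan beats on both axes.

```python
def pareto_frontier(points):
    """Return the (time, energy) pairs not dominated by any other pair.

    A pair dominates another if it is no worse on both axes and
    strictly better on at least one. After sorting by time, a pair
    survives only if it uses strictly less energy than every
    faster-or-equal pair kept so far.
    """
    frontier = []
    for t, e in sorted(points):
        if not frontier or e < frontier[-1][1]:
            frontier.append((t, e))
    return frontier

# e.g. pareto_frontier([(1, 3), (2, 2), (2, 4), (3, 1)])
# keeps (1, 3), (2, 2), (3, 1); the dominated (2, 4) is dropped
```

Perseus then schedules energy consumption along such a frontier over time, slowing only the work that is off the critical path.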

### ML Serving

* Improving DNN Inference Throughput using Practical, Per-Input Compute Adaptation
  * GaTech & Princeton
* Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving \[[arXiv](https://arxiv.org/abs/2312.05385)]
  * Princeton & GaTech
  * Automatically apply and manage early exits (certain inputs can exit with results at intermediate layers) in ML models.
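
The early-exit mechanism can be sketched as follows (a toy illustration, not Apparate's implementation; `stages` and `exit_heads` are hypothetical callables standing in for model layers and exit classifiers):

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def early_exit_forward(x, stages, exit_heads, threshold=0.9):
    """Run model stages in order; after each stage, let its attached
    exit head classify the intermediate activation and stop early if
    the top-class confidence clears the threshold.

    Returns (predicted_class, index_of_exit_taken)."""
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        x = stage(x)
        probs = softmax(head(x))
        if probs.max() >= threshold:
            return int(probs.argmax()), i
    # no exit fired: fall through to the final stage's prediction
    return int(probs.argmax()), len(stages) - 1
```

Tuning the threshold trades latency against accuracy; managing that tension automatically, per deployed model, is Apparate's contribution.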

### Distributed Training

* SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures \[[arXiv](https://arxiv.org/abs/2405.14009)]
  * Stanford
  * Dynamically re-route the work of a failed server to data-parallel peers; execute it within bubbles of the original pipeline schedule.
* Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections \[[arXiv](https://arxiv.org/abs/2312.05181)]
  * ICL
  * **Tenplex**: a state management library.
  * Enable jobs to change their parallelism dynamically.
  * PTC: Parallelizable Tensor Collection
    * Dataset state
    * Model state
  * Execute PTC transformations in parallel with minimal data movement between workers.
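
The core operation can be illustrated at toy scale (a sketch of the resharding idea, not Tenplex's API): when a job's parallelism degree changes, each model-state tensor must be re-partitioned across the new set of workers.

```python
import numpy as np

def reshard(shards, new_degree, axis=0):
    """Re-partition a tensor, held as a list of equal slices along
    `axis`, into `new_degree` slices along the same axis.

    A real system would move only the sub-slices that change owner;
    concatenating the full tensor here just keeps the toy short."""
    full = np.concatenate(shards, axis=axis)
    return np.split(full, new_degree, axis=axis)

# Scale a job from 2 to 4 workers: each old shard is split in half.
old = np.split(np.arange(8), 2)   # two workers, 4 elements each
new = reshard(old, 4)             # four workers, 2 elements each
```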

### ML Compilation

* Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor \[[arXiv](https://arxiv.org/abs/2408.04808)]
  * UIUC & MSRA
  * **T10**: the first DL compiler to exploit the inter-core communication bandwidth and distributed on-chip memory on AI chips (i.e., the Graphcore IPU).
* SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference
  * IISc

### Serverless Computing

* Dirigent: Lightweight Serverless Orchestration \[[arXiv](https://arxiv.org/abs/2404.16393)]
  * ETH
  * Simplify the state management of existing orchestration systems (e.g., Kubernetes); eliminate persistent state updates; run monolithic control and data planes to minimize internal communication overheads.
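
A toy contrast with persistence-heavy orchestration (hypothetical code, only to illustrate the design point, not Dirigent's implementation): cluster state lives purely in memory, and one monolithic loop both decides and acts, so no update round-trips through a persistent store.

```python
from dataclasses import dataclass

@dataclass
class FunctionState:
    # per-function state kept in memory only; nothing is persisted
    queued_requests: int = 0
    instances: int = 0

class ControlPlane:
    """Monolithic control loop over in-memory cluster state (toy sketch)."""

    def __init__(self):
        self.functions = {}

    def register(self, name):
        self.functions[name] = FunctionState()

    def enqueue(self, name, n=1):
        self.functions[name].queued_requests += n

    def reconcile(self):
        # Scaling decisions mutate in-memory state directly; a
        # Kubernetes-style design would persist each such change
        # (e.g., to etcd) before acting on it.
        for f in self.functions.values():
            f.instances = max(f.instances, f.queued_requests)
```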