GITBOOK-184: Update the paper lists: Systems for LLMs, Model Serving, ICML '24
mental2008 authored and gitbook-bot committed Jul 22, 2024
1 parent b4f2f55 commit ca7d867
Showing 3 changed files with 35 additions and 12 deletions.
18 changes: 15 additions & 3 deletions paper-list/systems-for-ml/llm.md
@@ -6,16 +6,23 @@ I am actively maintaining this list.

## LLM Training

### Hybrid parallelism

* Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism ([ATC 2024](../../reading-notes/conference/atc-2024.md)) \[[Paper](https://www.usenix.org/conference/atc24/presentation/yuan)] \[[Code](https://github.com/kwai/Megatron-Kwai/tree/atc24ae/examples/atc24)]
* Kuaishou
* Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning ([OSDI 2022](../../reading-notes/conference/osdi-2022/)) \[[Paper](https://www.usenix.org/conference/osdi22/presentation/zheng-lianmin)] \[[Code](https://github.com/alpa-projects/alpa)] \[[Docs](https://alpa.ai/)]
* UC Berkeley & AWS & Google & SJTU & CMU & Duke
* Generalize the search over _parallelism strategies_ (a toy sketch of such a two-level search follows this subsection).
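
The two-level idea — choose an inter-operator (pipeline) partition, then the best intra-operator (tensor-parallel) plan per stage — can be sketched in a few lines. The cost model, GPU counts, and TP choices below are invented for illustration; Alpa itself solves the intra-operator problem with an ILP and the inter-operator problem with dynamic programming over profiled costs, none of which is modeled here.

```python
# Toy two-level search over parallelism strategies, loosely in the spirit of
# Alpa's inter-/intra-operator decomposition. All costs are made up.
from itertools import combinations

NUM_LAYERS = 8
NUM_GPUS = 8
TP_CHOICES = (1, 2, 4)  # intra-operator (tensor-parallel) degrees to consider


def stage_cost(num_layers: int, tp: int) -> float:
    """Illustrative per-stage latency: compute shrinks with TP, comms grow."""
    compute = num_layers / tp
    comm = 0.3 * (tp - 1)  # all-reduce overhead grows with TP degree
    return compute + comm


def best_intra_op(num_layers: int, gpus_per_stage: int) -> float:
    """Pick the cheapest tensor-parallel degree that fits the stage's GPUs."""
    return min(stage_cost(num_layers, tp)
               for tp in TP_CHOICES if tp <= gpus_per_stage)


def search(num_stages: int) -> tuple[float, tuple[int, ...]]:
    """Enumerate inter-operator partitions; score each by its bottleneck stage."""
    gpus_per_stage = NUM_GPUS // num_stages
    best = (float("inf"), ())
    # Choose cut points that split NUM_LAYERS layers into num_stages stages.
    for cuts in combinations(range(1, NUM_LAYERS), num_stages - 1):
        bounds = (0, *cuts, NUM_LAYERS)
        sizes = tuple(b - a for a, b in zip(bounds, bounds[1:]))
        # Pipeline throughput is limited by the slowest stage.
        bottleneck = max(best_intra_op(s, gpus_per_stage) for s in sizes)
        best = min(best, (bottleneck, sizes))
    return best


if __name__ == "__main__":
    for s in (1, 2, 4):
        cost, sizes = search(s)
        print(f"{s} stage(s): layer split {sizes}, bottleneck cost {cost:.2f}")
```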

### Fault tolerance

* Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates ([SOSP 2023](../../reading-notes/conference/sosp-2023/)) \[[Paper](https://dl.acm.org/doi/abs/10.1145/3600006.3613152)] \[[arXiv](https://browse.arxiv.org/abs/2309.08125)] \[[Code](https://github.com/SymbioticLab/Oobleck)]
* UMich SymbioticLab & AWS & PKU
* Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints ([SOSP 2023](../../reading-notes/conference/sosp-2023/)) \[[Paper](https://dl.acm.org/doi/10.1145/3600006.3613145)]
* Rice & AWS
* Checkpoint training state to host (CPU) memory for fast failure recovery (a minimal sketch follows this list).
* Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs ([NSDI 2023](../../reading-notes/conference/nsdi-2023/)) \[[Paper](https://www.usenix.org/conference/nsdi23/presentation/thorpe)] \[[Code](https://github.com/uclasystem/bamboo)]
* UCLA & CMU & MSR & Princeton
* Resilient distributed training
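
A minimal sketch of the in-memory checkpointing idea behind Gemini, assuming PyTorch: snapshot training state into host RAM so recovery avoids slow remote storage. Gemini's actual system also replicates snapshots to peer machines' CPU memory and schedules checkpoint traffic around training traffic; none of that is modeled here, and the function names are hypothetical.

```python
import copy
import torch


def _to_cpu(obj):
    """Recursively copy any tensors in a (nested) state dict to CPU memory."""
    if torch.is_tensor(obj):
        return obj.detach().to("cpu", copy=True)
    if isinstance(obj, dict):
        return {k: _to_cpu(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(_to_cpu(v) for v in obj)
    return copy.deepcopy(obj)


def snapshot_to_host(model, optimizer, step):
    """Snapshot training state into host RAM; far cheaper than remote storage."""
    return {
        "step": step,
        "model": _to_cpu(model.state_dict()),
        "optimizer": _to_cpu(optimizer.state_dict()),
    }


def restore_from_host(model, optimizer, ckpt):
    """Recovery path: reload from the in-memory snapshot after a failure."""
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```

In a training loop one would call `snapshot_to_host` every few iterations and fall back to `restore_from_host` after a failure; `optimizer.load_state_dict` moves the restored state back onto the parameters' devices.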

## LLM Inference

@@ -131,6 +138,11 @@ I am actively maintaining this list.
* Fairness in Serving Large Language Models ([OSDI 2024](../../reading-notes/conference/osdi-2024.md)) \[[Paper](https://www.usenix.org/conference/osdi24/presentation/sheng)] \[[Code](https://github.com/Ying1123/VTC-artifact)]
* UC Berkeley

## LLM Alignment

* PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch ([ATC 2024](../../reading-notes/conference/atc-2024.md)) \[[Paper](https://www.usenix.org/conference/atc24/presentation/lei)]
* THU

## LLMs

* Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv 2307.09288) \[[Paper](https://arxiv.org/abs/2307.09288)] \[[Homepage](https://ai.meta.com/llama/)]
2 changes: 2 additions & 0 deletions paper-list/systems-for-ml/model-serving.md
@@ -10,6 +10,8 @@ I am actively maintaining this list.

## Model Serving Systems

* Usher: Holistic Interference Avoidance for Resource Optimized ML Inference ([OSDI 2024](../../reading-notes/conference/osdi-2024.md)) \[[Paper](https://www.usenix.org/conference/osdi24/presentation/shubha)] \[[Code](https://github.com/ss7krd/Usher)]
* UVA & GaTech
* Paella: Low-latency Model Serving with Software-defined GPU Scheduling ([SOSP 2023](../../reading-notes/conference/sosp-2023/)) \[[Paper](https://dl.acm.org/doi/10.1145/3600006.3613163)]
* UPenn & DBOS, Inc.
* Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences ([OSDI 2022](../../reading-notes/conference/osdi-2022/)) \[[Personal Notes](../../reading-notes/conference/osdi-2022/reef.md)] \[[Paper](https://www.usenix.org/conference/osdi22/presentation/han)] \[[Code](https://github.com/SJTU-IPADS/reef)] \[[Benchmark](https://github.com/SJTU-IPADS/disb)] \[[Artifact](https://github.com/SJTU-IPADS/reef-artifacts/tree/osdi22-ae)]
27 changes: 18 additions & 9 deletions reading-notes/conference/icml-2024.md
@@ -6,15 +6,24 @@ Homepage: [https://icml.cc/Conferences/2024](https://icml.cc/Conferences/2024)

### Papers

### Multimodality

### Large Language Models (LLMs)

* Serving LLMs
* HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment \[[Personal Notes](../miscellaneous/arxiv/2023/hexgen.md)] \[[arXiv](https://arxiv.org/abs/2311.11514)] \[[Code](https://github.com/Relaxed-System-Lab/HexGen)]
* HKUST & ETH & CMU
* Support _asymmetric_ tensor model parallelism and pipeline parallelism under the _heterogeneous_ setting (i.e., each pipeline parallel stage can be assigned a different number of layers and a different tensor model parallel degree).
* Propose _a heuristic-based evolutionary algorithm_ to search for the optimal layout (a toy sketch follows this list).
* MuxServe: Flexible Spatial-Temporal Multiplexing for LLM Serving \[[arXiv](https://arxiv.org/abs/2404.02015)] \[[Code](https://github.com/hao-ai-lab/MuxServe)]
* CUHK & Shanghai AI Lab & HUST & SJTU & PKU & UC Berkeley & UCSD
* Colocate LLMs considering their popularity to multiplex memory resources.
* APIServe: Efficient API Support for Large-Language Model Inferencing \[[arXiv](https://arxiv.org/abs/2402.01869)]
* UCSD
* Benchmark
* Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference \[[arXiv](https://arxiv.org/abs/2403.04132)] \[[Demo](https://chat.lmsys.org)]
* UC Berkeley
* Speculative decoding
* Online Speculative Decoding \[[arXiv](https://arxiv.org/abs/2310.07177)]
* UC Berkeley & UCSD & Sisu Data & SJTU
* Video generation
* VideoPoet: A Large Language Model for Zero-Shot Video Generation \[[Paper](https://proceedings.mlr.press/v235/kondratyuk24a.html)] \[[Homepage](https://sites.research.google/videopoet/)]
* Google & CMU
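
To make the HexGen-style search space concrete, here is a toy hill-climbing variant of an evolutionary layout search: each pipeline stage gets its own layer count and tensor-parallel degree, and candidate layouts are scored by their bottleneck stage. The GPU group sizes, relative speeds, cost model, and mutation scheme are all invented for illustration; this is not HexGen's actual algorithm.

```python
# Toy search over asymmetric pipeline layouts under heterogeneous GPUs.
import random

NUM_LAYERS = 32
STAGE_GPUS = [8, 4, 2]          # heterogeneous groups: GPUs available per stage
GPU_SPEED = [1.0, 0.6, 0.4]     # relative throughput of each group's GPU type


def cost(layout: list[tuple[int, int]]) -> float:
    """Bottleneck-stage latency under a made-up compute/communication model."""
    worst = 0.0
    for (layers, tp), gpus, speed in zip(layout, STAGE_GPUS, GPU_SPEED):
        if tp > gpus:
            return float("inf")  # infeasible: TP degree exceeds stage's GPUs
        t = layers / (speed * tp) + 0.2 * (tp - 1)  # compute + all-reduce
        worst = max(worst, t)
    return worst


def mutate(layout):
    """Randomly move a layer between stages or tweak one stage's TP degree."""
    layout = [list(s) for s in layout]
    if random.random() < 0.5 and len(layout) > 1:
        src, dst = random.sample(range(len(layout)), 2)
        if layout[src][0] > 1:
            layout[src][0] -= 1
            layout[dst][0] += 1
    else:
        i = random.randrange(len(layout))
        layout[i][1] = random.choice([1, 2, 4, 8])
    return [tuple(s) for s in layout]


def evolve(generations: int = 2000) -> list[tuple[int, int]]:
    """Start from an even split; keep any mutation that lowers the cost."""
    even = NUM_LAYERS // len(STAGE_GPUS)
    best = [(even, 1) for _ in STAGE_GPUS]
    best[-1] = (NUM_LAYERS - even * (len(STAGE_GPUS) - 1), 1)
    for _ in range(generations):
        cand = mutate(best)
        if cost(cand) < cost(best):
            best = cand
    return best


if __name__ == "__main__":
    layout = evolve()
    print("layout (layers, tp) per stage:", layout, "cost:", round(cost(layout), 2))
```

The search converges toward giving faster GPU groups more layers and higher TP degrees, which is the intuition behind searching asymmetric layouts in the first place.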