GITBOOK-184: Update the paper lists: Systems for LLMs, Model Serving, ICML '24
mental2008 authored and gitbook-bot committed Jul 22, 2024
1 parent b4f2f55 commit ca7d867
Showing 3 changed files with 35 additions and 12 deletions.
18 changes: 15 additions & 3 deletions paper-list/systems-for-ml/llm.md
@@ -6,16 +6,23 @@ I am actively maintaining this list.

## LLM Training

### Hybrid parallelism

* Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism ([ATC 2024](../../reading-notes/conference/atc-2024.md)) \[[Paper](https://www.usenix.org/conference/atc24/presentation/yuan)] \[[Code](https://github.com/kwai/Megatron-Kwai/tree/atc24ae/examples/atc24)]
* Kuaishou
* Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning ([OSDI 2022](../../reading-notes/conference/osdi-2022/)) \[[Paper](https://www.usenix.org/conference/osdi22/presentation/zheng-lianmin)] \[[Code](https://github.com/alpa-projects/alpa)] \[[Docs](https://alpa.ai/)]
* UC Berkeley & AWS & Google & SJTU & CMU & Duke
* Generalize the search over _parallelism strategies_ (a toy sketch of such a two-level search follows this subsection).
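
The two-level idea — choose an inter-operator (pipeline) partition, then the best intra-operator (tensor-parallel) plan per stage — can be sketched in a few lines. The cost model, GPU counts, and TP choices below are invented for illustration; Alpa itself solves the intra-operator problem with an ILP and the inter-operator problem with dynamic programming over profiled costs, none of which is modeled here.

```python
# Toy two-level search over parallelism strategies, loosely in the spirit of
# Alpa's inter-/intra-operator decomposition. All costs are made up.
from itertools import combinations

NUM_LAYERS = 8
NUM_GPUS = 8
TP_CHOICES = (1, 2, 4)  # intra-operator (tensor-parallel) degrees to consider


def stage_cost(num_layers: int, tp: int) -> float:
    """Illustrative per-stage latency: compute shrinks with TP, comms grow."""
    compute = num_layers / tp
    comm = 0.3 * (tp - 1)  # all-reduce overhead grows with TP degree
    return compute + comm


def best_intra_op(num_layers: int, gpus_per_stage: int) -> float:
    """Pick the cheapest tensor-parallel degree that fits the stage's GPUs."""
    return min(stage_cost(num_layers, tp)
               for tp in TP_CHOICES if tp <= gpus_per_stage)


def search(num_stages: int) -> tuple[float, tuple[int, ...]]:
    """Enumerate inter-operator partitions; score each by its bottleneck stage."""
    gpus_per_stage = NUM_GPUS // num_stages
    best = (float("inf"), ())
    # Choose cut points that split NUM_LAYERS layers into num_stages stages.
    for cuts in combinations(range(1, NUM_LAYERS), num_stages - 1):
        bounds = (0, *cuts, NUM_LAYERS)
        sizes = tuple(b - a for a, b in zip(bounds, bounds[1:]))
        # Pipeline throughput is limited by the slowest stage.
        bottleneck = max(best_intra_op(s, gpus_per_stage) for s in sizes)
        best = min(best, (bottleneck, sizes))
    return best


if __name__ == "__main__":
    for s in (1, 2, 4):
        cost, sizes = search(s)
        print(f"{s} stage(s): layer split {sizes}, bottleneck cost {cost:.2f}")
```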

### Fault tolerance

* Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates ([SOSP 2023](../../reading-notes/conference/sosp-2023/)) \[[Paper](https://dl.acm.org/doi/abs/10.1145/3600006.3613152)] \[[arXiv](https://browse.arxiv.org/abs/2309.08125)] \[[Code](https://github.com/SymbioticLab/Oobleck)]
* UMich SymbioticLab & AWS & PKU
* Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints ([SOSP 2023](../../reading-notes/conference/sosp-2023/)) \[[Paper](https://dl.acm.org/doi/10.1145/3600006.3613145)]
* Rice & AWS
* Checkpoint training state to host (CPU) memory for fast failure recovery (a minimal sketch follows this list).
* Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs ([NSDI 2023](../../reading-notes/conference/nsdi-2023/)) \[[Paper](https://www.usenix.org/conference/nsdi23/presentation/thorpe)] \[[Code](https://github.com/uclasystem/bamboo)]
* UCLA & CMU & MSR & Princeton
* Resilient distributed training
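
A minimal sketch of the in-memory checkpointing idea behind Gemini, assuming PyTorch: snapshot training state into host RAM so recovery avoids slow remote storage. Gemini's actual system also replicates snapshots to peer machines' CPU memory and schedules checkpoint traffic around training traffic; none of that is modeled here, and the function names are hypothetical.

```python
import copy
import torch


def _to_cpu(obj):
    """Recursively copy any tensors in a (nested) state dict to CPU memory."""
    if torch.is_tensor(obj):
        return obj.detach().to("cpu", copy=True)
    if isinstance(obj, dict):
        return {k: _to_cpu(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(_to_cpu(v) for v in obj)
    return copy.deepcopy(obj)


def snapshot_to_host(model, optimizer, step):
    """Snapshot training state into host RAM; far cheaper than remote storage."""
    return {
        "step": step,
        "model": _to_cpu(model.state_dict()),
        "optimizer": _to_cpu(optimizer.state_dict()),
    }


def restore_from_host(model, optimizer, ckpt):
    """Recovery path: reload from the in-memory snapshot after a failure."""
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```

In a training loop one would call `snapshot_to_host` every few iterations and fall back to `restore_from_host` after a failure; `optimizer.load_state_dict` moves the restored state back onto the parameters' devices.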

## LLM Inference

@@ -131,6 +138,11 @@ I am actively maintaining this list.
* Fairness in Serving Large Language Models ([OSDI 2024](../../reading-notes/conference/osdi-2024.md)) \[[Paper](https://www.usenix.org/conference/osdi24/presentation/sheng)] \[[Code](https://github.com/Ying1123/VTC-artifact)]
* UC Berkeley

## LLM Alignment

* PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch ([ATC 2024](../../reading-notes/conference/atc-2024.md)) \[[Paper](https://www.usenix.org/conference/atc24/presentation/lei)]
* THU

## LLMs

* Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv 2307.09288) \[[Paper](https://arxiv.org/abs/2307.09288)] \[[Homepage](https://ai.meta.com/llama/)]
2 changes: 2 additions & 0 deletions paper-list/systems-for-ml/model-serving.md
@@ -10,6 +10,8 @@ I am actively maintaining this list.

## Model Serving Systems

* Usher: Holistic Interference Avoidance for Resource Optimized ML Inference ([OSDI 2024](../../reading-notes/conference/osdi-2024.md)) \[[Paper](https://www.usenix.org/conference/osdi24/presentation/shubha)] \[[Code](https://github.com/ss7krd/Usher)]
* UVA & GaTech
* Paella: Low-latency Model Serving with Software-defined GPU Scheduling ([SOSP 2023](../../reading-notes/conference/sosp-2023/)) \[[Paper](https://dl.acm.org/doi/10.1145/3600006.3613163)]
* UPenn & DBOS, Inc.
* Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences ([OSDI 2022](../../reading-notes/conference/osdi-2022/)) \[[Personal Notes](../../reading-notes/conference/osdi-2022/reef.md)] \[[Paper](https://www.usenix.org/conference/osdi22/presentation/han)] \[[Code](https://github.com/SJTU-IPADS/reef)] \[[Benchmark](https://github.com/SJTU-IPADS/disb)] \[[Artifact](https://github.com/SJTU-IPADS/reef-artifacts/tree/osdi22-ae)]
27 changes: 18 additions & 9 deletions reading-notes/conference/icml-2024.md
@@ -6,15 +6,24 @@ Homepage: [https://icml.cc/Conferences/2024](https://icml.cc/Conferences/2024)

### Papers

### Multimodality

### Large Language Models (LLMs)

* Serving LLMs
* HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment \[[Personal Notes](../miscellaneous/arxiv/2023/hexgen.md)] \[[arXiv](https://arxiv.org/abs/2311.11514)] \[[Code](https://github.com/Relaxed-System-Lab/HexGen)]
* HKUST & ETH & CMU
* Support _asymmetric_ tensor model parallelism and pipeline parallelism under the _heterogeneous_ setting (i.e., each pipeline parallel stage can be assigned a different number of layers and a different tensor model parallel degree).
* Propose _a heuristic-based evolutionary algorithm_ to search for the optimal layout (a toy sketch follows this list).
* MuxServe: Flexible Spatial-Temporal Multiplexing for LLM Serving \[[arXiv](https://arxiv.org/abs/2404.02015)] \[[Code](https://github.com/hao-ai-lab/MuxServe)]
* CUHK & Shanghai AI Lab & HUST & SJTU & PKU & UC Berkeley & UCSD
* Colocate LLMs considering their popularity to multiplex memory resources.
* APIServe: Efficient API Support for Large-Language Model Inferencing \[[arXiv](https://arxiv.org/abs/2402.01869)]
* UCSD
* Benchmark
* Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference \[[arXiv](https://arxiv.org/abs/2403.04132)] \[[Demo](https://chat.lmsys.org)]
* UC Berkeley
* Speculative decoding
* Online Speculative Decoding \[[arXiv](https://arxiv.org/abs/2310.07177)]
* UC Berkeley & UCSD & Sisu Data & SJTU
* Video generation
* VideoPoet: A Large Language Model for Zero-Shot Video Generation \[[Paper](https://proceedings.mlr.press/v235/kondratyuk24a.html)] \[[Homepage](https://sites.research.google/videopoet/)]
* Google & CMU
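
To make the HexGen-style search space concrete, here is a toy hill-climbing variant of an evolutionary layout search: each pipeline stage gets its own layer count and tensor-parallel degree, and candidate layouts are scored by their bottleneck stage. The GPU group sizes, relative speeds, cost model, and mutation scheme are all invented for illustration; this is not HexGen's actual algorithm.

```python
# Toy search over asymmetric pipeline layouts under heterogeneous GPUs.
import random

NUM_LAYERS = 32
STAGE_GPUS = [8, 4, 2]          # heterogeneous groups: GPUs available per stage
GPU_SPEED = [1.0, 0.6, 0.4]     # relative throughput of each group's GPU type


def cost(layout: list[tuple[int, int]]) -> float:
    """Bottleneck-stage latency under a made-up compute/communication model."""
    worst = 0.0
    for (layers, tp), gpus, speed in zip(layout, STAGE_GPUS, GPU_SPEED):
        if tp > gpus:
            return float("inf")  # infeasible: TP degree exceeds stage's GPUs
        t = layers / (speed * tp) + 0.2 * (tp - 1)  # compute + all-reduce
        worst = max(worst, t)
    return worst


def mutate(layout):
    """Randomly move a layer between stages or tweak one stage's TP degree."""
    layout = [list(s) for s in layout]
    if random.random() < 0.5 and len(layout) > 1:
        src, dst = random.sample(range(len(layout)), 2)
        if layout[src][0] > 1:
            layout[src][0] -= 1
            layout[dst][0] += 1
    else:
        i = random.randrange(len(layout))
        layout[i][1] = random.choice([1, 2, 4, 8])
    return [tuple(s) for s in layout]


def evolve(generations: int = 2000) -> list[tuple[int, int]]:
    """Start from an even split; keep any mutation that lowers the cost."""
    even = NUM_LAYERS // len(STAGE_GPUS)
    best = [(even, 1) for _ in STAGE_GPUS]
    best[-1] = (NUM_LAYERS - even * (len(STAGE_GPUS) - 1), 1)
    for _ in range(generations):
        cand = mutate(best)
        if cost(cand) < cost(best):
            best = cand
    return best


if __name__ == "__main__":
    layout = evolve()
    print("layout (layers, tp) per stage:", layout, "cost:", round(cost(layout), 2))
```

The search converges toward giving faster GPU groups more layers and higher TP degrees, which is the intuition behind searching asymmetric layouts in the first place.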