GITBOOK-188: Update the reading notes of SIGCOMM '24
mental2008 authored and gitbook-bot committed Aug 9, 2024
1 parent fdcbcb8 commit 39d5a2b
Showing 4 changed files with 42 additions and 16 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -18,6 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E

## Changelogs

* 08/2024: Update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md).
* 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [Systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [Systems for LLMs](paper-list/systems-for-ml/llm.md), [Systems for DLRMs](paper-list/systems-for-ml/dlrm.md), [Resource Scheduler](paper-list/systems-for-ml/resource-scheduler.md).

## Epilogue
2 changes: 1 addition & 1 deletion paper-list/systems-for-ml/llm.md
@@ -26,7 +26,7 @@ I am actively maintaining this list.

## LLM Inference

* CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving ([SIGCOMM 2024](../../reading-notes/conference/sigcomm-2024.md)) \[[arXiv](https://arxiv.org/abs/2310.07240)] \[[Code](https://github.com/UChi-JCL/CacheGen)]
* CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving ([SIGCOMM 2024](../../reading-notes/conference/sigcomm-2024.md)) \[[arXiv](https://arxiv.org/abs/2310.07240)] \[[Code](https://github.com/UChi-JCL/CacheGen)] \[[Video](https://www.youtube.com/watch?v=H4\_OUWvdiNo)]
* UChicago & Microsoft & Stanford
* Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve ([OSDI 2024](../../reading-notes/conference/osdi-2024.md)) \[[Paper](https://www.usenix.org/conference/osdi24/presentation/agrawal)] \[[Code](https://github.com/microsoft/sarathi-serve)] \[[arXiv](https://arxiv.org/abs/2403.02310)]
* MSR India & GaTech
4 changes: 2 additions & 2 deletions reading-notes/conference/README.md
@@ -7,8 +7,8 @@
| SoCC 2024 | Nov 22-24, 2024 | Seattle, Washington, USA | **Upcoming** |
| SC 2024 | Nov 17-22, 2024 | Atlanta, GA, USA | **Upcoming** |
| SOSP 2024 | Nov 4-6, 2024 | Hilton Austin, Texas, USA | **Upcoming** |
| [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | **Upcoming** |
| [ICML 2024](icml-2024.md) | Jul 21-27, 2024 | Messe Wien Exhibition Congress Center, Vienna, Austria | 👀**Ongoing!** |
| [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | 🧐 |
| [ICML 2024](icml-2024.md) | Jul 21-27, 2024 | Messe Wien Exhibition Congress Center, Vienna, Austria | |
| [ATC 2024](atc-2024.md) | Jul 10-12, 2024 | Santa Clara, CA, USA | 🧐; co-located with [OSDI 2024](osdi-2024.md) |
| [OSDI 2024](osdi-2024.md) | Jul 10-12, 2024 | Santa Clara, CA, USA | 🧐; co-located with [ATC 2024](atc-2024.md) |
| [ISCA 2024](isca-2024.md) | Jun 29-Jul 3, 2024 | Buenos Aires, Argentina | 🧐 |
51 changes: 38 additions & 13 deletions reading-notes/conference/sigcomm-2024.md
@@ -4,43 +4,68 @@

Homepage: [https://conferences.sigcomm.org/sigcomm/2024/](https://conferences.sigcomm.org/sigcomm/2024/)

Paper list: [https://conferences.sigcomm.org/sigcomm/2024/program/](https://conferences.sigcomm.org/sigcomm/2024/program/)
### Paper list

* [https://conferences.sigcomm.org/sigcomm/2024/program/](https://conferences.sigcomm.org/sigcomm/2024/program/)
* [https://dl.acm.org/doi/proceedings/10.1145/3651890](https://dl.acm.org/doi/proceedings/10.1145/3651890)

## Papers

### Large Language Models (LLMs)

* Systems/Networking for LLM
* CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving \[[arXiv](https://arxiv.org/abs/2310.07240)] \[[Code](https://github.com/UChi-JCL/CacheGen)]
* CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672274)] \[[arXiv](https://arxiv.org/abs/2310.07240)] \[[Code](https://github.com/UChi-JCL/CacheGen)] \[[Video](https://www.youtube.com/watch?v=H4\_OUWvdiNo)]
* UChicago & Microsoft & Stanford
* **CacheGen**: A context-loading module for LLM systems.
* Use a custom tensor encoder to encode a KV cache into more compact bitstream representations with negligible decoding overhead.
* Adapt the compression level of different parts of a KV cache to cope with changes in available bandwidth.
* Focus on reducing the network delay in fetching the KV cache. → TTFT reduction.
* Alibaba HPN: A Data Center Network for Large Language Model Training
* Objective: Reduce the network delay in fetching the KV cache → lower time-to-first-token (TTFT). (A minimal sketch of the bandwidth-adaptive idea follows this list.)
* Alibaba HPN: A Data Center Network for Large Language Model Training \[[Paper](https://doi.org/10.1145/3651890.3672265)] \[[Video](https://www.youtube.com/watch?v=s-3VLs9sd10)]
* Alibaba Cloud
* Experience Track
* LLM training's characteristics
* Produce a small number of periodic, bursty flows (e.g., 400 Gbps) on each host.
* Require GPUs to complete iterations in synchronization, making training more sensitive to single points of failure.
* Alibaba High-Performance Network (**HPN**): Introduce a 2-tier, dual-plane architecture capable of interconnecting 15K GPUs within one Pod.
* Benefits: eliminate hash polarization; simplify optimal path selection.
* RDMA over Ethernet for Distributed Training at Meta Scale \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672233)] \[[Blog](https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/)]
* Meta
* Experience Track
* Deploy a combination of centralized traffic engineering and an Enhanced ECMP (Equal-Cost Multi-Path) scheme to achieve optimal load distribution for training workloads.
* Design receiver-driven traffic admission via the collective library -> Co-tune both the collective library configuration and the underlying network configuration.
* LLMs for Networking
* NetLLM: Adapting Large Language Models for Networking
* NetLLM: Adapting Large Language Models for Networking \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672268)]
* CUHK-Shenzhen & Tsinghua SIGS & UChicago
* **NetLLM**: Empower the LLM to process multimodal data in networking and generate task-specific answers.
* Study three networking-related use cases: viewport prediction, adaptive bitrate streaming, and cluster job scheduling.
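
For intuition, here is a minimal Python sketch of the bandwidth-adaptive compression idea behind CacheGen, not the paper's actual codec: each KV-cache chunk is quantized at the highest-fidelity level whose transfer still fits a per-chunk latency budget. The quantization levels, chunk size, bandwidth estimate, and tensor shape are all illustrative assumptions.

```python
import numpy as np

# Illustrative quantization levels: (bits per value, rough compression ratio vs. FP16).
# The numbers are assumptions for this sketch, not CacheGen's actual codec settings.
LEVELS = [(8, 2.0), (4, 4.0), (2, 8.0)]

def quantize(chunk: np.ndarray, bits: int) -> bytes:
    """Uniformly quantize one KV-cache chunk to `bits` bits per value.
    Values are kept one per byte for simplicity (not bit-packed); a real
    encoder would further entropy-code the result into a compact bitstream."""
    lo, hi = float(chunk.min()), float(chunk.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # avoid div-by-zero on constant chunks
    return np.round((chunk - lo) / scale).astype(np.uint8).tobytes()

def pick_level(chunk_bytes_fp16: int, bandwidth_bps: float, budget_s: float) -> int:
    """Pick the highest-fidelity level whose estimated transfer time fits the budget."""
    for bits, ratio in LEVELS:
        if (chunk_bytes_fp16 / ratio) * 8 / bandwidth_bps <= budget_s:
            return bits
    return LEVELS[-1][0]  # fall back to the most aggressive compression

# Toy example: stream a 4096-token KV cache in 512-token chunks over a ~1 Gbps link.
kv_cache = np.random.randn(4096, 2 * 32 * 128).astype(np.float16)
for chunk in np.split(kv_cache, 8):
    bits = pick_level(chunk.nbytes, bandwidth_bps=1e9, budget_s=0.05)
    payload = quantize(chunk.astype(np.float32), bits)  # ship `payload` to the serving node
```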

### Distributed Training

* Crux: GPU-Efficient Communication Scheduling for Deep Learning Training \[[Dataset](https://github.com/alibaba/alibaba-lingjun-dataset-2023)]
* Crux: GPU-Efficient Communication Scheduling for Deep Learning Training \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672239)] \[[Dataset](https://github.com/alibaba/alibaba-lingjun-dataset-2023)]
* Alibaba Cloud
* RDMA over Ethernet for Distributed Training at Meta Scale
* Meta
* Experience Track
* Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs
* Observation: Communication contention among different deep learning training (DLT) jobs severely degrades overall GPU computation utilization -> Low efficiency of the training cluster.
* **Crux**: A communication scheduler
* Objective: Mitigate the communication contention among DLT jobs -> Maximize GPU computation utilization.
* Designs: reduce the GPU utilization problem to a flow optimization problem; GPU intensity-aware communication scheduling; prioritize the DLT flows with high GPU computation intensity (a toy prioritization sketch follows this list).
* Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672228)]
* KAIST & UC Irvine & VMware Research
* Cache-aware gradient compression; a CPU-based sparse optimizer.
* Adapt training configurations to fluctuating network bandwidth -> Enable co-training across on-premises and cloud clusters (a small compression sketch follows this list).
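
To make the Crux prioritization idea concrete, below is a toy Python sketch that ranks DLT jobs by a rough "GPU computation intensity" score and maps their flows to a few network priority classes. The job names, the intensity formula, and the class mapping are illustrative assumptions, not Crux's actual formulation.

```python
from dataclasses import dataclass

@dataclass
class DLTJob:
    name: str
    gpu_time_per_iter_s: float   # GPU computation time per training iteration
    comm_time_per_iter_s: float  # communication time per iteration
    num_gpus: int

def gpu_intensity(job: DLTJob) -> float:
    """Rough 'GPU computation intensity' score: GPU-seconds at stake per second
    of communication, so jobs whose many GPUs stall on slow traffic rank higher."""
    return job.num_gpus * job.gpu_time_per_iter_s / job.comm_time_per_iter_s

def assign_priorities(jobs, num_classes: int = 4):
    """Map jobs to network priority classes (0 = highest) by descending intensity."""
    ranked = sorted(jobs, key=gpu_intensity, reverse=True)
    return {job.name: min(i * num_classes // len(ranked), num_classes - 1)
            for i, job in enumerate(ranked)}

jobs = [
    DLTJob("llm-pretrain", gpu_time_per_iter_s=2.0, comm_time_per_iter_s=0.5, num_gpus=1024),
    DLTJob("bert-ft",      gpu_time_per_iter_s=0.3, comm_time_per_iter_s=0.3, num_gpus=64),
    DLTJob("resnet-ft",    gpu_time_per_iter_s=0.1, comm_time_per_iter_s=0.2, num_gpus=8),
]
print(assign_priorities(jobs))  # {'llm-pretrain': 0, 'bert-ft': 1, 'resnet-ft': 2}
```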
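
In the same spirit, one way to picture "adapt training configurations to fluctuating bandwidth" from the consumer-grade-GPU paper is to tie a top-k gradient-compression ratio to the currently measured inter-cluster bandwidth. The thresholds and the plain top-k scheme below are illustrative assumptions rather than the paper's cache-aware design.

```python
import numpy as np

def topk_compress(grad: np.ndarray, ratio: float):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad).ravel(), -k)[-k:]
    return idx, grad.ravel()[idx]

def ratio_for_bandwidth(bandwidth_gbps: float) -> float:
    """Illustrative schedule (assumed, not from the paper): compress harder
    when the measured inter-cluster link is slower."""
    if bandwidth_gbps >= 10:
        return 0.10
    if bandwidth_gbps >= 1:
        return 0.01
    return 0.001

grad = np.random.randn(10_000_000).astype(np.float32)
idx, vals = topk_compress(grad, ratio_for_bandwidth(bandwidth_gbps=0.8))  # slow link -> keep 0.1%
```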

### Data Processing

* Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster
* Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672241)]
* Tencent & FDU & NVIDIA & THU
* Experience Track
* Network throughput & scalability: A dynamic block-level flowlet transmission mechanism (a generic flowlet sketch follows this list); a non-blocking communication middleware.
* System reliability: Utilize an external shuffle service as well as TCP serving as a backup.
* Integrated into Apache Spark.
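
For reference, the sketch below illustrates the generic flowlet idea behind "block-level flowlet transmission": a flow is split at idle gaps so consecutive bursts can be load-balanced onto different paths without reordering. The gap threshold and packet timestamps are made-up values; this is textbook flowlet switching, not Turbo's actual block-level mechanism.

```python
def split_into_flowlets(packet_times_s, gap_threshold_s=0.0005):
    """Group a flow's packet timestamps into flowlets: a new flowlet starts
    whenever the inter-packet gap exceeds the threshold, so consecutive
    flowlets can take different network paths without causing reordering."""
    flowlets, current = [], [packet_times_s[0]]
    for prev, cur in zip(packet_times_s, packet_times_s[1:]):
        if cur - prev > gap_threshold_s:
            flowlets.append(current)
            current = []
        current.append(cur)
    flowlets.append(current)
    return flowlets

times = [0.0000, 0.0001, 0.0002, 0.0010, 0.0011, 0.0030]
print([len(f) for f in split_into_flowlets(times)])  # [3, 2, 1]
```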

### Data Transfers

* An exabyte a day: Throughput-oriented, Large-scale, Managed Data Transfers with Effingo
* An exabyte a day: Throughput-oriented, Large-scale, Managed Data Transfers with Effingo \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672262)]
* Google
* Experience Track
* **Effingo**: A copy system, integrated with resource management and authorization systems.
* Per-cluster deployments -> Limit failure domains to individual clusters.
* Separation from the bandwidth management layer (BwE) -> A modular design that reduces dependencies.
