GITBOOK-188: Update the reading notes of SIGCOMM '24
mental2008 authored and gitbook-bot committed Aug 9, 2024
1 parent fdcbcb8 commit 39d5a2b
Showing 4 changed files with 42 additions and 16 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -18,6 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E

## Changelogs

* 08/2024: Update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md).
* 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [Systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [Systems for LLMs](paper-list/systems-for-ml/llm.md), [Systems for DLRMs](paper-list/systems-for-ml/dlrm.md), [Resource Scheduler](paper-list/systems-for-ml/resource-scheduler.md).

## Epilogue
2 changes: 1 addition & 1 deletion paper-list/systems-for-ml/llm.md
@@ -26,7 +26,7 @@ I am actively maintaining this list.

## LLM Inference

* CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving ([SIGCOMM 2024](../../reading-notes/conference/sigcomm-2024.md)) \[[arXiv](https://arxiv.org/abs/2310.07240)] \[[Code](https://github.com/UChi-JCL/CacheGen)]
* CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving ([SIGCOMM 2024](../../reading-notes/conference/sigcomm-2024.md)) \[[arXiv](https://arxiv.org/abs/2310.07240)] \[[Code](https://github.com/UChi-JCL/CacheGen)] \[[Video](https://www.youtube.com/watch?v=H4\_OUWvdiNo)]
* UChicago & Microsoft & Stanford
* Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve ([OSDI 2024](../../reading-notes/conference/osdi-2024.md)) \[[Paper](https://www.usenix.org/conference/osdi24/presentation/agrawal)] \[[Code](https://github.com/microsoft/sarathi-serve)] \[[arXiv](https://arxiv.org/abs/2403.02310)]
* MSR India & GaTech
4 changes: 2 additions & 2 deletions reading-notes/conference/README.md
@@ -7,8 +7,8 @@
| SoCC 2024 | Nov 22-24, 2024 | Seattle, Washington, USA | **Upcoming** |
| SC 2024 | Nov 17-22, 2024 | Atlanta, GA, USA | **Upcoming** |
| SOSP 2024 | Nov 4-6, 2024 | Hilton Austin, Texas, USA | **Upcoming** |
| [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | **Upcoming** |
| [ICML 2024](icml-2024.md) | Jul 21-27, 2024 | Messe Wien Exhibition Congress Center, Vienna, Austria | 👀**Ongoing!** |
| [SIGCOMM 2024](sigcomm-2024.md) | Aug 4-8, 2024 | Sydney, Australia | 🧐 |
| [ICML 2024](icml-2024.md) | Jul 21-27, 2024 | Messe Wien Exhibition Congress Center, Vienna, Austria | |
| [ATC 2024](atc-2024.md) | Jul 10-12, 2024 | Santa Clara, CA, USA | 🧐; co-located with [OSDI 2024](osdi-2024.md) |
| [OSDI 2024](osdi-2024.md) | Jul 10-12, 2024 | Santa Clara, CA, USA | 🧐; co-located with [ATC 2024](atc-2024.md) |
| [ISCA 2024](isca-2024.md) | Jun 29-Jul 3, 2024 | Buenos Aires, Argentina | 🧐 |
51 changes: 38 additions & 13 deletions reading-notes/conference/sigcomm-2024.md
@@ -4,43 +4,68 @@

Homepage: [https://conferences.sigcomm.org/sigcomm/2024/](https://conferences.sigcomm.org/sigcomm/2024/)

Paper list: [https://conferences.sigcomm.org/sigcomm/2024/program/](https://conferences.sigcomm.org/sigcomm/2024/program/)
### Paper list

* [https://conferences.sigcomm.org/sigcomm/2024/program/](https://conferences.sigcomm.org/sigcomm/2024/program/)
* [https://dl.acm.org/doi/proceedings/10.1145/3651890](https://dl.acm.org/doi/proceedings/10.1145/3651890)

## Papers

### Large Language Models (LLMs)

* Systems/Networking for LLM
* CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving \[[arXiv](https://arxiv.org/abs/2310.07240)] \[[Code](https://github.com/UChi-JCL/CacheGen)]
* CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672274)] \[[arXiv](https://arxiv.org/abs/2310.07240)] \[[Code](https://github.com/UChi-JCL/CacheGen)] \[[Video](https://www.youtube.com/watch?v=H4\_OUWvdiNo)]
* UChicago & Microsoft & Stanford
* **CacheGen**: A context-loading module for LLM systems.
* Use a custom tensor encoder to encode a KV cache into more compact bitstream representations with negligible decoding overhead.
* Adapt the compression level of different parts of a KV cache to cope with changes in available bandwidth.
* Focus on reducing the network delay in fetching the KV cache. → TTFT reduction.
* Alibaba HPN: A Data Center Network for Large Language Model Training
* Objective: Reduce the network delay in fetching the KV cache → lower time-to-first-token (TTFT). (A minimal sketch of the bandwidth-adaptive idea follows this list.)
* Alibaba HPN: A Data Center Network for Large Language Model Training \[[Paper](https://doi.org/10.1145/3651890.3672265)] \[[Video](https://www.youtube.com/watch?v=s-3VLs9sd10)]
* Alibaba Cloud
* Experience Track
* LLM training's characteristics
* Produce a small number of periodic, bursty flows (e.g., 400 Gbps) on each host.
* Require GPUs to complete iterations in synchronization, making training more sensitive to single points of failure.
* Alibaba High-Performance Network (**HPN**): Introduce a 2-tier, dual-plane architecture capable of interconnecting 15K GPUs within one Pod.
* Benefits: eliminate hash polarization; simplify optimal path selection.
* RDMA over Ethernet for Distributed Training at Meta Scale \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672233)] \[[Blog](https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/)]
* Meta
* Experience Track
* Deploy a combination of centralized traffic engineering and an Enhanced ECMP (Equal-Cost Multi-Path) scheme to achieve optimal load distribution for training workloads.
* Design receiver-driven traffic admission via the collective library -> Co-tune both the collective library configuration and the underlying network configuration.
* LLMs for Networking
* NetLLM: Adapting Large Language Models for Networking
* NetLLM: Adapting Large Language Models for Networking \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672268)]
* CUHK-Shenzhen & Tsinghua SIGS & UChicago
* **NetLLM**: Empower the LLM to process multimodal data in networking and generate task-specific answers.
* Study three networking-related use cases: viewport prediction, adaptive bitrate streaming, and cluster job scheduling.
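
For intuition, here is a minimal Python sketch of the bandwidth-adaptive compression idea behind CacheGen, not the paper's actual codec: each KV-cache chunk is quantized at the highest-fidelity level whose transfer still fits a per-chunk latency budget. The quantization levels, chunk size, bandwidth estimate, and tensor shape are all illustrative assumptions.

```python
import numpy as np

# Illustrative quantization levels: (bits per value, rough compression ratio vs. FP16).
# The numbers are assumptions for this sketch, not CacheGen's actual codec settings.
LEVELS = [(8, 2.0), (4, 4.0), (2, 8.0)]

def quantize(chunk: np.ndarray, bits: int) -> bytes:
    """Uniformly quantize one KV-cache chunk to `bits` bits per value.
    Values are kept one per byte for simplicity (not bit-packed); a real
    encoder would further entropy-code the result into a compact bitstream."""
    lo, hi = float(chunk.min()), float(chunk.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # avoid div-by-zero on constant chunks
    return np.round((chunk - lo) / scale).astype(np.uint8).tobytes()

def pick_level(chunk_bytes_fp16: int, bandwidth_bps: float, budget_s: float) -> int:
    """Pick the highest-fidelity level whose estimated transfer time fits the budget."""
    for bits, ratio in LEVELS:
        if (chunk_bytes_fp16 / ratio) * 8 / bandwidth_bps <= budget_s:
            return bits
    return LEVELS[-1][0]  # fall back to the most aggressive compression

# Toy example: stream a 4096-token KV cache in 512-token chunks over a ~1 Gbps link.
kv_cache = np.random.randn(4096, 2 * 32 * 128).astype(np.float16)
for chunk in np.split(kv_cache, 8):
    bits = pick_level(chunk.nbytes, bandwidth_bps=1e9, budget_s=0.05)
    payload = quantize(chunk.astype(np.float32), bits)  # ship `payload` to the serving node
```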

### Distributed Training

* Crux: GPU-Efficient Communication Scheduling for Deep Learning Training \[[Dataset](https://github.com/alibaba/alibaba-lingjun-dataset-2023)]
* Crux: GPU-Efficient Communication Scheduling for Deep Learning Training \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672239)] \[[Dataset](https://github.com/alibaba/alibaba-lingjun-dataset-2023)]
* Alibaba Cloud
* RDMA over Ethernet for Distributed Training at Meta Scale
* Meta
* Experience Track
* Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs
* Observation: Communication contention among different deep learning training (DLT) jobs severely degrades overall GPU computation utilization -> Low efficiency of the training cluster.
* **Crux**: A communication scheduler
* Objective: Mitigate the communication contention among DLT jobs -> Maximize GPU computation utilization.
* Designs: reduce the GPU utilization problem to a flow optimization problem; GPU intensity-aware communication scheduling; prioritize the DLT flows with high GPU computation intensity (a toy prioritization sketch follows this list).
* Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672228)]
* KAIST & UC Irvine & VMware Research
* Cache-aware gradient compression; a CPU-based sparse optimizer.
* Adapt training configurations to fluctuating network bandwidth -> Enable co-training across on-premises and cloud clusters (a small compression sketch follows this list).
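
To make the Crux prioritization idea concrete, below is a toy Python sketch that ranks DLT jobs by a rough "GPU computation intensity" score and maps their flows to a few network priority classes. The job names, the intensity formula, and the class mapping are illustrative assumptions, not Crux's actual formulation.

```python
from dataclasses import dataclass

@dataclass
class DLTJob:
    name: str
    gpu_time_per_iter_s: float   # GPU computation time per training iteration
    comm_time_per_iter_s: float  # communication time per iteration
    num_gpus: int

def gpu_intensity(job: DLTJob) -> float:
    """Rough 'GPU computation intensity' score: GPU-seconds at stake per second
    of communication, so jobs whose many GPUs stall on slow traffic rank higher."""
    return job.num_gpus * job.gpu_time_per_iter_s / job.comm_time_per_iter_s

def assign_priorities(jobs, num_classes: int = 4):
    """Map jobs to network priority classes (0 = highest) by descending intensity."""
    ranked = sorted(jobs, key=gpu_intensity, reverse=True)
    return {job.name: min(i * num_classes // len(ranked), num_classes - 1)
            for i, job in enumerate(ranked)}

jobs = [
    DLTJob("llm-pretrain", gpu_time_per_iter_s=2.0, comm_time_per_iter_s=0.5, num_gpus=1024),
    DLTJob("bert-ft",      gpu_time_per_iter_s=0.3, comm_time_per_iter_s=0.3, num_gpus=64),
    DLTJob("resnet-ft",    gpu_time_per_iter_s=0.1, comm_time_per_iter_s=0.2, num_gpus=8),
]
print(assign_priorities(jobs))  # {'llm-pretrain': 0, 'bert-ft': 1, 'resnet-ft': 2}
```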
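
In the same spirit, one way to picture "adapt training configurations to fluctuating bandwidth" from the consumer-grade-GPU paper is to tie a top-k gradient-compression ratio to the currently measured inter-cluster bandwidth. The thresholds and the plain top-k scheme below are illustrative assumptions rather than the paper's cache-aware design.

```python
import numpy as np

def topk_compress(grad: np.ndarray, ratio: float):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad).ravel(), -k)[-k:]
    return idx, grad.ravel()[idx]

def ratio_for_bandwidth(bandwidth_gbps: float) -> float:
    """Illustrative schedule (assumed, not from the paper): compress harder
    when the measured inter-cluster link is slower."""
    if bandwidth_gbps >= 10:
        return 0.10
    if bandwidth_gbps >= 1:
        return 0.01
    return 0.001

grad = np.random.randn(10_000_000).astype(np.float32)
idx, vals = topk_compress(grad, ratio_for_bandwidth(bandwidth_gbps=0.8))  # slow link -> keep 0.1%
```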

### Data Processing

* Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster
* Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672241)]
* Tencent & FDU & NVIDIA & THU
* Experience Track
* Network throughput & scalability: A dynamic block-level flowlet transmission mechanism (a generic flowlet sketch follows this list); a non-blocking communication middleware.
* System reliability: Utilize an external shuffle service as well as TCP serving as a backup.
* Integrated into Apache Spark.
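
For reference, the sketch below illustrates the generic flowlet idea behind "block-level flowlet transmission": a flow is split at idle gaps so consecutive bursts can be load-balanced onto different paths without reordering. The gap threshold and packet timestamps are made-up values; this is textbook flowlet switching, not Turbo's actual block-level mechanism.

```python
def split_into_flowlets(packet_times_s, gap_threshold_s=0.0005):
    """Group a flow's packet timestamps into flowlets: a new flowlet starts
    whenever the inter-packet gap exceeds the threshold, so consecutive
    flowlets can take different network paths without causing reordering."""
    flowlets, current = [], [packet_times_s[0]]
    for prev, cur in zip(packet_times_s, packet_times_s[1:]):
        if cur - prev > gap_threshold_s:
            flowlets.append(current)
            current = []
        current.append(cur)
    flowlets.append(current)
    return flowlets

times = [0.0000, 0.0001, 0.0002, 0.0010, 0.0011, 0.0030]
print([len(f) for f in split_into_flowlets(times)])  # [3, 2, 1]
```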

### Data Transfers

* An exabyte a day: Throughput-oriented, Large-scale, Managed Data Transfers with Effingo
* An exabyte a day: Throughput-oriented, Large-scale, Managed Data Transfers with Effingo \[[Paper](https://dl.acm.org/doi/10.1145/3651890.3672262)]
* Google
* Experience Track
* **Effingo**: A copy system, integrated with resource management and authorization systems.
* Per-cluster deployments -> Limit failure domains to individual clusters.
* Separation from the bandwidth management layer (BwE) -> A modular design that reduces dependencies.
