GITBOOK-190: Create new paper lists to organize AI papers (diffusion models, language models, DLRMs)
mental2008 · Aug 17, 2024 · 774a0bd
1 parent bc236ef

Showing 8 changed files with 117 additions and 30 deletions.
diff --git a/README.md b/README.md
@@ -18,7 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E
 
 ## Changelogs
 
-* 08/2024: Update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md).
+* 08/2024: Update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md); create new paper lists of [diffusion models](paper-list/artificial-intelligence/diffusion-models.md), [language models](paper-list/artificial-intelligence/language-models.md), and [deep learning recommendation models](paper-list/artificial-intelligence/dlrm.md).
 * 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [Systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [Systems for LLMs](paper-list/systems-for-ml/llm.md), [Systems for DLRMs](paper-list/systems-for-ml/dlrm.md), [Resource Scheduler](paper-list/systems-for-ml/resource-scheduler.md).
 
 ## Epilogue

diff --git a/SUMMARY.md b/SUMMARY.md
@@ -19,6 +19,10 @@
   * [Deep Learning Framework](paper-list/systems-for-ml/deep-learning-framework.md)
   * [Cloud-Edge Collaboration](paper-list/systems-for-ml/cloud-edge-collaboration.md)
 * [ML for Systems](paper-list/ml-for-systems.md)
+* [Artificial Intelligence (AI)](paper-list/artificial-intelligence/README.md)
+  * [Diffusion Models](paper-list/artificial-intelligence/diffusion-models.md)
+  * [Language Models](paper-list/artificial-intelligence/language-models.md)
+  * [Deep Learning Recommendation Model (DLRM)](paper-list/artificial-intelligence/dlrm.md)
 * [Hardware Virtualization](paper-list/hardware-virtualization/README.md)
   * [GPU Sharing](paper-list/hardware-virtualization/gpu-sharing.md)
 * [Resource Disaggregation](paper-list/resource-disaggregation/README.md)

diff --git a/paper-list/artificial-intelligence/README.md b/paper-list/artificial-intelligence/README.md
@@ -0,0 +1,5 @@
+# Artificial Intelligence (AI)
+
+* [Diffusion Models](diffusion-models.md)
+* [Language Models](language-models.md)
+* [Deep Learning Recommendation Models](dlrm.md)
diff --git a/paper-list/artificial-intelligence/diffusion-models.md b/paper-list/artificial-intelligence/diffusion-models.md
@@ -0,0 +1,60 @@
+# Diffusion Models
+
+## Image Generation
+
+### Diffusion Transformer (DiT)
+
+* FLUX.1 \[[Code](https://github.com/black-forest-labs/flux)]
+  * Black Forest Labs
+  * Text-to-image generation
+  * Models
+    * FLUX.1-dev: [https://huggingface.co/black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+    * FLUX.1-schnell: [https://huggingface.co/black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)
+* Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (arXiv:2403.03206) \[[arXiv](https://arxiv.org/abs/2403.03206)] \[[Blog](https://stability.ai/news/stable-diffusion-3)]
+  * Stability AI
+  * **Stable Diffusion 3 (SD3)**
+  * Multimodal Diffusion Transformer (MMDiT)
+  * Models
+    * Stable Diffusion 3 Medium: [https://huggingface.co/stabilityai/stable-diffusion-3-medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
+* Scalable Diffusion Models with Transformers (ICCV 2023) \[[arXiv](https://arxiv.org/abs/2212.09748)] \[[Paper](https://openaccess.thecvf.com/content/ICCV2023/html/Peebles\_Scalable\_Diffusion\_Models\_with\_Transformers\_ICCV\_2023\_paper.html)] \[[Code](https://github.com/facebookresearch/DiT)] \[[Homepage](https://www.wpeebles.com/DiT)]
+  * UC Berkeley & NYU
+  * **DiT**
+
+### UNet
+
+* Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis \[[Technical Report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors\_paper.pdf)]
+  * Kuaishou Kolors
+  * Text-to-image generation
+  * Model: [https://huggingface.co/Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
+* SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (arXiv:2307.01952) \[[arXiv](https://arxiv.org/abs/2307.01952)]
+  * Stability AI
+  * Models
+    * [https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+    * [https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)
+* High-Resolution Image Synthesis with Latent Diffusion Models (CVPR 2022) \[[Paper](https://openaccess.thecvf.com/content/CVPR2022/html/Rombach\_High-Resolution\_Image\_Synthesis\_With\_Latent\_Diffusion\_Models\_CVPR\_2022\_paper)] \[[arXiv](https://arxiv.org/abs/2112.10752)] \[[Code](https://github.com/CompVis/stable-diffusion)]
+  * LMU Munich & Runway ML
+  * Latent Diffusion Models (LDMs)
+  * Models
+    * Stable-Diffusion-v1-5: [https://huggingface.co/runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
+      * Initialized with the weights of the **Stable-Diffusion-v1-2** checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512.
+
+## Video Generation
+
+* Stable Video 4D (SV4D)
+  * Stability AI
+  * Model: [https://huggingface.co/stabilityai/sv4d](https://huggingface.co/stabilityai/sv4d)
+    * Generate **40** frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 reference frames of the same size.
+* Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (arXiv:2311.15127) \[[arXiv](https://arxiv.org/abs/2311.15127)] \[[Blog](https://stability.ai/news/stable-video-diffusion-open-ai-video-model)]
+  * Stability AI
+  * **Stable Video Diffusion** (SVD)
+  * Text-to-video and image-to-video generation
+  * Models
+    * [https://huggingface.co/stabilityai/stable-video-diffusion-img2vid](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid)
+      * Generate **14** frames at resolution **576x1024** given a context frame of the same size.
+    * [https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
+      * Fine-tuned from the SVD-img2vid.
+      * Generate **25** frames at resolution **576x1024** given a context frame of the same size.
+
+## Acronyms
+
+* DiT: Diffusion Transformer
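The SD3 entry above names rectified flow as its training formulation. The following is a minimal sketch of that idea, assuming the straight-line forward process `z_t = (1 - t) * x0 + t * eps` with velocity target `eps - x0`. The function names are hypothetical, and an oracle velocity stands in for a trained MMDiT. Because the path is straight and the velocity along it is constant, Euler integration from noise back to data recovers `x0` with the exact velocity, whatever the step count.

```python
import random
from typing import Callable, List

Vector = List[float]


def rf_forward(x0: Vector, eps: Vector, t: float) -> Vector:
    """Noisy sample on the straight path: z_t = (1 - t) * x0 + t * eps."""
    return [(1.0 - t) * a + t * e for a, e in zip(x0, eps)]


def rf_velocity_target(x0: Vector, eps: Vector) -> Vector:
    """Regression target dz_t/dt = eps - x0 (constant along the path)."""
    return [e - a for a, e in zip(x0, eps)]


def euler_sample(velocity_fn: Callable[[Vector, float], Vector],
                 z1: Vector, num_steps: int) -> Vector:
    """Integrate dz/dt = v(z, t) from t = 1 (noise) down to t = 0 (data)."""
    z = list(z1)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        v = velocity_fn(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # stand-in "data" sample
    eps = [rng.gauss(0.0, 1.0) for _ in range(4)]    # Gaussian noise
    # Oracle velocity in place of a trained network (illustrative assumption).
    oracle = lambda z, t: rf_velocity_target(x0, eps)
    recovered = euler_sample(oracle, eps, num_steps=1)
    print(max(abs(a - b) for a, b in zip(recovered, x0)))
```

The printed reconstruction error stays at floating-point round-off level even with a single step. A learned network only approximates this velocity, so real samplers still take several steps.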
diff --git a/paper-list/artificial-intelligence/dlrm.md b/paper-list/artificial-intelligence/dlrm.md
@@ -0,0 +1,10 @@
+# Deep Learning Recommendation Model (DLRM)
+
+* Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction (DLP-KDD 2022) \[[Paper](https://arxiv.org/abs/2209.12212)]
+  * Alibaba
+  * ETA: _Efficient target attention_ network
+  * Locality-sensitive hashing
+  * Deployed on Taobao.
+* Wide & Deep Learning for Recommender Systems (DLRS 2016) \[[Personal Notes](../../reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md)] \[[Paper](https://dl.acm.org/doi/10.1145/2988450.2988454)]
+  * Google
+  * WDL: Wide & Deep model
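The ETA entry pairs target attention with locality-sensitive hashing. The following is a minimal sketch of that combination. It assumes SimHash-style signatures (one sign bit per random hyperplane) and uses Hamming distance to select the top-k user behaviors, then runs softmax target attention over only those k items. The function names, bit count, and embedding sizes are hypothetical and are not taken from the paper's implementation.

```python
import math
import random
from typing import List

Vector = List[float]


def make_planes(num_bits: int, dim: int, seed: int = 0) -> List[Vector]:
    """Random Gaussian hyperplanes; each one contributes one signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(vec: Vector, planes: List[Vector]) -> List[int]:
    """Bit is 1 when the projection onto the hyperplane normal is non-negative."""
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0.0 else 0
            for plane in planes]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of differing signature bits."""
    return sum(x != y for x, y in zip(a, b))


def topk_by_simhash(target: Vector, behaviors: List[Vector],
                    planes: List[Vector], k: int) -> List[int]:
    """Indices of the k behaviors whose signatures are closest to the target's."""
    t_sig = simhash(target, planes)
    ranked = sorted(range(len(behaviors)),
                    key=lambda i: hamming(t_sig, simhash(behaviors[i], planes)))
    return ranked[:k]


def target_attention(target: Vector, behaviors: List[Vector],
                     idx: List[int]) -> Vector:
    """Softmax(dot(target, b_i))-weighted sum over the selected behaviors only."""
    scores = [sum(t * b for t, b in zip(target, behaviors[i])) for i in idx]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w / total * behaviors[i][d] for w, i in zip(weights, idx))
            for d in range(len(target))]


if __name__ == "__main__":
    planes = make_planes(num_bits=16, dim=4, seed=42)
    target = [1.0, 0.5, -0.2, 0.3]
    behaviors = [[-1.0, -0.5, 0.2, -0.3],  # opposite direction
                 [2.0, 1.0, -0.4, 0.6],    # same direction, scaled
                 [0.9, 0.6, -0.1, 0.2]]    # nearby direction
    idx = topk_by_simhash(target, behaviors, planes, k=2)
    print(idx, target_attention(target, behaviors, idx))
```

In a deployed system, the behavior signatures would be computed once and cached. That caching is where the savings over full inner products come from. This sketch recomputes them on every call for brevity.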
diff --git a/paper-list/artificial-intelligence/language-models.md b/paper-list/artificial-intelligence/language-models.md
@@ -0,0 +1,37 @@
+# Language Models
+
+* Grok-2 \[[Blog](https://x.ai/blog/grok-2)]
+  * xAI
+  * Grok-2 Beta was released on 2024/08/13.
+* Gemma 2: Improving Open Language Models at a Practical Size (arXiv:2408.00118) \[[arXiv](https://arxiv.org/abs/2408.00118)] \[[Code](https://github.com/google-deepmind/gemma)]
+  * Gemma Team, Google DeepMind
+  * **Gemma 2**
+  * Models: [https://www.kaggle.com/models/google/gemma](https://www.kaggle.com/models/google/gemma)
+* The Llama 3 Herd of Models (arXiv:2407.21783) \[[arXiv](https://arxiv.org/abs/2407.21783)] \[[Blog](https://ai.meta.com/blog/meta-llama-3/)] \[[Code](https://github.com/meta-llama/llama3)]
+  * Meta AI
+  * **Llama 3**
+  * Models
+    * Llama 3 8B: [https://huggingface.co/meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
+    * Llama 3 70B
+    * Llama 3 405B
+* Mixtral 8x7B (arXiv:2401.04088) \[[arXiv](https://arxiv.org/abs/2401.04088)] \[[Blog](https://mistral.ai/news/mixtral-of-experts/)] \[[Code](https://github.com/mistralai/mistral-inference)]
+  * Mistral AI
+  * **Mixtral 8x7B**
+  * Model: [https://huggingface.co/mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
+* Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288) \[[Paper](https://arxiv.org/abs/2307.09288)] \[[Homepage](https://ai.meta.com/llama/)]
+  * Meta AI
+  * **Llama 2**
+  * Released with a _permissive_ community license and is available for commercial use.
+* LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971) \[[Paper](https://arxiv.org/abs/2302.13971)] \[[Code](https://github.com/facebookresearch/llama)]
+  * Meta AI
+  * **6.7B, 13B, 32.5B, 65.2B**
+  * Open-access
+* PaLM: Scaling Language Modeling with Pathways (JMLR 2023) \[[Paper](https://www.jmlr.org/papers/v24/22-1144.html)] \[[PaLM API](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html)]
+  * **540B**; open access to PaLM APIs in March 2023.
+* BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100) \[[Paper](https://arxiv.org/abs/2211.05100)] \[[Model](https://huggingface.co/bigscience/bloom)] \[[Blog](https://bigscience.huggingface.co/blog/bloom)]
+  * **176B**
+  * Open-access
+* OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068) \[[Paper](https://arxiv.org/abs/2205.01068)] \[[Code](https://github.com/facebookresearch/metaseq/tree/main/projects/OPT)]
+  * Meta AI
+  * Range from 125M to 175B parameters.
+  * Open-access
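The LLaMA entry lists sizes such as 6.7B. As a worked check of where that figure comes from, the sketch below counts the weights of a LLaMA-style decoder-only transformer. It assumes the 7B configuration from Meta's released code: hidden size 4096, 32 layers, and a 32,000-token vocabulary. It also assumes a SwiGLU feed-forward width of 2/3 of 4x the hidden size, rounded up to a multiple of 256, RMSNorm weights only, no biases, and untied input and output embeddings. These hyperparameters and the function names are supplied for illustration and are not part of this list.

```python
def llama_ffn_hidden(dim: int, multiple_of: int = 256) -> int:
    """SwiGLU width: 2/3 of 4*dim, rounded up to a multiple of `multiple_of`."""
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def llama_param_count(dim: int, n_layers: int, vocab_size: int) -> int:
    """Total weights of a LLaMA-style decoder (no biases, untied embeddings)."""
    ffn_hidden = llama_ffn_hidden(dim)
    attn = 4 * dim * dim               # W_q, W_k, W_v, W_o
    ffn = 3 * dim * ffn_hidden         # SwiGLU gate, up, and down projections
    norms = 2 * dim                    # two RMSNorm weight vectors per block
    per_layer = attn + ffn + norms
    embeddings = 2 * vocab_size * dim  # token embedding + output projection
    final_norm = dim                   # RMSNorm before the output projection
    return n_layers * per_layer + embeddings + final_norm


if __name__ == "__main__":
    n = llama_param_count(dim=4096, n_layers=32, vocab_size=32000)
    print(n, f"{n / 1e9:.1f}B")  # prints 6738415616 6.7B
```

Running the same function on the 13B configuration (hidden size 5120, 40 layers) gives about 13.0B. The attention and SwiGLU projections dominate the total, while the norm weights contribute well under 0.01%.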
diff --git a/paper-list/systems-for-ml/dlrm.md b/paper-list/systems-for-ml/dlrm.md
@@ -41,17 +41,6 @@
   * Tencent & Edinburgh
   * P2P model update dissemination.
 
-## DLRM
-
-* Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction (DLP-KDD 2022) \[[Paper](https://arxiv.org/abs/2209.12212)]
-  * Alibaba
-  * ETA: _Efficient target attention_ network
-  * Locality-sensitive hashing
-  * Deployed on Taobao_._
-* Wide & Deep Learning for Recommender Systems (DLRS 2016) \[[Personal Notes](../../reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md)] \[[Paper](https://dl.acm.org/doi/10.1145/2988450.2988454)]
-  * Google
-  * WDL: Wide & Deep model
-
 ## Acronyms
 
 * DLRM: Deep Learning Recommendation Model
diff --git a/paper-list/systems-for-ml/llm.md b/paper-list/systems-for-ml/llm.md
@@ -145,24 +145,6 @@ I am actively maintaining this list.
 * PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch ([ATC 2024](../../reading-notes/conference/atc-2024.md)) \[[Paper](https://www.usenix.org/conference/atc24/presentation/lei)]
   * THU
 
-## LLMs
-
-* Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv 2307.09288) \[[Paper](https://arxiv.org/abs/2307.09288)] \[[Homepage](https://ai.meta.com/llama/)]
-  * Released with a _permissive_ community license and is available for commercial use.
-* LLaMA: Open and Efficient Foundation Language Models (arXiv 2302.13971) \[[Paper](https://arxiv.org/abs/2302.13971)] \[[Code](https://github.com/facebookresearch/llama)]
-  * Meta AI
-  * **6.7B, 13B, 32.5B, 65.2B**
-  * Open-access
-* PaLM: Scaling Language Modeling with Pathways (JMLR 2023) \[[Paper](https://www.jmlr.org/papers/v24/22-1144.html)] \[[PaLM API](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html)]
-  * **540B**; open access to PaLM APIs in March 2023.
-* BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv 2211.05100) \[[Paper](https://arxiv.org/abs/2211.05100)] \[[Model](https://huggingface.co/bigscience/bloom)] \[[Blog](https://bigscience.huggingface.co/blog/bloom)]
-  * **176B**
-  * open-access
-* OPT: Open Pre-trained Transformer Language Models (arXiv: 2205.01068) \[[Paper](https://arxiv.org/abs/2205.01068)] \[[Code](https://github.com/facebookresearch/metaseq/tree/main/projects/OPT)]
-  * Meta AI
-  * Range from 125M to 175B parameters.
-  * Open-access
-
 ## Acronyms
 
 * LLM: Large Language Model