From 774a0bd67065183008b860dac4a3cff5b9281dbd Mon Sep 17 00:00:00 2001
From: Lingyun Yang
Date: Sat, 17 Aug 2024 09:54:17 +0000
Subject: [PATCH] GITBOOK-190: Create new paper lists to organize AI papers
 (diffusion models, language models, DLRMs)

---
 README.md                                     |  2 +-
 SUMMARY.md                                    |  4 ++
 paper-list/artificial-intelligence/README.md  |  5 ++
 .../diffusion-models.md                       | 60 +++++++++++++++++++
 paper-list/artificial-intelligence/dlrm.md    | 10 ++++
 .../language-models.md                        | 37 ++++++++++++
 paper-list/systems-for-ml/dlrm.md             | 11 ----
 paper-list/systems-for-ml/llm.md              | 18 ------
 8 files changed, 117 insertions(+), 30 deletions(-)
 create mode 100644 paper-list/artificial-intelligence/README.md
 create mode 100644 paper-list/artificial-intelligence/diffusion-models.md
 create mode 100644 paper-list/artificial-intelligence/dlrm.md
 create mode 100644 paper-list/artificial-intelligence/language-models.md

diff --git a/README.md b/README.md
index 42cd7e3..e5c8f27 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ Specifically, I have a broad interest in systems (e.g., OSDI, SOSP, NSDI, ATC, E
 
 ## Changelogs
 
-* 08/2024: Update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md).
+* 08/2024: Update the reading notes of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md); create new paper lists of [diffusion models](paper-list/artificial-intelligence/diffusion-models.md), [language models](paper-list/artificial-intelligence/language-models.md), and [deep learning recommendation models](paper-list/artificial-intelligence/dlrm.md).
 * 07/2024: Organize the papers of [SIGCOMM 2024](reading-notes/conference/sigcomm-2024.md), [ICML 2024](reading-notes/conference/icml-2024.md), [ATC 2024](reading-notes/conference/atc-2024.md), [OSDI 2024](reading-notes/conference/osdi-2024.md), [NSDI 2024](reading-notes/conference/nsdi-2024.md), [CVPR 2024](reading-notes/conference/cvpr-2024.md), [ISCA 2024](reading-notes/conference/isca-2024.md); create a new paper list of [Systems for diffusion models](paper-list/systems-for-ml/diffusion-models.md); update the paper list of [Systems for LLMs](paper-list/systems-for-ml/llm.md), [Systems for DLRMs](paper-list/systems-for-ml/dlrm.md), [Resource Scheduler](paper-list/systems-for-ml/resource-scheduler.md).
 
 ## Epilogue
diff --git a/SUMMARY.md b/SUMMARY.md
index cc6bb29..bb81093 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -19,6 +19,10 @@
   * [Deep Learning Framework](paper-list/systems-for-ml/deep-learning-framework.md)
   * [Cloud-Edge Collaboration](paper-list/systems-for-ml/cloud-edge-collaboration.md)
 * [ML for Systems](paper-list/ml-for-systems.md)
+* [Artificial Intelligence (AI)](paper-list/artificial-intelligence/README.md)
+  * [Diffusion Models](paper-list/artificial-intelligence/diffusion-models.md)
+  * [Language Models](paper-list/artificial-intelligence/language-models.md)
+  * [Deep Learning Recommendation Model (DLRM)](paper-list/artificial-intelligence/dlrm.md)
 * [Hardware Virtualization](paper-list/hardware-virtualization/README.md)
   * [GPU Sharing](paper-list/hardware-virtualization/gpu-sharing.md)
 * [Resource Disaggregation](paper-list/resource-disaggregation/README.md)
diff --git a/paper-list/artificial-intelligence/README.md b/paper-list/artificial-intelligence/README.md
new file mode 100644
index 0000000..a605692
--- /dev/null
+++ b/paper-list/artificial-intelligence/README.md
@@ -0,0 +1,5 @@
+# Artificial Intelligence (AI)
+
+* [Diffusion Models](diffusion-models.md)
+* [Language Models](language-models.md)
+* [Deep Learning Recommendation Models](dlrm.md)
diff --git a/paper-list/artificial-intelligence/diffusion-models.md b/paper-list/artificial-intelligence/diffusion-models.md
new file mode 100644
index 0000000..c28eb5a
--- /dev/null
+++ b/paper-list/artificial-intelligence/diffusion-models.md
@@ -0,0 +1,60 @@
+# Diffusion Models
+
+## Image Generation
+
+### Diffusion Transformer (DiT)
+
+* FLUX.1 \[[Code](https://github.com/black-forest-labs/flux)]
+  * Black Forest Labs
+  * Text-to-image generation
+  * Models
+    * FLUX.1-dev: [https://huggingface.co/black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+    * FLUX.1-schnell: [https://huggingface.co/black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)
+* Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (arXiv:2403.03206) \[[arXiv](https://arxiv.org/abs/2403.03206)] \[[Blog](https://stability.ai/news/stable-diffusion-3)]
+  * Stability AI
+  * **Stable Diffusion 3 (SD3)**
+  * Multimodal Diffusion Transformer (MMDiT)
+  * Models
+    * Stable Diffusion 3 Medium: [https://huggingface.co/stabilityai/stable-diffusion-3-medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
+* Scalable Diffusion Models with Transformers (ICCV 2023) \[[arXiv](https://arxiv.org/abs/2212.09748)] \[[Paper](https://openaccess.thecvf.com/content/ICCV2023/html/Peebles\_Scalable\_Diffusion\_Models\_with\_Transformers\_ICCV\_2023\_paper.html)] \[[Code](https://github.com/facebookresearch/DiT)] \[[Homepage](https://www.wpeebles.com/DiT)]
+  * UC Berkeley & NYU
+  * **DiT**
+
+### UNet
+
+* Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis \[[Technical Report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors\_paper.pdf)]
+  * Kuaishou Kolors
+  * Text-to-image generation
+  * Model: [https://huggingface.co/Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
+* SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (arXiv:2307.01952) \[[arXiv](https://arxiv.org/abs/2307.01952)]
+  * Stability AI
+  * Models
+    * [https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+    * [https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)
+* High-Resolution Image Synthesis with Latent Diffusion Models (CVPR 2022) \[[Paper](https://openaccess.thecvf.com/content/CVPR2022/html/Rombach\_High-Resolution\_Image\_Synthesis\_With\_Latent\_Diffusion\_Models\_CVPR\_2022\_paper.html)] \[[arXiv](https://arxiv.org/abs/2112.10752)] \[[Code](https://github.com/CompVis/stable-diffusion)]
+  * LMU Munich & Runway ML
+  * Latent Diffusion Models (LDMs)
+  * Models
+    * Stable-Diffusion-v1-5: [https://huggingface.co/runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
+      * Initialized with the weights of the **Stable-Diffusion-v1-2** checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512.
+
+## Video Generation
+
+* Stable Video 4D (SV4D)
+  * Stability AI
+  * Model: [https://huggingface.co/stabilityai/sv4d](https://huggingface.co/stabilityai/sv4d)
+  * Generates **40** frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 reference frames of the same size.
+* Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (arXiv:2311.15127) \[[arXiv](https://arxiv.org/abs/2311.15127)] \[[Blog](https://stability.ai/news/stable-video-diffusion-open-ai-video-model)]
+  * Stability AI
+  * **Stable Video Diffusion** (SVD)
+  * Text-to-video and image-to-video generation
+  * Models
+    * [https://huggingface.co/stabilityai/stable-video-diffusion-img2vid](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid)
+      * Generates **14** frames at resolution **576x1024**, given a context frame of the same size.
+    * [https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
+      * Fine-tuned from the SVD-img2vid checkpoint.
+      * Generates **25** frames at resolution **576x1024**, given a context frame of the same size.
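+
+The checkpoints above are hosted on the Hugging Face Hub, so they can typically be loaded through the `diffusers` library. A minimal text-to-image sketch, assuming `torch`, `diffusers`, and a CUDA GPU are available (the checkpoint ID is one of the SDXL models listed above; the prompt and step count are illustrative):
+
+```python
+import torch
+from diffusers import DiffusionPipeline
+
+# Load one of the SDXL checkpoints listed above; fp16 keeps memory usage modest.
+pipe = DiffusionPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+).to("cuda")
+
+# One denoising run; more steps generally trade speed for fidelity.
+image = pipe(
+    "a watercolor painting of a lighthouse at dawn", num_inference_steps=30
+).images[0]
+image.save("sdxl-base-sample.png")
+```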
+
+## Acronyms
+
+* DiT: Diffusion Transformer
+* LDM: Latent Diffusion Model
+* SVD: Stable Video Diffusion
diff --git a/paper-list/artificial-intelligence/dlrm.md b/paper-list/artificial-intelligence/dlrm.md
new file mode 100644
index 0000000..86969d2
--- /dev/null
+++ b/paper-list/artificial-intelligence/dlrm.md
@@ -0,0 +1,10 @@
+# Deep Learning Recommendation Model (DLRM)
+
+* Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction (DLP-KDD 2022) \[[Paper](https://arxiv.org/abs/2209.12212)]
+  * Alibaba
+  * ETA: _Efficient target attention_ network
+  * Locality-sensitive hashing
+  * Deployed on Taobao.
+* Wide & Deep Learning for Recommender Systems (DLRS 2016) \[[Personal Notes](../../reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md)] \[[Paper](https://dl.acm.org/doi/10.1145/2988450.2988454)]
+  * Google
+  * WDL: Wide & Deep model
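+
+The Wide & Deep idea combines a linear "wide" part over cross-product features (memorization) with an MLP "deep" part over embedded sparse features (generalization), and sums the two into one CTR logit. A minimal PyTorch sketch of this structure; the feature sizes, pooling choice, and layer widths are hypothetical, not the paper's production setup:
+
+```python
+import torch
+import torch.nn as nn
+
+class WideAndDeep(nn.Module):
+    def __init__(self, num_wide: int = 128, vocab: int = 10000, dim: int = 16):
+        super().__init__()
+        self.wide = nn.Linear(num_wide, 1)  # memorization over wide features
+        self.embedding = nn.Embedding(vocab, dim)
+        self.deep = nn.Sequential(  # generalization over embedded sparse IDs
+            nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1)
+        )
+
+    def forward(self, wide_x: torch.Tensor, ids: torch.Tensor) -> torch.Tensor:
+        deep_x = self.embedding(ids).mean(dim=1)  # mean-pool the ID embeddings
+        return torch.sigmoid(self.wide(wide_x) + self.deep(deep_x))  # joint CTR
+
+model = WideAndDeep()
+ctr = model(torch.randn(4, 128), torch.randint(0, 10000, (4, 8)))  # 4 examples
+```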
diff --git a/paper-list/artificial-intelligence/language-models.md b/paper-list/artificial-intelligence/language-models.md
new file mode 100644
index 0000000..ff6d631
--- /dev/null
+++ b/paper-list/artificial-intelligence/language-models.md
@@ -0,0 +1,37 @@
+# Language Models
+
+* Grok-2 \[[Blog](https://x.ai/blog/grok-2)]
+  * xAI
+  * Grok-2 Beta was released on 2024/08/13.
+* Gemma 2: Improving Open Language Models at a Practical Size (arXiv:2408.00118) \[[arXiv](https://arxiv.org/abs/2408.00118)] \[[Code](https://github.com/google-deepmind/gemma)]
+  * Gemma Team, Google DeepMind
+  * **Gemma 2**
+  * Models: [https://www.kaggle.com/models/google/gemma](https://www.kaggle.com/models/google/gemma)
+* The Llama 3 Herd of Models (arXiv:2407.21783) \[[arXiv](https://arxiv.org/abs/2407.21783)] \[[Blog](https://ai.meta.com/blog/meta-llama-3/)] \[[Code](https://github.com/meta-llama/llama3)]
+  * Meta AI
+  * **Llama 3**
+  * Models
+    * Llama 3 8B: [https://huggingface.co/meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
+    * Llama 3 70B
+    * Llama 3 405B
+* Mixtral 8x7B (arXiv:2401.04088) \[[arXiv](https://arxiv.org/abs/2401.04088)] \[[Blog](https://mistral.ai/news/mixtral-of-experts/)] \[[Code](https://github.com/mistralai/mistral-inference)]
+  * Mistral AI
+  * **Mixtral 8x7B**
+  * Model: [https://huggingface.co/mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
+* Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288) \[[Paper](https://arxiv.org/abs/2307.09288)] \[[Homepage](https://ai.meta.com/llama/)]
+  * Meta AI
+  * **Llama 2**
+  * Released with a _permissive_ community license; available for commercial use.
+* LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971) \[[Paper](https://arxiv.org/abs/2302.13971)] \[[Code](https://github.com/facebookresearch/llama)]
+  * Meta AI
+  * **6.7B, 13B, 32.5B, 65.2B**
+  * Open-access
+* PaLM: Scaling Language Modeling with Pathways (JMLR 2023) \[[Paper](https://www.jmlr.org/papers/v24/22-1144.html)] \[[PaLM API](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html)]
+  * Google
+  * **540B**; public access to the PaLM API announced in March 2023.
+* BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100) \[[Paper](https://arxiv.org/abs/2211.05100)] \[[Model](https://huggingface.co/bigscience/bloom)] \[[Blog](https://bigscience.huggingface.co/blog/bloom)]
+  * BigScience
+  * **176B**
+  * Open-access
+* OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068) \[[Paper](https://arxiv.org/abs/2205.01068)] \[[Code](https://github.com/facebookresearch/metaseq/tree/main/projects/OPT)]
+  * Meta AI
+  * Model sizes range from 125M to 175B parameters.
+  * Open-access
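+
+The open checkpoints above follow the standard Hugging Face `transformers` loading path. A minimal generation sketch, assuming `torch`, `transformers`, and `accelerate` are installed and the gated Llama 3 license has been accepted on the Hub (the prompt and decoding settings are illustrative):
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "meta-llama/Meta-Llama-3-8B"  # other open checkpoints above load similarly
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # place across GPUs
+)
+
+inputs = tokenizer("Mixture-of-experts layers route each token to", return_tensors="pt")
+outputs = model.generate(**inputs.to(model.device), max_new_tokens=64)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```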
diff --git a/paper-list/systems-for-ml/dlrm.md b/paper-list/systems-for-ml/dlrm.md
index f015b19..4b7e6c9 100644
--- a/paper-list/systems-for-ml/dlrm.md
+++ b/paper-list/systems-for-ml/dlrm.md
@@ -41,17 +41,6 @@
   * Tencent & Edinburgh
   * P2P model update dissemination.
 
-## DLRM
-
-* Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction (DLP-KDD 2022) \[[Paper](https://arxiv.org/abs/2209.12212)]
-  * Alibaba
-  * ETA: _Efficient target attention_ network
-  * Locality-sensitive hashing
-  * Deployed on Taobao_._
-* Wide & Deep Learning for Recommender Systems (DLRS 2016) \[[Personal Notes](../../reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md)] \[[Paper](https://dl.acm.org/doi/10.1145/2988450.2988454)]
-  * Google
-  * WDL: Wide & Deep model
-
 ## Acronyms
 
 * DLRM: Deep Learning Recommendation Model
diff --git a/paper-list/systems-for-ml/llm.md b/paper-list/systems-for-ml/llm.md
index 98017b2..5eb9d63 100644
--- a/paper-list/systems-for-ml/llm.md
+++ b/paper-list/systems-for-ml/llm.md
@@ -145,24 +145,6 @@ I am actively maintaining this list.
 * PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch ([ATC 2024](../../reading-notes/conference/atc-2024.md)) \[[Paper](https://www.usenix.org/conference/atc24/presentation/lei)]
   * THU
 
-## LLMs
-
-* Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv 2307.09288) \[[Paper](https://arxiv.org/abs/2307.09288)] \[[Homepage](https://ai.meta.com/llama/)]
-  * Released with a _permissive_ community license and is available for commercial use.
-* LLaMA: Open and Efficient Foundation Language Models (arXiv 2302.13971) \[[Paper](https://arxiv.org/abs/2302.13971)] \[[Code](https://github.com/facebookresearch/llama)]
-  * Meta AI
-  * **6.7B, 13B, 32.5B, 65.2B**
-  * Open-access
-* PaLM: Scaling Language Modeling with Pathways (JMLR 2023) \[[Paper](https://www.jmlr.org/papers/v24/22-1144.html)] \[[PaLM API](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html)]
-  * **540B**; open access to PaLM APIs in March 2023.
-* BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv 2211.05100) \[[Paper](https://arxiv.org/abs/2211.05100)] \[[Model](https://huggingface.co/bigscience/bloom)] \[[Blog](https://bigscience.huggingface.co/blog/bloom)]
-  * **176B**
-  * open-access
-* OPT: Open Pre-trained Transformer Language Models (arXiv: 2205.01068) \[[Paper](https://arxiv.org/abs/2205.01068)] \[[Code](https://github.com/facebookresearch/metaseq/tree/main/projects/OPT)]
-  * Meta AI
-  * Range from 125M to 175B parameters.
-  * Open-access
-
 ## Acronyms
 
 * LLM: Large Language Model