GITBOOK-190: Create new paper lists to organize AI papers (diffusion models, language models, DLRMs)
1 parent bc236ef, commit 774a0bd
Showing 8 changed files with 117 additions and 30 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Artificial Intelligence (AI) | ||
|
||
* [Diffusion Models](diffusion-models.md) | ||
* [Language Models](language-models.md) | ||
* [Deep Learning Recommendation Models](dlrm.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
# Diffusion Models | ||
|
||
## Image Generation | ||
|
||
### Diffusion Transformer (DiT) | ||
|
||
* FLUX.1 \[[Code](https://github.com/black-forest-labs/flux)] | ||
* Black Forest Labs | ||
* Text-to-image generation | ||
* Models | ||
* FLUX.1-dev: [https://huggingface.co/black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) | ||
* FLUX.1-schnell: [https://huggingface.co/black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) | ||
* Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (arXiv:2403.03206) \[[arXiv](https://arxiv.org/abs/2403.03206)] \[[Blog](https://stability.ai/news/stable-diffusion-3)] | ||
* Stability AI | ||
* **Stable Diffusion 3 (SD3)** | ||
* Multimodal Diffusion Transformer (MMDiT) | ||
* Models | ||
* Stable Diffusion 3 Medium: [https://huggingface.co/stabilityai/stable-diffusion-3-medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium) | ||
* Scalable Diffusion Models with Transformers (ICCV 2023) \[[arXiv](https://arxiv.org/abs/2212.09748)] \[[Paper](https://openaccess.thecvf.com/content/ICCV2023/html/Peebles\_Scalable\_Diffusion\_Models\_with\_Transformers\_ICCV\_2023\_paper.html)] \[[Code](https://github.com/facebookresearch/DiT)] \[[Homepage](https://www.wpeebles.com/DiT)] | ||
* UC Berkeley & NYU | ||
* **DiT** | ||
|
||
### UNet | ||
|
||
* Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis \[[Technical Report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors\_paper.pdf)] | ||
* Kuaishou Kolors | ||
* Text-to-image generation | ||
* Model: [https://huggingface.co/Kwai-Kolors/Kolors](https://huggingface.co/Kwai-Kolors/Kolors) | ||
* SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (arXiv:2307.01952) \[[arXiv](https://arxiv.org/abs/2307.01952)] | ||
* Stability AI | ||
* Models | ||
* [https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) | ||
* [https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) | ||
* High-Resolution Image Synthesis with Latent Diffusion Models (CVPR 2022) \[[Paper](https://openaccess.thecvf.com/content/CVPR2022/html/Rombach\_High-Resolution\_Image\_Synthesis\_With\_Latent\_Diffusion\_Models\_CVPR\_2022\_paper)] \[[arXiv](https://arxiv.org/abs/2112.10752)] \[[Code](https://github.com/CompVis/stable-diffusion)] | ||
* LMU Munich & Runway ML | ||
* Latent Diffusion Models (LDMs) | ||
* Models | ||
* Stable-Diffusion-v1-5: [https://huggingface.co/runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) | ||
        * Initialized with the weights of the **Stable-Diffusion-v1-2** checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512. | ||
|
||
## Video Generation | ||
|
||
* Stable Video 4D (SV4D) | ||
* Stability AI | ||
* Model: [https://huggingface.co/stabilityai/sv4d](https://huggingface.co/stabilityai/sv4d) | ||
* Generate **40** frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 reference frames of the same size. | ||
* Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (arXiv:2311.15127) \[[arXiv](https://arxiv.org/abs/2311.15127)] \[[Blog](https://stability.ai/news/stable-video-diffusion-open-ai-video-model)] | ||
* Stability AI | ||
* **Stable Video Diffusion** (SVD) | ||
* Text-to-video and image-to-video generation | ||
* Models | ||
* [https://huggingface.co/stabilityai/stable-video-diffusion-img2vid](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) | ||
* Generate **14** frames at resolution **576x1024** given a context frame of the same size. | ||
* [https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) | ||
        * Fine-tuned from SVD-img2vid. | ||
* Generate **25** frames at resolution **576x1024** given a context frame of the same size. | ||
|
||
## Acronyms | ||
|
||
* LLM: Large Language Model |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Deep Learning Recommendation Model (DLRM) | ||
|
||
* Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction (DLP-KDD 2022) \[[Paper](https://arxiv.org/abs/2209.12212)] | ||
* Alibaba | ||
* ETA: _Efficient target attention_ network | ||
* Locality-sensitive hashing | ||
    * Deployed on Taobao. | ||
* Wide & Deep Learning for Recommender Systems (DLRS 2016) \[[Personal Notes](../../reading-notes/miscellaneous/arxiv/2016/wide-and-deep-learning-for-recommender-systems.md)] \[[Paper](https://dl.acm.org/doi/10.1145/2988450.2988454)] | ||
* WDL: Wide & Deep model |
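
The ETA entry above lists locality-sensitive hashing as a key ingredient. As a rough illustration of the general idea only (not the paper's implementation), the sketch below assumes SimHash as the hash family: each embedding is projected onto shared random hyperplanes, the signs are packed into a bit code, and the behaviors whose codes have the smallest Hamming distance to the target's code are kept for attention. All function names, the 16-bit code width, and the toy vectors are hypothetical.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplanes shared by target and behaviors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vec: Sequence[float], planes: Sequence[Sequence[float]]) -> int:
    """Pack the sign of each hyperplane projection into one integer code."""
    code = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two codes."""
    return bin(a ^ b).count("1")


def top_k_behaviors(
    target: Sequence[float],
    behaviors: Sequence[Sequence[float]],
    planes: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k behaviors whose codes are closest to the target's."""
    target_code = simhash(target, planes)
    distances = [hamming(target_code, simhash(b, planes)) for b in behaviors]
    # Ties are broken by original position so the result is deterministic.
    return sorted(range(len(behaviors)), key=lambda i: (distances[i], i))[:k]


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16)
    target = [1.0, 2.0, -0.5, 0.3]
    history = [
        [-1.0, -2.0, 0.5, -0.3],  # points the opposite way from the target
        [1.0, 2.0, -0.5, 0.3],    # identical to the target
        [0.9, 2.1, -0.4, 0.2],    # close to the target
    ]
    print(top_k_behaviors(target, history, planes, k=2))
```

Comparing short bit codes with XOR and a bit count is much cheaper than computing full dot products against every item in a long behavior sequence, which is the usual motivation for using LSH as a retrieval step before attention.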
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# Language Models | ||
|
||
* Grok-2 \[[Blog](https://x.ai/blog/grok-2)] | ||
* xAI | ||
* Grok-2 Beta was released on 2024/08/13. | ||
* Gemma 2: Improving Open Language Models at a Practical Size (arXiv:2408.00118) \[[arXiv](https://arxiv.org/abs/2408.00118)] \[[Code](https://github.com/google-deepmind/gemma)] | ||
* Gemma Team, Google DeepMind | ||
* **Gemma 2** | ||
* Models: [https://www.kaggle.com/models/google/gemma](https://www.kaggle.com/models/google/gemma) | ||
* The Llama 3 Herd of Models (arXiv:2407.21783) \[[arXiv](https://arxiv.org/abs/2407.21783)] \[[Blog](https://ai.meta.com/blog/meta-llama-3/)] \[[Code](https://github.com/meta-llama/llama3)] | ||
    * Meta AI | ||
* **Llama 3** | ||
* Models | ||
* Llama 3 8B: [https://huggingface.co/meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | ||
* Llama 3 70B | ||
* Llama 3 405B | ||
* Mixtral 8x7B (arXiv:2401.04088) \[[arXiv](https://arxiv.org/abs/2401.04088)] \[[Blog](https://mistral.ai/news/mixtral-of-experts/)] \[[Code](https://github.com/mistralai/mistral-inference)] | ||
* Mistral AI | ||
* **Mixtral 8x7B** | ||
* Model: [https://huggingface.co/mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | ||
* Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288) \[[Paper](https://arxiv.org/abs/2307.09288)] \[[Homepage](https://ai.meta.com/llama/)] | ||
* Meta AI | ||
* **Llama 2** | ||
    * Released under a _permissive_ community license and available for commercial use. | ||
* LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971) \[[Paper](https://arxiv.org/abs/2302.13971)] \[[Code](https://github.com/facebookresearch/llama)] | ||
* Meta AI | ||
* **6.7B, 13B, 32.5B, 65.2B** | ||
* Open-access | ||
* PaLM: Scaling Language Modeling with Pathways (JMLR 2023) \[[Paper](https://www.jmlr.org/papers/v24/22-1144.html)] \[[PaLM API](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html)] | ||
* **540B**; open access to PaLM APIs in March 2023. | ||
* BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100) \[[Paper](https://arxiv.org/abs/2211.05100)] \[[Model](https://huggingface.co/bigscience/bloom)] \[[Blog](https://bigscience.huggingface.co/blog/bloom)] | ||
    * **176B** | ||
    * Open-access | ||
* OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068) \[[Paper](https://arxiv.org/abs/2205.01068)] \[[Code](https://github.com/facebookresearch/metaseq/tree/main/projects/OPT)] | ||
    * Meta AI | ||
    * Models range from 125M to 175B parameters. | ||
* Open-access |