Task: retaining the expressivity of transformers for high-resolution image synthesis within a VQVAE-style framework.
Main novelty: integrating a transformer to learn long-range interactions over global image compositions, enabling a wide range of image-synthesis applications.
Approach: build the backbone with an autoencoder / discretize its embeddings into a codebook / synthesize by composing codebook entries autoregressively / condition in a fashion similar to the unconditional setup.
- Learn the autoencoder with a perceptual loss and a patch-based discriminator to keep synthesis quality high (see the sketch after this list).
- Discretize the embeddings into a codebook by element-wise (nearest-neighbour) quantization to reduce the search space of compositions; the discrete representation is a 16x16 grid of codebook indices (sketch below).
- Generate images by choosing an ordering of the indices and casting synthesis as autoregressive next-index prediction: given indices $s_{<i}$, the transformer learns the distribution over the next index, $p(s_i \mid s_{<i})$, so the likelihood of the full representation factorizes as $p(s) = \prod_i p(s_i \mid s_{<i})$. Training maximizes the log-likelihood of the representations, i.e. minimizes $\mathcal{L} = \mathbb{E}_{x \sim p(x)}[-\log p(s)]$ (see the sketch after this list).
- For conditioning, obtain a codebook for the conditioning information $c$ as well and simply prepend its index sequence $r$ to $s$. Restricting the negative log-likelihood to the entries of $s$ then yields the conditional likelihood of the sequence, $p(s \mid c) = \prod_i p(s_i \mid s_{<i}, c)$ (sketch below).
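A minimal sketch of how the first-stage objective could be assembled in PyTorch. `perceptual` (an LPIPS-style metric), `disc` (a patch-based discriminator returning per-patch logits), and the fixed weights `lam`/`beta` are illustrative assumptions; the paper actually computes the adversarial weight adaptively from gradient norms.

```python
import torch.nn.functional as F

def autoencoder_loss(x, x_rec, z, z_q, perceptual, disc, lam=0.8, beta=0.25):
    # perceptual (LPIPS-style) reconstruction term
    rec = perceptual(x, x_rec).mean()
    # VQ codebook + commitment terms, with a stop-gradient on each side
    vq = F.mse_loss(z_q, z.detach()) + beta * F.mse_loss(z, z_q.detach())
    # generator side of the patch-based adversarial loss
    # (disc returns per-patch real/fake logits, averaged over patches)
    g_adv = -disc(x_rec).mean()
    return rec + vq + lam * g_adv
```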
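A sketch of the element-wise quantization step, assuming PyTorch; class and argument names are illustrative, not the authors' code. Each encoder output vector is replaced by its nearest codebook entry, with a straight-through estimator so gradients reach the encoder.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour quantization of encoder outputs against a codebook."""
    def __init__(self, num_codes=1024, code_dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z):                    # z: (B, H, W, C), e.g. H = W = 16
        flat = z.reshape(-1, z.shape[-1])    # (B*H*W, C)
        # squared L2 distance from each vector to every codebook entry
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        indices = d.argmin(dim=1)            # (B*H*W,)
        z_q = self.codebook(indices).view_as(z)
        # straight-through estimator: copy gradients through the quantizer
        z_q = z + (z_q - z).detach()
        return z_q, indices.view(z.shape[:-1])   # quantized codes + index grid
```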
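The next-index objective reduces to cross-entropy over the flattened index grid, since cross-entropy on the shifted sequence equals the negative log-likelihood $-\log p(s)$ (up to the first token, which would need a start token to be modeled). `model` here stands for any causal transformer and is a placeholder, not an API from the paper.

```python
import torch.nn.functional as F

def ar_nll(model, s):
    # s: (B, L) codebook indices, flattened in the chosen ordering.
    # model maps (B, T) indices to (B, T, num_codes) logits;
    # the logit at position t predicts the token at position t + 1.
    logits = model(s[:, :-1])
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),   # (B*(L-1), num_codes)
        s[:, 1:].reshape(-1),                  # next-index targets
    )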
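Conditioning then only changes the input sequence, not the loss: prepend $r$ and score only the positions that predict entries of $s$. Again a sketch under the same assumptions as above.

```python
import torch
import torch.nn.functional as F

def conditional_nll(model, s, r):
    # r: (B, Lr) condition indices, s: (B, Ls) image indices
    seq = torch.cat([r, s], dim=1)
    logits = model(seq[:, :-1])        # causal: position t predicts token t + 1
    # restrict the loss to entries p(s_i | s_{<i}, r): keep only the
    # positions whose next token lies in s
    logits_s = logits[:, r.size(1) - 1:]
    return F.cross_entropy(
        logits_s.reshape(-1, logits_s.size(-1)),
        s.reshape(-1),
    )
```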
Applications: synthesis across different conditioning inputs (depth-to-image, semantic layout-to-image, image completion, pose-guided person generation, class-conditional synthesis, semantic segmentation masks).
Overall: the idea is simple and it arguably piggybacks on earlier work, but it has many potential applications and the experimental results improve substantially, making this a high-impact paper.