Skip to content

High-Resolution Image Synthesis with Latent Diffusion Models

License

Notifications You must be signed in to change notification settings

pashu123/stablediffusion

 
 

Repository files navigation

Stable Diffusion 2.0

t2i t2i t2i

This repository contains Stable Diffusion models trained from scratch and will be continuously updated with new checkpoints. The following list provides an overview of all currently available models. More coming soon.

News

November 2022

  • New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.

  • The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.

  • Added a x4 upscaling latent text-guided diffusion model.

  • New depth-guided stable diffusion model, finetuned from SD 2.0-base. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.

    d2i

  • A text-guided inpainting model, finetuned from SD 2.0-base.

We follow the original repository and provide basic inference scripts to sample from the models.


The original Stable Diffusion model was created in a collaboration with CompVis and RunwayML and builds upon the work:

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer
CVPR '22 Oral | GitHub | arXiv | Project page

and many others.

Stable Diffusion is a latent text-to-image diffusion model.


Requirements

You can update an existing latent diffusion environment by running

conda install pytorch==1.12.1 torchvision==0.13.1 -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .

xformers efficient attention

For more efficiency and speed on GPUs, we highly recommended installing the xformers library.

Tested on A100 with CUDA 11.4. Installation needs a somewhat recent version of nvcc and gcc/g++, obtain those, e.g., via