[Docs] Extensive / Improved guides about "Text2Image", "Image2Image", "Inpainting" #4758

Closed
patrickvonplaten opened this issue Aug 24, 2023 · 12 comments
Labels
stale Issues that haven't received updates

Comments

@patrickvonplaten
Contributor

We very often run into issues like the following #4392 where people don't know how to correctly use diffusers or are unaware of all the existing features.

I propose to write three very extensive guides, or possibly even whole subsections, about the most important tasks:

  • Text2Image
  • Image2Image
  • Inpainting

and how to chain those together.

That will replace the following guides:

Each guide should be introduced with an easy example (how to use it) and then go deeper into more advanced use cases. This means:

  1. Text-to-image.

    • We explain how a very simple example works and show how different models generate different results. We could showcase the following models here:
      • SD 1.5
      • SDXL
      • Kandinsky 2.2
      • ControlNet

    Then we go a bit deeper into "height" & "width" to show the user how the output size can be changed.
    We talk about guidance_scale and generator.
    We talk about how extra conditionings can be added via ControlNet.

    We also link to the prompt weighting and optimization docs.
    We then make a transition to "Modifying existing images", linking to the next "Img2Img" and "Inpaint" sections.

    => All examples here can use the AutoPipelineForText2Image.
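
As a rough illustration of what the simple text-to-image example could look like, here is a minimal sketch built on AutoPipelineForText2Image. The helper names (`text2img_kwargs`, `generate`), the default values, and the checkpoint id in the trailing comment are assumptions made for illustration, not part of the proposal. The size check reflects the Stable Diffusion pipelines in diffusers, which expect height and width to be multiples of 8; other model families may behave differently.

```python
def text2img_kwargs(prompt, height=512, width=512, guidance_scale=7.5):
    """Collect the call arguments discussed above and sanity-check the size."""
    # Stable Diffusion pipelines reject sizes that are not multiples of 8,
    # because the VAE downsamples each side by a factor of 8.
    if height % 8 or width % 8:
        raise ValueError(
            f"height and width must be multiples of 8, got {height}x{width}"
        )
    return {
        "prompt": prompt,
        "height": height,
        "width": width,
        "guidance_scale": guidance_scale,
    }


def generate(model_id, seed=0, device="cuda", **call_kwargs):
    """Load a checkpoint through AutoPipelineForText2Image and render one image."""
    # Imported lazily so the helper above can be used without a GPU stack.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to(device)
    # A seeded generator makes the result reproducible for a given setup.
    generator = torch.Generator(device).manual_seed(seed)
    return pipe(generator=generator, **call_kwargs).images[0]


# Example call (downloads weights; the checkpoint id is an assumption):
# image = generate(
#     "runwayml/stable-diffusion-v1-5",
#     **text2img_kwargs("an astronaut riding a horse", height=768, width=512),
# )
```

Swapping the checkpoint id is meant to be the only change needed to compare models, which is the point of routing every example through the auto pipeline.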

  2. Image-to-Image

    • We show a simple example on how it works, showcasing:

      • SD 1.5
      • SDXL
      • Kandinsky 2.2

      We explain what the input image can/should look like.
      We then go a bit deeper into the "strength" parameter (super important parameter!!!). We explain how the width & height are determined by the image itself.
      We then explain how img2img can be chained right after text-to-image - keeping everything in latent space.
      We explain how img2img can be used to make upscaled images sharper.
      We then explain how multiple img2img models can be chained together for just a few steps (e.g. it's totally reasonable to use multiple differently fine-tuned SD checkpoints for image translation).
      We show how Kandinsky & Stable Diffusion can be mixed.
      We explain how to use controlnet for img2img.

    => All examples here can use the AutoPipelineForImage2Image.
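
To make the "strength" discussion concrete, here is a minimal sketch of two of the points above. `steps_actually_run` is a hypothetical helper that assumes the pipeline skips the first (1 - strength) fraction of the schedule, which is how the Stable Diffusion img2img pipeline in diffusers computes its timesteps as far as I can tell. `text2img_then_img2img` is likewise a hypothetical name; it chains the two tasks by passing latents directly, assuming both stages share the same checkpoint.

```python
def steps_actually_run(num_inference_steps, strength):
    """Number of denoising steps img2img executes for a given strength.

    strength=1.0 runs the full schedule (the input image is mostly ignored);
    lower values start later in the schedule and stay closer to the input.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    return min(int(num_inference_steps * strength), num_inference_steps)


def text2img_then_img2img(model_id, prompt, strength=0.5, device="cuda"):
    """Generate an image, then refine it with img2img without leaving latent space."""
    # Imported lazily so the arithmetic helper works without a GPU stack.
    import torch
    from diffusers import AutoPipelineForImage2Image, AutoPipelineForText2Image

    t2i = AutoPipelineForText2Image.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to(device)
    # from_pipe reuses the already loaded components instead of reloading them.
    i2i = AutoPipelineForImage2Image.from_pipe(t2i)
    # output_type="latent" skips VAE decoding, so the hand-off stays in latent space.
    latents = t2i(prompt, output_type="latent").images
    return i2i(prompt, image=latents, strength=strength).images[0]
```

With 50 inference steps, a strength of 0.5 runs 25 denoising steps and a strength of 1.0 runs all 50, which is one way to show readers why low strength values are both faster and more faithful to the input.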

  3. Inpainting

    • We show a simple example on how it works, showcasing:
      • SD 1.5 inpainting
      • Kandinsky 2.2 inpainting

    We explain what the input & mask image can/should look like.
    We then go a bit deeper into the "strength" parameter again. We explain how the width & height are determined by the image itself.
    We then explain how inpainting can be chained right after text-to-image or image-to-image - keeping everything in latent space and without reloading the whole model.
    We explain how img2img and inpainting can be super similar (cc @yiyixuxu - we chatted about this yesterday)
    We then explain how multiple inpainting models can be chained together for just a few steps (e.g. it's totally reasonable to use multiple differently fine-tuned SD checkpoints for image translation).
    We show how Kandinsky & Stable Diffusion can be mixed.
    We explain how to use controlnet for inpainting.

    => All examples here can use the AutoPipelineForInpainting.
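
A minimal sketch of what the simple inpainting example might show, assuming Pillow is available for building the mask. In diffusers, white mask pixels are repainted and black pixels are preserved. The helper names (`box_mask`, `inpaint`) and the checkpoint id in the trailing comment are assumptions for illustration only.

```python
def box_mask(size, box):
    """Return a mask of `size` (width, height): white inside `box`, black elsewhere.

    White pixels mark the region to repaint; black pixels are left untouched.
    """
    from PIL import Image, ImageDraw

    mask = Image.new("L", size, 0)  # start fully black: keep everything
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white box: repaint here
    return mask


def inpaint(model_id, prompt, image, mask, strength=1.0, device="cuda"):
    """Repaint the white region of `mask` in `image` according to `prompt`."""
    # Imported lazily so the mask helper works without a GPU stack.
    import torch
    from diffusers import AutoPipelineForInpainting

    pipe = AutoPipelineForInpainting.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to(device)
    return pipe(
        prompt=prompt, image=image, mask_image=mask, strength=strength
    ).images[0]


# Example call (downloads weights; the checkpoint id is an assumption):
# mask = box_mask(init_image.size, (128, 128, 383, 383))
# result = inpaint("runwayml/stable-diffusion-inpainting", "a red sofa", init_image, mask)
```

Building the mask in code keeps the example reproducible in Colab, where readers cannot easily draw one by hand.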

I think it's worth making this a really detailed / easy-to-understand guide, making sure it works in Colab, and also thinking about creating some video content about it.

Thoughts? @pcuenca @williamberman @sayakpaul @yiyixuxu @DN6 @stevhliu @patil-suraj

If you like the idea, maybe @stevhliu and I could look more into this

@patrickvonplaten patrickvonplaten changed the title from: Extensive / Improved guides about "Text2Image", "Image2Image", "Inpainting" to: [Docs] Extensive / Improved guides about "Text2Image", "Image2Image", "Inpainting" (Aug 24, 2023)
@patrickvonplaten
Contributor Author

The main motivation is the constant stream of issues such as: #4392 (comment)

@patrickvonplaten
Contributor Author

@williamberman
Contributor

Yes, this is a very good idea. I do feel like we suffer from a bit of document sprawl where there are lots of good docs but not an obvious order to read them through.

In the limit, a chapter dependency dag is always nice 😁

image

@yiyixuxu
Collaborator

love the idea!

@JohanHuynh0130

Love the Idea as well! I have tons of difficulty trying to understand the documentation of Diffusers

@sayakpaul
Member

> Love the Idea as well! I have tons of difficulty trying to understand the documentation of Diffusers

Could you pinpoint some of it in bulleted points?

@DN6
Collaborator

DN6 commented Aug 25, 2023

Better guides are always a good idea 🙂

@stevhliu
Member

Awesome idea! 💯

@VigneshHexo

VigneshHexo commented Aug 26, 2023

Following!
Would be happy to write one! Please let me know @patrickvonplaten

@vionwinnie

Happy to help!

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Oct 18, 2023
@stevhliu
Member

Closing since we've updated the guides already :)
