diff --git a/docs/source/en/conceptual/contribution.md b/docs/source/en/conceptual/contribution.md index 74393ebf3eb3..dc942a24c42e 100644 --- a/docs/source/en/conceptual/contribution.md +++ b/docs/source/en/conceptual/contribution.md @@ -28,11 +28,11 @@ the core library. In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community. * 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR). -* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose) -* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues) +* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose). +* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues). * 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). * 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source). -* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples) +* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples). * 7. Contribute to the [examples](https://github.com/huggingface/diffusers/tree/main/examples). * 8. Fix a more difficult issue, marked by the "Good second issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22). * 9. Add a new pipeline, model, or scheduler, see ["New Pipeline/Model"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) and ["New scheduler"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) issues. For this contribution, please have a look at [Design Philosophy](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md). @@ -40,7 +40,7 @@ In the following, we give an overview of different ways to contribute, ranked by As said before, **all contributions are valuable to the community**. In the following, we will explain each contribution a bit more in detail. -For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr) +For all contributions 4 - 9, you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr). ### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord @@ -57,7 +57,7 @@ Any question or comment related to the Diffusers library can be asked on the [di - ... Every question that is asked on the forum or on Discord actively encourages the community to publicly -share knowledge and might very well help a beginner in the future that has the same question you're +share knowledge and might very well help a beginner in the future who has the same question you're having. Please do pose any questions you might have. 
In the same spirit, you are of immense help to the community by answering such questions because this way you are publicly documenting knowledge for everybody to learn from. @@ -91,12 +91,12 @@ open a new issue nevertheless and link to the related issue. New issues usually include the following. -#### 2.1. Reproducible, minimal bug reports. +#### 2.1. Reproducible, minimal bug reports A bug report should always have a reproducible code snippet and be as minimal and concise as possible. This means in more detail: -- Narrow the bug down as much as you can, **do not just dump your whole code file** -- Format your code +- Narrow the bug down as much as you can, **do not just dump your whole code file**. +- Format your code. - Do not include any external libraries except for Diffusers depending on them. - **Always** provide all necessary information about your environment; for this, you can run: `diffusers-cli env` in your shell and copy-paste the displayed information to the issue. - Explain the issue. If the reader doesn't know what the issue is and why it is an issue, she cannot solve it. @@ -105,9 +105,9 @@ This means in more detail: For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section. -You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new/choose). +You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&projects=&template=bug-report.yml). -#### 2.2. Feature requests. +#### 2.2. Feature requests A world-class feature request addresses the following points: @@ -125,26 +125,26 @@ Awesome! Tell us what problem it solved for you. You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=). -#### 2.3 Feedback. +#### 2.3 Feedback Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the philosophy behind the current design philosophy, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too much, hence restricting use cases, explain why and how it should be changed. If a certain design choice is very useful for you, please also leave a note as this is great feedback for future design decisions. You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=). -#### 2.4 Technical questions. +#### 2.4 Technical questions -Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and please provide detail on +Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and please provide details on why this part of the code is difficult to understand. You can open an issue about a technical question [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&template=bug-report.yml). -#### 2.5 Proposal to add a new model, scheduler, or pipeline. 
+#### 2.5 Proposal to add a new model, scheduler, or pipeline If the diffusion model community released a new model, pipeline, or scheduler that you would like to see in the Diffusers library, please provide the following information: * Short description of the diffusion pipeline, model, or scheduler and link to the paper or public release. -* Link to any of its open-source implementation. +* Link to any of its open-source implementation(s). * Link to the model weights if they are available. If you are willing to contribute to the model yourself, let us know so we can best guide you. Also, don't forget @@ -156,21 +156,21 @@ You can open a request for a model/pipeline/scheduler [here](https://github.com/ Answering issues on GitHub might require some technical knowledge of Diffusers, but we encourage everybody to give it a try even if you are not 100% certain that your answer is correct. Some tips to give a high-quality answer to an issue: -- Be as concise and minimal as possible +- Be as concise and minimal as possible. - Stay on topic. An answer to the issue should concern the issue and only the issue. - Provide links to code, papers, or other sources that prove or encourage your point. - Answer in code. If a simple code snippet is the answer to the issue or shows how the issue can be solved, please provide a fully reproducible code snippet. Also, many issues tend to be simply off-topic, duplicates of other issues, or irrelevant. It is of great help to the maintainers if you can answer such issues, encouraging the author of the issue to be -more precise, provide the link to a duplicated issue or redirect them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR) +more precise, provide the link to a duplicated issue or redirect them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR). If you have verified that the issued bug report is correct and requires a correction in the source code, please have a look at the next sections. For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull request](#how-to-open-a-pr) section. -### 4. Fixing a `Good first issue` +### 4. Fixing a "Good first issue" *Good first issues* are marked by the [Good first issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) label. Usually, the issue already explains how a potential solution should look so that it is easier to fix. @@ -188,7 +188,7 @@ valuable contribution**. Contributing to the library can have many forms: - Correcting spelling or grammatical errors. -- Correct incorrect formatting of the docstring. If you see that the official documentation is weirdly displayed or a link is broken, we are very happy if you take some time to correct it. +- Correct incorrect formatting of the docstring. If you see that the official documentation is weirdly displayed or a link is broken, we would be very happy if you take some time to correct it. - Correct the shape or dimensions of a docstring input or output tensor. - Clarify documentation that is hard to understand or incorrect. - Update outdated code examples. @@ -202,7 +202,7 @@ Please have a look at [this page](https://github.com/huggingface/diffusers/tree/ ### 6. 
Contribute a community pipeline [Pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) are usually the first point of contact between the Diffusers library and the user. -Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview). +Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models/overview) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview). We support two types of pipelines: - Official Pipelines @@ -242,46 +242,46 @@ We support two types of training examples: Research training examples are located in [examples/research_projects](https://github.com/huggingface/diffusers/tree/main/examples/research_projects) whereas official training examples include all folders under [examples](https://github.com/huggingface/diffusers/tree/main/examples) except the `research_projects` and `community` folders. The official training examples are maintained by the Diffusers' core maintainers whereas the research training examples are maintained by the community. -This is because of the same reasons put forward in [6. Contribute a community pipeline](#contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models. +This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models. If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author. Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the training examples, it is required to clone the repository: -``` +```bash git clone https://github.com/huggingface/diffusers ``` as well as to install all additional dependencies required for training: -``` +```bash pip install -r /examples//requirements.txt ``` Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt). Training examples of the Diffusers library should adhere to the following philosophy: -- All the code necessary to run the examples should be found in a single Python file -- One should be able to run the example from the command line with `python .py --args` +- All the code necessary to run the examples should be found in a single Python file. +- One should be able to run the example from the command line with `python .py --args`. - Examples should be kept simple and serve as **an example** on how to use Diffusers for training. The purpose of example scripts is **not** to create state-of-the-art diffusion models, but rather to reproduce known training schemes without adding too much custom logic. 
As a byproduct of this point, our examples also strive to serve as good educational materials. To contribute an example, it is highly recommended to look at already existing examples such as [dreambooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to get an idea of how they should look like. We strongly advise contributors to make use of the [Accelerate library](https://github.com/huggingface/accelerate) as it's tightly integrated with Diffusers. Once an example script works, please make sure to add a comprehensive `README.md` that states how to use the example exactly. This README should include: -- An example command on how to run the example script as shown [here e.g.](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#running-locally-with-pytorch). -- A link to some training results (logs, models, ...) that show what the user can expect as shown [here e.g.](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5). +- An example command on how to run the example script as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#running-locally-with-pytorch). +- A link to some training results (logs, models, etc.) that show what the user can expect as shown [here](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5). - If you are adding a non-official/research training example, **please don't forget** to add a sentence that you are maintaining this training example which includes your git handle as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/intel_opts#diffusers-examples-with-intel-optimizations). If you are contributing to the official training examples, please also make sure to add a test to [examples/test_examples.py](https://github.com/huggingface/diffusers/blob/main/examples/test_examples.py). This is not necessary for non-official training examples. -### 8. Fixing a `Good second issue` +### 8. Fixing a "Good second issue" *Good second issues* are marked by the [Good second issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22) label. Good second issues are usually more complicated to solve than [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). The issue description usually gives less guidance on how to fix the issue and requires a decent understanding of the library by the interested contributor. -If you are interested in tackling a second good issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR. +If you are interested in tackling a good second issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR. Good second issues are usually more difficult to get merged compared to good first issues, so don't hesitate to ask for help from the core maintainers. If your PR is almost finished the core maintainers can also jump into your PR and commit to it in order to get it merged. ### 9. 
Adding pipelines, models, schedulers @@ -297,7 +297,7 @@ if you don't know yet what specific component you would like to add: - [Model or pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) - [Scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) -Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22) a read to better understand the design of any of the three components. Please be aware that +Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](philosophy) a read to better understand the design of any of the three components. Please be aware that we cannot merge model, scheduler, or pipeline additions that strongly diverge from our design philosophy as it will lead to API inconsistencies. If you fundamentally disagree with a design choice, please open a [Feedback issue](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=) instead so that it can be discussed whether a certain design @@ -337,8 +337,8 @@ to be merged; 9. Add high-coverage tests. No quality testing = no merge. - If you are adding new `@slow` tests, make sure they pass using `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`. -CircleCI does not run the slow tests, but GitHub actions does every night! -10. All public methods must have informative docstrings that work nicely with markdown. See `[pipeline_latent_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py)` for an example. +CircleCI does not run the slow tests, but GitHub Actions does every night! +10. All public methods must have informative docstrings that work nicely with markdown. See [`pipeline_latent_diffusion.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) for an example. 11. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) or [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images) to place these files. If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images @@ -364,7 +364,7 @@ under your GitHub user account. 2. Clone your fork to your local disk, and add the base repository as a remote: ```bash - $ git clone git@github.com:/diffusers.git + $ git clone git@github.com:/diffusers.git $ cd diffusers $ git remote add upstream https://github.com/huggingface/diffusers.git ``` @@ -395,7 +395,14 @@ passes. You should run the tests impacted by your changes like this: $ pytest tests/.py ``` -You can also run the full suite with the following command, but it takes +Before you run the tests, please make sure you install the dependencies required for testing. 
You can do so +with this command: + + ```bash + $ pip install -e ".[test]" + ``` + +You can also run the full test suite with the following command, but it takes a beefy machine to produce a result in a decent amount of time now that Diffusers has grown a lot. Here is the command for it: @@ -423,7 +430,7 @@ make a commit with `git commit` to record your changes locally: ```bash $ git add modified_file.py - $ git commit + $ git commit -m "A descriptive message about your changes." ``` It is a good idea to sync your copy of the code with the original @@ -443,7 +450,7 @@ Push the changes to your account using: webpage of your fork on GitHub. Click on 'Pull request' to send your changes to the project maintainers for review. -7. It's ok if maintainers ask you for changes. It happens to core contributors +7. It's OK if maintainers ask you for changes. It happens to core contributors too! So everyone can see the changes in the Pull request, work in your local branch and push the changes to your fork. They will automatically appear in the pull request. @@ -486,7 +493,7 @@ To avoid pinging the upstream repository which adds reference notes to each upst when syncing the main branch of a forked repository, please, follow these steps: 1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main. 2. If a PR is absolutely necessary, use the following steps after checking out your branch: -``` +```bash $ git checkout -b your-branch-for-syncing $ git pull --squash --no-commit upstream main $ git commit -m '' @@ -495,4 +502,4 @@ $ git push --set-upstream origin your-branch-for-syncing ### Style guide -For documentation strings, 🧨 Diffusers follows the [google style](https://google.github.io/styleguide/pyguide.html). +For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html). diff --git a/docs/source/en/conceptual/ethical_guidelines.md b/docs/source/en/conceptual/ethical_guidelines.md index 100a92152f00..fe1d849f44ff 100644 --- a/docs/source/en/conceptual/ethical_guidelines.md +++ b/docs/source/en/conceptual/ethical_guidelines.md @@ -1,3 +1,15 @@ + + # 🧨 Diffusers’ Ethical Guidelines ## Preamble @@ -42,7 +54,7 @@ The team works daily to make the technical and non-technical tools available to - **Encouraging safety in deployment** - - [**Safe Stable Diffusion**](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion_safe): It mitigates the well-known issue that models, like Stable Diffusion, that are trained on unfiltered, web-crawled datasets tend to suffer from inappropriate degeneration. Related paper: [Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models](https://arxiv.org/abs/2211.05105). + - [**Safe Stable Diffusion**](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_safe): It mitigates the well-known issue that models, like Stable Diffusion, that are trained on unfiltered, web-crawled datasets tend to suffer from inappropriate degeneration. Related paper: [Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models](https://arxiv.org/abs/2211.05105). - [**Safety Checker**](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py): It checks and compares the class probability of a set of hard-coded harmful concepts in the embedding space against an image after it has been generated. 
The harmful concepts are intentionally hidden to prevent reverse engineering of the checker. diff --git a/docs/source/en/conceptual/evaluation.md b/docs/source/en/conceptual/evaluation.md index d714c5d975ce..997c5f4016dc 100644 --- a/docs/source/en/conceptual/evaluation.md +++ b/docs/source/en/conceptual/evaluation.md @@ -32,7 +32,7 @@ The methods shown in this document can also be used to evaluate different [noise We cover Diffusion models with the following pipelines: - Text-guided image generation (such as the [`StableDiffusionPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/text2img)). -- Text-guided image generation, additionally conditioned on an input image (such as the [`StableDiffusionImg2ImgPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/img2img), and [`StableDiffusionInstructPix2PixPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix)). +- Text-guided image generation, additionally conditioned on an input image (such as the [`StableDiffusionImg2ImgPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/img2img) and [`StableDiffusionInstructPix2PixPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix)). - Class-conditioned image generation models (such as the [`DiTPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit)). ## Qualitative Evaluation @@ -87,7 +87,7 @@ import torch seed = 0 generator = torch.manual_seed(seed) -images = sd_pipeline(sample_prompts, num_images_per_prompt=1, generator=generator, output_type="numpy").images +images = sd_pipeline(sample_prompts, num_images_per_prompt=1, generator=generator).images ``` ![parti-prompts-14](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-14.png) @@ -141,7 +141,7 @@ prompts = [ "A small cabin on top of a snowy mountain in the style of Disney, artstation", ] -images = sd_pipeline(prompts, num_images_per_prompt=1, output_type="numpy").images +images = sd_pipeline(prompts, num_images_per_prompt=1, output_type="np").images print(images.shape) # (6, 512, 512, 3) @@ -155,13 +155,11 @@ from functools import partial clip_score_fn = partial(clip_score, model_name_or_path="openai/clip-vit-base-patch16") - def calculate_clip_score(images, prompts): images_int = (images * 255).astype("uint8") clip_score = clip_score_fn(torch.from_numpy(images_int).permute(0, 3, 1, 2), prompts).detach() return round(float(clip_score), 4) - sd_clip_score = calculate_clip_score(images, prompts) print(f"CLIP score: {sd_clip_score}") # CLIP score: 35.7038 @@ -176,7 +174,7 @@ fixed seed with the [v1-4 Stable Diffusion checkpoint](https://huggingface.co/Co seed = 0 generator = torch.manual_seed(seed) -images = sd_pipeline(prompts, num_images_per_prompt=1, generator=generator, output_type="numpy").images +images = sd_pipeline(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images ``` Then we load the [v1-5 checkpoint](https://huggingface.co/runwayml/stable-diffusion-v1-5) to generate images: @@ -185,7 +183,7 @@ Then we load the [v1-5 checkpoint](https://huggingface.co/runwayml/stable-diffus model_ckpt_1_5 = "runwayml/stable-diffusion-v1-5" sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=weight_dtype).to(device) -images_1_5 = sd_pipeline_1_5(prompts, num_images_per_prompt=1, generator=generator, output_type="numpy").images +images_1_5 = 
sd_pipeline_1_5(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images ``` And finally, we compare their CLIP scores: @@ -295,12 +293,11 @@ def edit_image(input_image, instruction): image = instruct_pix2pix_pipeline( instruction, image=input_image, - output_type="numpy", + output_type="np", generator=generator, ).images[0] return image - input_images = [] original_captions = [] modified_captions = [] @@ -417,7 +414,7 @@ It should be noted that the `StableDiffusionInstructPix2PixPipeline` exposes t We can extend the idea of this metric to measure how similar the original image and edited version are. To do that, we can just do `F.cosine_similarity(img_feat_two, img_feat_one)`. For these kinds of edits, we would still want the primary semantics of the images to be preserved as much as possible, i.e., a high similarity score. -We can use these metrics for similar pipelines such as the [`StableDiffusionPix2PixZeroPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline). +We can use these metrics for similar pipelines such as the [`StableDiffusionPix2PixZeroPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline). @@ -427,7 +424,7 @@ Both CLIP score and CLIP direction similarity rely on the CLIP model, which can ***Extending metrics like IS, FID (discussed later), or KID can be difficult*** when the model under evaluation was pre-trained on a large image-captioning dataset (such as the [LAION-5B dataset](https://laion.ai/blog/laion-5b/)). This is because underlying these metrics is an InceptionNet (pre-trained on the ImageNet-1k dataset) used for extracting intermediate image features. The pre-training dataset of Stable Diffusion may have limited overlap with the pre-training dataset of InceptionNet, so it is not a good candidate here for feature extraction. -***Using the above metrics helps evaluate models that are class-conditioned. For example, [DiT](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/overview). It was pre-trained being conditioned on the ImageNet-1k classes.*** +***Using the above metrics helps evaluate models that are class-conditioned. For example, [DiT](https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit). It was pre-trained being conditioned on the ImageNet-1k classes.*** ### Class-conditioned image generation @@ -452,7 +449,6 @@ def download(url, local_filepath): f.write(r.content) return local_filepath - dummy_dataset_url = "https://hf.co/datasets/sayakpaul/sample-datasets/resolve/main/sample-imagenet-images.zip" local_filepath = download(dummy_dataset_url, dummy_dataset_url.split("/")[-1]) @@ -470,7 +466,7 @@ image_paths = sorted([os.path.join(dataset_path, x) for x in os.listdir(dataset_ real_images = [np.array(Image.open(path).convert("RGB")) for path in image_paths] ``` -These are 10 images from the following Imagenet-1k classes: "cassette_player", "chain_saw" (x2), "church", "gas_pump" (x3), "parachute" (x2), and "tench". +These are 10 images from the following ImageNet-1k classes: "cassette_player", "chain_saw" (x2), "church", "gas_pump" (x3), "parachute" (x2), and "tench".

*(Figure: real-images)*
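
The hunks just above and below this point prepare a batch of real images and a batch of generated images; as noted earlier in the guide, FID is discussed later and is ultimately computed on such batches. A minimal, self-contained sketch of that final step could look like this, assuming the `torchmetrics` implementation of Fréchet Inception Distance; the random tensors below only stand in for the actual `real_images` and `fake_images` batches built in the surrounding snippets:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Stand-ins for the preprocessed batches built in the surrounding snippets:
# float tensors in [0, 1] with shape (num_images, 3, height, width).
real_images = torch.rand(10, 3, 256, 256)
fake_images = torch.rand(10, 3, 256, 256)

# normalize=True tells torchmetrics the inputs are floats in [0, 1]
# (the metric additionally requires the `torch-fidelity` package).
fid = FrechetInceptionDistance(normalize=True)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {float(fid.compute())}")
```

With only 10 images per set the resulting value is not statistically meaningful; the snippet only illustrates the mechanics of the metric call.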
@@ -488,7 +484,6 @@ def preprocess_image(image): image = image.permute(0, 3, 1, 2) / 255.0 return F.center_crop(image, (256, 256)) - real_images = torch.cat([preprocess_image(image) for image in real_images]) print(real_images.shape) # torch.Size([10, 3, 256, 256]) @@ -517,7 +512,7 @@ words = [ ] class_ids = dit_pipeline.get_label_ids(words) -output = dit_pipeline(class_labels=class_ids, generator=generator, output_type="numpy") +output = dit_pipeline(class_labels=class_ids, generator=generator, output_type="np") fake_images = output.images fake_images = torch.tensor(fake_images) diff --git a/docs/source/en/conceptual/philosophy.md b/docs/source/en/conceptual/philosophy.md index aed09169bfce..909ed6bc193d 100644 --- a/docs/source/en/conceptual/philosophy.md +++ b/docs/source/en/conceptual/philosophy.md @@ -22,7 +22,7 @@ In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefor ## Usability over Performance - While Diffusers has many built-in performance-enhancing features (see [Memory and Speed](https://huggingface.co/docs/diffusers/optimization/fp16)), models are always loaded with the highest precision and lowest optimization. Therefore, by default diffusion pipelines are always instantiated on CPU with float32 precision if not otherwise defined by the user. This ensures usability across different platforms and accelerators and means that no complex installations are required to run the library. -- Diffusers aim at being a **light-weight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc...). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages. +- Diffusers aims to be a **light-weight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc...). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages. - Diffusers prefers simple, self-explainable code over condensed, magic code. This means that short-hand code syntaxes such as lambda functions, and advanced PyTorch operators are often not desired. ## Simple over easy @@ -31,13 +31,13 @@ As PyTorch states, **explicit is better than implicit** and **simple is better t - We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management. - Raising concise error messages is preferred to silently correct erroneous input. Diffusers aims at teaching the user, rather than making the library as easy to use as possible. - Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop. However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers. -- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the unet, and the variational autoencoder, each have their own model class. 
This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. Dreambooth or textual inversion training -is very simple thanks to diffusers' ability to separate single components of the diffusion pipeline. +- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the unet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. DreamBooth or Textual Inversion training +is very simple thanks to Diffusers' ability to separate single components of the diffusion pipeline. ## Tweakable, contributor-friendly over abstraction For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). -In short, just like Transformers does for modeling files, diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers. +In short, just like Transformers does for modeling files, Diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers. Functions, long code blocks, and even classes can be copied across multiple files which at first can look like a bad, sloppy design choice that makes the library unmaintainable. **However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because: - Machine Learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which therefore makes it very difficult to define long-lasting code abstractions. @@ -47,15 +47,15 @@ Functions, long code blocks, and even classes can be copied across multiple file At Hugging Face, we call this design the **single-file policy** which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look at [this blog post](https://huggingface.co/blog/transformers-design-philosophy). -In diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such -as [DDPM](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [UnCLIP (Dalle-2)](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/unclip#overview) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models#diffusers.UNet2DConditionModel). +In Diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. 
The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such +as [DDPM](https://huggingface.co/docs/diffusers/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [unCLIP (DALL·E 2)](https://huggingface.co/docs/diffusers/api/pipelines/unclip) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models/unet2d-cond). Great, now you should have generally understood why 🧨 Diffusers is designed the way it is 🤗. We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=). ## Design Philosophy in Details -Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consist of three major classes, [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers). +Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consists of three major classes: [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers). Let's walk through more in-detail design decisions for each class. ### Pipelines @@ -83,14 +83,14 @@ The following design principles are followed: - Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context. - All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc... - Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy. -- Models intend to expose complexity, just like PyTorch's module does, and give clear error messages. +- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages. - Models all inherit from `ModelMixin` and `ConfigMixin`. 
- Models can be optimized for performance when it doesn’t demand major code changes, keeps backward compatibility, and gives significant memory or compute gain. - Models should by default have the highest precision and lowest performance setting. - To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different. - Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work. - The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and -readable longterm, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py). +readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py). ### Schedulers @@ -99,10 +99,10 @@ Schedulers are responsible to guide the denoising process for inference as well The following design principles are followed: - All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers). - Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained. -- One scheduler python file corresponds to one scheduler algorithm (as might be defined in a paper). +- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper). - If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism. - Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`. -- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./using-diffusers/schedulers.md). +- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](../using-diffusers/schedulers.md). - Every scheduler has to have a `set_num_inference_steps`, and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called. - Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon. 
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1). diff --git a/docs/source/en/using-diffusers/controlling_generation.md b/docs/source/en/using-diffusers/controlling_generation.md index 25e4b0d699d3..34ce0584bdbb 100644 --- a/docs/source/en/using-diffusers/controlling_generation.md +++ b/docs/source/en/using-diffusers/controlling_generation.md @@ -18,7 +18,7 @@ Most examples of preserving semantics reduce to being able to accurately map a c Additionally, there are qualities of generated images that we would like to influence beyond semantic preservation. I.e. in general, we would like our outputs to be of good quality, adhere to a particular style, or be realistic. -We will document some of the techniques `diffusers` supports to control generation of diffusion models. Much is cutting edge research and can be quite nuanced. If something needs clarifying or you have a suggestion, don't hesitate to open a discussion on the [forum](https://discuss.huggingface.co/) or a [GitHub issue](https://github.com/huggingface/diffusers/issues). +We will document some of the techniques `diffusers` supports to control generation of diffusion models. Much is cutting edge research and can be quite nuanced. If something needs clarifying or you have a suggestion, don't hesitate to open a discussion on the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or a [GitHub issue](https://github.com/huggingface/diffusers/issues). We provide a high level explanation of how the generation can be controlled as well as a snippet of the technicals. For more in depth explanations on the technicals, the original papers which are linked from the pipelines are always the best resources. @@ -26,11 +26,11 @@ Depending on the use case, one should choose a technique accordingly. In many ca Unless otherwise mentioned, these are techniques that work with existing models and don't require their own weights. -1. [Instruct Pix2Pix](#instruct-pix2pix) -2. [Pix2Pix Zero](#pix2pixzero) +1. [InstructPix2Pix](#instruct-pix2pix) +2. [Pix2Pix Zero](#pix2pix-zero) 3. [Attend and Excite](#attend-and-excite) -4. [Semantic Guidance](#semantic-guidance) -5. [Self-attention Guidance](#self-attention-guidance) +4. [Semantic Guidance](#semantic-guidance-sega) +5. [Self-attention Guidance](#self-attention-guidance-sag) 6. [Depth2Image](#depth2image) 7. [MultiDiffusion Panorama](#multidiffusion-panorama) 8. [DreamBooth](#dreambooth) @@ -47,11 +47,11 @@ For convenience, we provide a table to denote which methods are inference-only a | **Method** | **Inference only** | **Requires training /
fine-tuning** | **Comments** | | :-------------------------------------------------: | :----------------: | :-------------------------------------: | :---------------------------------------------------------------------------------------------: | -| [Instruct Pix2Pix](#instruct-pix2pix) | ✅ | ❌ | Can additionally be
fine-tuned for better
performance on specific
edit instructions. | -| [Pix2Pix Zero](#pix2pixzero) | ✅ | ❌ | | +| [InstructPix2Pix](#instruct-pix2pix) | ✅ | ❌ | Can additionally be
fine-tuned for better
performance on specific
edit instructions. | +| [Pix2Pix Zero](#pix2pix-zero) | ✅ | ❌ | | | [Attend and Excite](#attend-and-excite) | ✅ | ❌ | | -| [Semantic Guidance](#semantic-guidance) | ✅ | ❌ | | -| [Self-attention Guidance](#self-attention-guidance) | ✅ | ❌ | | +| [Semantic Guidance](#semantic-guidance-sega) | ✅ | ❌ | | +| [Self-attention Guidance](#self-attention-guidance-sag) | ✅ | ❌ | | | [Depth2Image](#depth2image) | ✅ | ❌ | | | [MultiDiffusion Panorama](#multidiffusion-panorama) | ✅ | ❌ | | | [DreamBooth](#dreambooth) | ❌ | ✅ | | @@ -63,14 +63,12 @@ For convenience, we provide a table to denote which methods are inference-only a | [DiffEdit](#diffedit) | ✅ | ❌ | | | [T2I-Adapter](#t2i-adapter) | ✅ | ❌ | | | [Fabric](#fabric) | ✅ | ❌ | | -## Instruct Pix2Pix +## InstructPix2Pix [Paper](https://arxiv.org/abs/2211.09800) -[Instruct Pix2Pix](../api/pipelines/pix2pix) is fine-tuned from stable diffusion to support editing input images. It takes as inputs an image and a prompt describing an edit, and it outputs the edited image. -Instruct Pix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/)-like prompts. - -See [here](../api/pipelines/pix2pix) for more information on how to use it. +[InstructPix2Pix](../api/pipelines/pix2pix) is fine-tuned from Stable Diffusion to support editing input images. It takes as inputs an image and a prompt describing an edit, and it outputs the edited image. +InstructPix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/)-like prompts. ## Pix2Pix Zero @@ -84,7 +82,7 @@ Pix2Pix Zero can be used both to edit synthetic images as well as real images. - To edit synthetic images, one first generates an image given a caption. Next, we generate image captions for the concept that shall be edited and for the new target concept. We can use a model like [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) for this purpose. Then, "mean" prompt embeddings for both the source and target concepts are created via the text encoder. Finally, the pix2pix-zero algorithm is used to edit the synthetic image. -- To edit a real image, one first generates an image caption using a model like [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). Then one applies ddim inversion on the prompt and image to generate "inverse" latents. Similar to before, "mean" prompt embeddings for both source and target concepts are created and finally the pix2pix-zero algorithm in combination with the "inverse" latents is used to edit the image. +- To edit a real image, one first generates an image caption using a model like [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). Then one applies DDIM inversion on the prompt and image to generate "inverse" latents. Similar to before, "mean" prompt embeddings for both source and target concepts are created and finally the pix2pix-zero algorithm in combination with the "inverse" latents is used to edit the image. @@ -96,7 +94,13 @@ can edit an image in less than a minute on a consumer GPU as shown [here](../api As mentioned above, Pix2Pix Zero includes optimizing the latents (and not any of the UNet, VAE, or the text encoder) to steer the generation toward a specific concept. This means that the overall pipeline might require more memory than a standard [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img). -See [here](../api/pipelines/pix2pix_zero) for more information on how to use it. 
+ + +An important distinction between methods like InstructPix2Pix and Pix2Pix Zero is that the former +involves fine-tuning the pre-trained weights while the latter does not. This means that you can +apply Pix2Pix Zero to any of the available Stable Diffusion models. + + ## Attend and Excite @@ -108,20 +112,16 @@ A set of token indices are given as input, corresponding to the subjects in the Like Pix2Pix Zero, Attend and Excite also involves a mini optimization loop (leaving the pre-trained weights untouched) in its pipeline and can require more memory than the usual [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img). -See [here](../api/pipelines/attend_and_excite) for more information on how to use it. - ## Semantic Guidance (SEGA) [Paper](https://arxiv.org/abs/2301.12247) -SEGA allows applying or removing one or more concepts from an image. The strength of the concept can also be controlled. I.e. the smile concept can be used to incrementally increase or decrease the smile of a portrait. +[SEGA](../api/pipelines/semantic_stable_diffusion) allows applying or removing one or more concepts from an image. The strength of the concept can also be controlled. I.e. the smile concept can be used to incrementally increase or decrease the smile of a portrait. Similar to how classifier free guidance provides guidance via empty prompt inputs, SEGA provides guidance on conceptual prompts. Multiple of these conceptual prompts can be applied simultaneously. Each conceptual prompt can either add or remove their concept depending on if the guidance is applied positively or negatively. Unlike Pix2Pix Zero or Attend and Excite, SEGA directly interacts with the diffusion process instead of performing any explicit gradient-based optimization. -See [here](../api/pipelines/semantic_stable_diffusion) for more information on how to use it. - ## Self-attention Guidance (SAG) [Paper](https://arxiv.org/abs/2210.00939) @@ -130,34 +130,20 @@ See [here](../api/pipelines/semantic_stable_diffusion) for more information on h SAG provides guidance from predictions not conditioned on high-frequency details to fully conditioned images. The high frequency details are extracted out of the UNet self-attention maps. -See [here](../api/pipelines/self_attention_guidance) for more information on how to use it. - ## Depth2Image [Project](https://huggingface.co/stabilityai/stable-diffusion-2-depth) -[Depth2Image](../pipelines/stable_diffusion_2#depthtoimage) is fine-tuned from Stable Diffusion to better preserve semantics for text guided image variation. +[Depth2Image](../api/pipelines/stable_diffusion/depth2img) is fine-tuned from Stable Diffusion to better preserve semantics for text guided image variation. It conditions on a monocular depth estimate of the original image. -See [here](../api/pipelines/stable_diffusion_2#depthtoimage) for more information on how to use it. - - - -An important distinction between methods like InstructPix2Pix and Pix2Pix Zero is that the former -involves fine-tuning the pre-trained weights while the latter does not. This means that you can -apply Pix2Pix Zero to any of the available Stable Diffusion models. - - - ## MultiDiffusion Panorama [Paper](https://arxiv.org/abs/2302.08113) -MultiDiffusion defines a new generation process over a pre-trained diffusion model. This process binds together multiple diffusion generation methods that can be readily applied to generate high quality and diverse images. 
Results adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes. -[MultiDiffusion Panorama](../api/pipelines/panorama) allows to generate high-quality images at arbitrary aspect ratios (e.g., panoramas). - -See [here](../api/pipelines/panorama) for more information on how to use it to generate panoramic images. +[MultiDiffusion Panorama](../api/pipelines/panorama) defines a new generation process over a pre-trained diffusion model. This process binds together multiple diffusion generation methods that can be readily applied to generate high quality and diverse images. Results adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes. +MultiDiffusion Panorama allows to generate high-quality images at arbitrary aspect ratios (e.g., panoramas). ## Fine-tuning your own models @@ -165,44 +151,39 @@ In addition to pre-trained models, Diffusers has training scripts for fine-tunin ## DreamBooth -[DreamBooth](../training/dreambooth) fine-tunes a model to teach it about a new subject. I.e. a few pictures of a person can be used to generate images of that person in different styles. +[Project](https://dreambooth.github.io/) -See [here](../training/dreambooth) for more information on how to use it. +[DreamBooth](../training/dreambooth) fine-tunes a model to teach it about a new subject. I.e. a few pictures of a person can be used to generate images of that person in different styles. ## Textual Inversion -[Textual Inversion](../training/text_inversion) fine-tunes a model to teach it about a new concept. I.e. a few pictures of a style of artwork can be used to generate images in that style. +[Paper](https://arxiv.org/abs/2208.01618) -See [here](../training/text_inversion) for more information on how to use it. +[Textual Inversion](../training/text_inversion) fine-tunes a model to teach it about a new concept. I.e. a few pictures of a style of artwork can be used to generate images in that style. ## ControlNet [Paper](https://arxiv.org/abs/2302.05543) -[ControlNet](../api/pipelines/controlnet) is an auxiliary network which adds an extra condition. [ControlNet](../api/pipelines/controlnet) is an auxiliary network which adds an extra condition. There are 8 canonical pre-trained ControlNets trained on different conditionings such as edge detection, scribbles, depth maps, and semantic segmentations. -See [here](../api/pipelines/controlnet) for more information on how to use it. - ## Prompt Weighting -Prompt weighting is a simple technique that puts more attention weight on certain parts of the text +[Prompt weighting](../using-diffusers/weighted_prompts) is a simple technique that puts more attention weight on certain parts of the text input. -For a more in-detail explanation and examples, see [here](../using-diffusers/weighted_prompts). - ## Custom Diffusion +[Paper](https://arxiv.org/abs/2212.04488) + [Custom Diffusion](../training/custom_diffusion) only fine-tunes the cross-attention maps of a pre-trained -text-to-image diffusion model. It also allows for additionally performing textual inversion. It supports +text-to-image diffusion model. It also allows for additionally performing Textual Inversion. It supports multi-concept training by design. 
Like DreamBooth and Textual Inversion, Custom Diffusion is also used to teach a pre-trained text-to-image diffusion model about new concepts to generate outputs involving the concept(s) of interest. -For more details, check out our [official doc](../training/custom_diffusion). - ## Model Editing [Paper](https://arxiv.org/abs/2303.08084) @@ -211,8 +192,6 @@ The [text-to-image model editing pipeline](../api/pipelines/model_editing) helps diffusion model might make about the subjects present in the input prompt. For example, if you prompt Stable Diffusion to generate images for "A pack of roses", the roses in the generated images are more likely to be red. This pipeline helps you change that assumption. -To know more details, check out the [official doc](../api/pipelines/model_editing). - ## DiffEdit [Paper](https://arxiv.org/abs/2210.11427) @@ -220,8 +199,6 @@ To know more details, check out the [official doc](../api/pipelines/model_editin [DiffEdit](../api/pipelines/diffedit) allows for semantic editing of input images along with input prompts while preserving the original input images as much as possible. -To know more details, check out the [official doc](../api/pipelines/diffedit). - ## T2I-Adapter [Paper](https://arxiv.org/abs/2302.08453) @@ -230,15 +207,11 @@ To know more details, check out the [official doc](../api/pipelines/diffedit). There are 8 canonical pre-trained adapters trained on different conditionings such as edge detection, sketch, depth maps, and semantic segmentations. -See [here](../api/pipelines/stable_diffusion/adapter) for more information on how to use it. - ## Fabric [Paper](https://arxiv.org/abs/2307.10159) -[Fabric](../api/pipelines/fabric) is a training-free +[Fabric](https://github.com/huggingface/diffusers/tree/442017ccc877279bcf24fbe92f92d3d0def191b6/examples/community#stable-diffusion-fabric-pipeline) is a training-free approach applicable to a wide range of popular diffusion models, which exploits the self-attention layer present in the most widely used architectures to condition the diffusion process on a set of feedback images. - -To know more details, check out the [official doc](../api/pipelines/fabric).
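
The sections above intentionally stay at the overview level. As a concrete illustration of the inference-only techniques listed in the table, here is a short sketch of running InstructPix2Pix through its Diffusers pipeline; the checkpoint name, placeholder input image, and prompt are illustrative assumptions rather than anything specified in this document:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Assumed checkpoint for illustration; float16 assumes a CUDA GPU is available
# (drop `torch_dtype` and `.to("cuda")` to run on CPU).
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# Any RGB image works as the input; a flat placeholder keeps the sketch self-contained.
init_image = Image.new("RGB", (512, 512), color=(135, 170, 210))

# The prompt describes the *edit* to apply, not the full scene.
edited = pipe(
    "turn the sky into a dramatic sunset",
    image=init_image,
    num_inference_steps=20,
).images[0]
edited.save("edited.png")
```

Most of the other inference-only methods in the table follow the same load-then-call pattern, differing mainly in the pipeline class and the extra conditioning inputs they accept.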