Dump U-Net

What is this

This is an extension for stable-diffusion-webui that adds a custom script that lets you observe U-Net feature maps.

What can this do

This extension can

  1. visualize intermediate output of the model: features of each block of U-Net and attention layer.
  2. per-block prompts: generate images changing the prompt in each block of U-Net.
  3. visualize the difference in features caused by the per-block prompts in 2.

Feature extraction

Use the image below as an example.

Model Output Image

Model: waifu-diffusion-v1-3-float16 (84692140)
Prompt: a cute girl, pink hair
Sampling steps: 20
Sampling Method: DPM++ 2M Karras
Size: 512x512
CFG Scale: 7
Seed: 1719471015

Feature extraction from U-Net

For example, the following images are generated.

Grayscale output OUT11, steps 20, Black/White, Sigmoid(1,0)

Colored output OUT11, steps 20, Custom, Sigmoid(1,0), H=(2+v)/3, S=1.0, V=0.5

UI description

Extract U-Net features
If checked, U-Net feature extraction is enabled.
Layers
Specify blocks to be extracted. Comma delimiters and hyphen delimiters can be used. IN11, M00 and OUT00 are connected.
Image saving steps
Specify the sampling steps at which features are extracted.
Colorization
Specify how to colorize the output images.
Dump Setting
Configure "binary-dump" settings.
Selected Layer Info
Details of the block input/output specified in Layer section.

In the Layers section you can use the grammar below:

single block: IN00
    You can use IN00, IN01, ...,  IN11, M00, OUT00, OUT01 ..., OUT11.
multiple blocks: IN00, IN01, M00
    Comma separated block names.
range: IN00-OUT11
    Hyphen separated block names.
    Edges are included in the range.
    IN11, M00 and OUT00 are connected.
range with steps: IN00-OUT11(+2)
    `(+digits)` after the range defines steps.
    `+1` is same as normal range.
    `+2` means "every other block".
    For instance, `IN00-OUT11(+2)` means:
      IN00, IN02, IN04, IN06, IN08, IN10,
      M00,
      OUT01, OUT03, OUT05, OUT07, OUT09, OUT11
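
The grammar above can be sketched in Python as follows. This is a minimal illustration, not the extension's actual parser: the names `BLOCKS`, `ITEM` and `expand_layers` are hypothetical, and the block order (IN00 to IN11, then M00, then OUT00 to OUT11) is taken from the description above.

```python
import re

# Block order assumed from the README: IN00..IN11, M00, OUT00..OUT11.
BLOCKS = [f"IN{i:02d}" for i in range(12)] + ["M00"] + [f"OUT{i:02d}" for i in range(12)]

# One comma-separated item: a block name, optionally "-other" and "(+step)".
ITEM = re.compile(r"^(\w+)(?:\s*-\s*(\w+))?(?:\s*\(\+(\d+)\))?$")


def expand_layers(spec: str) -> list[str]:
    """Expand a Layers string into block names (hypothetical helper)."""
    result = []
    for item in spec.split(","):
        m = ITEM.match(item.strip())
        if m is None:
            raise ValueError(f"invalid layer spec: {item!r}")
        first = m.group(1)
        last = m.group(2) or first      # a single block is a range of length 1
        step = int(m.group(3) or 1)     # "(+1)" is the same as a normal range
        start, end = BLOCKS.index(first), BLOCKS.index(last)
        # Both edges are included; IN11, M00 and OUT00 are adjacent in BLOCKS.
        for name in BLOCKS[start:end + 1:step]:
            if name not in result:
                result.append(name)
    return result


print(expand_layers("IN00-OUT11(+2)"))
```

Unknown block names raise `ValueError` through `list.index`, which seemed the simplest behavior for a sketch.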

Colorization

Colorize method
Specifies the colorization method.
Let v be the feature value.
White/Black shows white pixel for large |v|, black pixel for small |v|.
Red/Blue shows red pixel for large v, blue pixel for small v.
Custom computes the color from v. You can use RGB or HSL colorspace.
Value transform
Feature values are not suitable to be used as-is to specify colors. This section specifies the conversion method from feature values to pixel values.
Auto [0,1] converts the value to [0,1] linearly using the minimum and maximum values of given feature values.
Auto [-1,1] converts the value to [-1,1] linearly in the same way.
Linear first clamps feature values to specified Clamp min./max. range. Then linearly converts values to [0,1] when Colorize method is White/Black and to [-1,1] otherwise.

Sigmoid is a sigmoid function with specified gain and x-offset. The output is in range [0,1] when Colorize method is White/Black, and [-1,1] otherwise.
Color space
Write code that converts v, already transformed by Value transform, into the pixel value. v lies in [0,1] or [-1,1] depending on Colorize method and Value transform. The result is clipped to [0,1].
The code is executed with numpy module as the global environment. For example, abs(v) means numpy.abs(v).
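
The following sketch illustrates the value transforms and the evaluation of a Color space expression. It is an approximation under stated assumptions, not the extension's code: the function names are hypothetical, the sigmoid formula `1 / (1 + exp(-gain * (v - offset)))` and its rescaling to [-1,1] are assumed, and `eval` with numpy's namespace is only one plausible way to make `abs(v)` mean `numpy.abs(v)`.

```python
import numpy as np


def auto_unit(v):
    """Auto [0,1]: linear rescale using the min and max of the given features."""
    lo, hi = v.min(), v.max()
    return (v - lo) / (hi - lo) if hi > lo else np.zeros_like(v)


def linear(v, vmin, vmax, signed):
    """Linear: clamp to [vmin, vmax], then map to [0,1] or, if signed, to [-1,1]."""
    u = (np.clip(v, vmin, vmax) - vmin) / (vmax - vmin)
    return 2.0 * u - 1.0 if signed else u


def sigmoid(v, gain, offset, signed):
    """Sigmoid with gain and x-offset (formula assumed, see lead-in)."""
    u = 1.0 / (1.0 + np.exp(-gain * (v - offset)))
    return 2.0 * u - 1.0 if signed else u


def eval_channel(expr, v):
    """Evaluate a Color space expression with numpy names as globals, then clip to [0,1]."""
    env = dict(vars(np))  # so that abs(v) resolves to numpy.abs(v)
    return np.clip(eval(expr, env, {"v": v}), 0.0, 1.0)


features = np.array([-2.0, 0.0, 2.0])
# White/Black style: values in [0,1] used directly as brightness.
gray = sigmoid(features, gain=1.0, offset=0.0, signed=False)
# Custom style: values in [-1,1] fed to the hue expression from the example above.
hue = eval_channel("(2+v)/3", sigmoid(features, gain=1.0, offset=0.0, signed=True))
print(gray, hue)
```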

Dump Setting

Dump feature tensors to files
If checked, U-Net feature tensors are exported as files.
Output path
Specify the directory to output binaries. If it does not exist, it will be created.

Examples of extracted images

Images with steps=1,5,10 from left to right.

  • IN00 (64x64, 320ch) IN00

  • IN05 (32x32, 640ch) IN05

  • M00 (8x8, 1280ch) M00

  • OUT06 (32x32, 640ch) OUT06

  • OUT11 (64x64, 320ch) OUT11

Feature extraction from Attention layer

UI description

Same as Feature extraction from U-Net.

Examples

The horizontal axis represents the token position. The beginning token and ending token are inserted, so the 75 images in between represent the influence of each token.

The vertical axis represents the heads of the attention layer. In the current model, h=8, so 8 images are stacked vertically.

From these images you can make observations such as "pink hair seems to take effect in this layer".

  • IN01 Attention IN01

  • M00 Attention M00

  • OUT10 Attention OUT10
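
The layout described above (token positions along the horizontal axis, heads along the vertical axis) can be reproduced with a small numpy sketch. The input shape `(heads, height*width, tokens)` and the function name `attention_grid` are assumptions made for illustration; the extension may store attention differently.

```python
import numpy as np


def attention_grid(attn, height, width):
    """Tile attention weights of shape (heads, height*width, tokens) into one image.

    Rows of tiles are heads and columns of tiles are token positions.
    The input layout is an assumption, not the extension's documented format.
    """
    heads, pixels, tokens = attn.shape
    if pixels != height * width:
        raise ValueError("height * width must equal the number of spatial positions")
    maps = attn.reshape(heads, height, width, tokens)
    # (heads, height, tokens, width) -> (heads * height, tokens * width)
    return maps.transpose(0, 1, 3, 2).reshape(heads * height, tokens * width)


# 8 heads, an 8x8 feature map (as in M00) and 77 token positions.
attn = np.random.rand(8, 8 * 8, 77)
grid = attention_grid(attn, 8, 8)
print(grid.shape)  # 8 tiles high, 77 tiles wide
```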

Per-block Prompts

Overview

See the following article for details (in Japanese).

Generating images with different prompts for each block in Stable Diffusion's U-Net (block-specific prompts)

Example of Difference map (four images)

Model: waifu-diffusion-v1-3-float16 (84692140)
Prompt: a (~: IN00-OUT11: cute; M00: excellent :~) girl
Sampling Method: Euler a
Size: 512x512
CFG Scale: 7
Seed: 3292581281

The above images are in order:

  • generated with a cute girl.
  • with cute changed to excellent in IN00
  • with cute changed to excellent in IN05
  • with cute changed to excellent in M00

UI description

Same as Feature extraction from U-Net

Output difference map of U-Net features between with and without Layer Prompt
Adds to the outputs an image that shows the difference in U-Net features between per-block prompts disabled and enabled.

Notation

Use the notation below in the prompt:

a (~: IN00-OUT11: cute ; M00: excellent :~) girl

In the case above, IN00-OUT11 (i.e. the whole generation process) uses

a  cute  girl

but for M00

a  excellent  girl

You can specify per-block prompts with the grammar below:

(~:
    block-spec:prompt;
    block-spec:prompt;
    ...
    block-spec:prompt;
:~)

After (~:, before :~), before :, and after ;, you may insert spaces. Note that the text between : and ; is inserted into the result as is, including its spaces. The semicolon after the last prompt may be omitted.

The block specification (block-spec above) is as follows. Generally, it is the same as X/Y plot. If there are overlapping ranges, the later one takes precedence.

single block: IN00
    You can use IN00, IN01, ...,  IN11, M00, OUT00, OUT01 ..., OUT11.
multiple blocks: IN00, IN01, M00
    Comma separated block names.
range: IN00-OUT11
    Hyphen separated block names.
    Edges are included in the range.
    IN11, M00 and OUT00 are connected.
range with steps: IN00-OUT11(+2)
    `(+digits)` after the range defines steps.
    `+1` is same as normal range.
    `+2` means "every other block".
    For instance, `IN00-OUT11(+2)` means:
      IN00, IN02, IN04, IN06, IN08, IN10,
      M00,
      OUT01, OUT03, OUT05, OUT07, OUT09, OUT11
otherwise: _ (underscore)
    This is a special symbol and has the lowest precedence.
    If no other block spec matches, the prompt defined here will be used.

Examples

A few examples.

1: (~: IN00: A ; IN01: B :~)
2: (~: IN00: A ; IN01: B ; IN02: C :~)
3: (~: IN00: A ; IN01: B ; IN02: C ; _ : D :~)
4: (~: IN00,IN01: A ; M00 : B :~)
5: (~: IN00-OUT11: A ; M00 : B :~)

1: uses A in IN00, B in IN01, and nothing in the other blocks.
2: uses A in IN00, B in IN01, C in IN02, and nothing in the other blocks.
3: uses A in IN00, B in IN01, C in IN02, and D in the other blocks.
4: uses A in IN00 and IN01, B in M00, and nothing in the other blocks.
5: uses A from IN00 to OUT11 (all blocks), but B in M00.
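
The rules above (later entries take precedence, `_` has the lowest precedence, prompt text is kept with its spaces) can be sketched in Python. This is an illustrative reading of the notation, not the extension's parser: all names are hypothetical, and a prompt containing `;` is not handled.

```python
import re

BLOCKS = [f"IN{i:02d}" for i in range(12)] + ["M00"] + [f"OUT{i:02d}" for i in range(12)]
GROUP = re.compile(r"\(~:(.*?):~\)", re.S)
ITEM = re.compile(r"^(\w+)(?:\s*-\s*(\w+))?(?:\s*\(\+(\d+)\))?$")


def expand(spec):
    """Expand a block-spec such as 'IN00,IN01' or 'IN00-OUT11(+2)' into block names."""
    names = []
    for item in spec.split(","):
        m = ITEM.match(item.strip())
        if m is None:
            raise ValueError(f"invalid block spec: {item!r}")
        first, last, step = m.group(1), m.group(2) or m.group(1), int(m.group(3) or 1)
        names += BLOCKS[BLOCKS.index(first):BLOCKS.index(last) + 1:step]
    return names


def parse_group(body):
    """Map every block to the prompt text chosen by one (~: ... :~) group."""
    table, default = {}, ""           # unmatched blocks get nothing
    for entry in body.split(";"):
        if not entry.strip():
            continue                  # the last semicolon may be omitted or present
        spec, _, text = entry.partition(":")
        if spec.strip() == "_":
            default = text            # lowest precedence, used when nothing else matches
        else:
            for name in expand(spec):
                table[name] = text    # a later entry overrides an earlier one
    return {name: table.get(name, default) for name in BLOCKS}


def prompts_per_block(prompt):
    """Return {block name: prompt with every group resolved for that block}."""
    return {
        name: GROUP.sub(lambda m, name=name: parse_group(m.group(1))[name], prompt)
        for name in BLOCKS
    }


resolved = prompts_per_block("a (~: IN00-OUT11: cute ; M00: excellent :~) girl")
print(repr(resolved["IN00"]), repr(resolved["M00"]))
```

Because the prompt text is taken verbatim, the spaces around `cute` and `excellent` survive into the resolved prompts.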

Use with Dynamic Prompts

For experiments, Dynamic Prompts is useful.

For instance, if you want to see the effect of changing the prompt in only one block, enable Jinja Template in Dynamic Prompts and input the following prompt:

{% for layer in [ "IN00", "IN01", "IN02", "IN03", "IN04", "IN05", "IN06", "IN07", "IN08", "IN09", "IN10", "IN11", "M00", "OUT00", "OUT01", "OUT02", "OUT03", "OUT04", "OUT05", "OUT06", "OUT07", "OUT08", "OUT09", "OUT10", "OUT11" ] %}
  {% prompt %}a cute school girl, pink hair, wide shot, (~:{{layer}}:bad anatomy:~){% endprompt %}
{% endfor %}

to check the effect of bad anatomy in each block.
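
For reference, the same 25 prompts could also be generated with plain Python instead of a Jinja template. This is only a sketch of what the template expands to; `single_block_prompts` is a hypothetical helper, and feeding the resulting list to the web UI is left out.

```python
BLOCKS = [f"IN{i:02d}" for i in range(12)] + ["M00"] + [f"OUT{i:02d}" for i in range(12)]


def single_block_prompts(base, extra):
    """Build one prompt per block, adding `extra` to that block only."""
    return [f"{base}, (~:{name}:{extra}:~)" for name in BLOCKS]


prompts = single_block_prompts("a cute school girl, pink hair, wide shot", "bad anatomy")
for p in prompts:
    print(p)
```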

Actual examples are here (in Japanese).

Test adding prompts to one specific block with prompts by block

TODO

  • visualize self-attention layer