This is an extension for stable-diffusion-webui that adds a custom script which lets you observe U-Net feature maps.
This extension can
- visualize intermediate outputs of the model: the features of each U-Net block and of the attention layers.
- apply per-block prompts: generate images while changing the prompt for each U-Net block.
- visualize the difference of U-Net features generated with and without the per-block prompts above.
Use the image below as an example.
Model: waifu-diffusion-v1-3-float16 (84692140)
Prompt: a cute girl, pink hair
Sampling steps: 20
Sampling Method: DPM++ 2M Karras
Size: 512x512
CFG Scale: 7
Seed: 1719471015
For example, the following images are generated.
Grayscale output OUT11, steps 20, Black/White, Sigmoid(1,0)
Colored output OUT11, steps 20, Custom, Sigmoid(1,0), H=(2+v)/3, S=1.0, V=0.5
- Extract U-Net features
- If checked, U-Net feature extraction is enabled.
- Layers
- Specify the blocks to extract. Comma delimiters and hyphen delimiters can be used. IN11, M00 and OUT00 are connected.
- Image saving steps
- Specify the sampling steps at which extraction is performed.
- Colorization
- Specify how to colorize the output images.
- Dump Setting
- Configure "binary-dump" settings.
- Selected Layer Info
- Details of the block input/output specified in the Layers section.
In the Layers section, you can use the grammar below:
single block: IN00
You can use IN00, IN01, ..., IN11, M00, OUT00, OUT01 ..., OUT11.
multiple blocks: IN00, IN01, M00
Comma separated block names.
range: IN00-OUT11
Hyphen separated block names.
Edges are included in the range.
IN11, M00 and OUT00 are connected.
range with steps: IN00-OUT11(+2)
`(+digits)` after the range defines the step.
`+1` is the same as a normal range.
`+2` means "every other block".
For instance, `IN00-OUT11(+2)` means:
IN00, IN02, IN04, IN06, IN08, IN10,
M00,
OUT01, OUT03, OUT05, OUT07, OUT09, OUT11
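As an illustration only (this is not the extension's actual parser), a block spec like the one above could be expanded in Python as follows:

```python
# Illustration only: expanding a layer spec such as "IN00-OUT11(+2)" into block names.
import re

# IN00..IN11, M00, OUT00..OUT11 in connection order
BLOCKS = [f"IN{i:02}" for i in range(12)] + ["M00"] + [f"OUT{i:02}" for i in range(12)]

def expand(spec: str) -> list[str]:
    names = []
    for part in (p.strip() for p in spec.split(",")):
        m = re.fullmatch(r"(\w+)-(\w+)(?:\(\+(\d+)\))?", part)
        if m:
            start, stop, step = m.group(1), m.group(2), int(m.group(3) or 1)
            i, j = BLOCKS.index(start), BLOCKS.index(stop)
            names.extend(BLOCKS[i:j + 1:step])   # both edges included
        else:
            names.append(part)                   # a single block name
    return names

print(expand("IN00-OUT11(+2)"))
# ['IN00', 'IN02', 'IN04', 'IN06', 'IN08', 'IN10', 'M00',
#  'OUT01', 'OUT03', 'OUT05', 'OUT07', 'OUT09', 'OUT11']
```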
- Colorize method
- Specifies the colorization method. Let `v` be the feature value.
`White/Black` shows a white pixel for large `|v|` and a black pixel for small `|v|`.
`Red/Blue` shows a red pixel for large `v` and a blue pixel for small `v`.
`Custom` computes the color from `v`. You can use the RGB or HSV colorspace.
- Value transform
- Feature values are not suitable to be used as-is as pixel colors. This section specifies how feature values are converted to pixel values.
`Auto [0,1]` linearly converts the values to `[0,1]`, using the minimum and maximum of the given feature values.
`Auto [-1,1]` likewise converts the values to `[-1,1]`.
`Linear` first clamps the feature values to the specified `Clamp min./max.` range, then linearly converts them to `[0,1]` when `Colorize method` is `White/Black`, and to `[-1,1]` otherwise.
`Sigmoid` applies a sigmoid function with the specified gain and x-offset. The output is in `[0,1]` when `Colorize method` is `White/Black`, and in `[-1,1]` otherwise.
- Color space
- Write code that converts `v`, as transformed by `Value transform`, to a pixel value. `v` is given in `[0,1]` or `[-1,1]` according to `Colorize method` and `Value transform`. The result is clipped to `[0,1]`.
The code is executed with the `numpy` module as the global environment. For example, `abs(v)` means `numpy.abs(v)`. (A sketch follows this list.)
- Dump feature tensors to files
- If checked, U-Net feature tensors are exported as files.
- Output path
- Specify the directory to output binaries. If it does not exist, it will be created.
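For illustration, here is a rough sketch (not the extension's actual code) of how a `Sigmoid` value transform and a `Custom` color-space expression such as `H=(2+v)/3` could fit together. The function names `sigmoid_transform` and `eval_channel` are made up for this example:

```python
import numpy as np

def sigmoid_transform(v, gain=1.0, offset=0.0, signed=False):
    # Sigmoid with the given gain and x-offset. Output lies in [0,1];
    # rescale to [-1,1] when the colorize method is not White/Black.
    s = 1.0 / (1.0 + np.exp(-gain * (v - offset)))
    return s * 2.0 - 1.0 if signed else s

def eval_channel(expr, v):
    # Evaluate the expression with the numpy module as the global environment,
    # so e.g. abs(v) resolves to numpy.abs(v); then clip the result to [0,1].
    out = eval(expr, dict(vars(np)), {"v": v})
    return np.clip(out, 0.0, 1.0)

feat = np.random.randn(64, 64)             # stand-in for a real feature map
v = sigmoid_transform(feat, gain=1.0)      # Sigmoid(1,0), as in the example above
h = eval_channel("(2+v)/3", v)             # H = (2+v)/3 from the Custom example
# S = 1.0 and V = 0.5 would simply be constant channels of the same shape.
```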
Images with steps=1,5,10, from left to right.
Same as Feature extraction from U-Net.
The horizontal axis represents the token position. The beginning token and ending token are inserted, so the 75 images in between represent the influence of each token.
The vertical axis represents the heads of the attention layer. In the current model, h=8, so there are 8 rows of images.
Something like "it seems `pink hair` is working in this layer..." can be seen.
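As a rough sketch of this layout (not the extension's code), and assuming for illustration that a cross-attention map is stored as a tensor of shape `(heads, query_positions, tokens)`, the grid could be assembled like this:

```python
import numpy as np

heads, hw, tokens = 8, 64 * 64, 77        # 77 = begin token + 75 prompt tokens + end token
attn = np.random.rand(heads, hw, tokens)  # stand-in for a real attention map

h = w = int(hw ** 0.5)
maps = attn.reshape(heads, h, w, tokens)  # one small map per (head, token) pair
# Tile into a grid: heads along the vertical axis, tokens along the horizontal axis.
grid = maps.transpose(0, 1, 3, 2).reshape(heads * h, tokens * w)
```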
See the following article for details (in Japanese).
Model: waifu-diffusion-v1-3-float16 (84692140)
Prompt: a (~: IN00-OUT11: cute; M00: excellent :~) girl
Sampling Method: Euler a
Size: 512x512
CFG Scale: 7
Seed: 3292581281
The above images are, in order:
- generated by `a cute girl`
- with `cute` changed to `excellent` in IN00
- with `cute` changed to `excellent` in IN05
- with `cute` changed to `excellent` in M00
Same as Feature extraction from U-Net
- Output difference map of U-Net features between with and without Layer Prompt
- Adds an output image showing the difference between the U-Net features generated with the per-block prompt disabled and enabled.
Use the notation below in the prompt:
a (~: IN00-OUT11: cute ; M00: excellent :~) girl
In the above case, IN00-OUT11 (i.e. the whole generation process) uses `a cute girl`, but M00 uses `a excellent girl`.
You can specify per-block prompts with the grammar below:
(~:
block-spec:prompt;
block-spec:prompt;
...
block-spec:prompt;
:~)
You may insert spaces after `(~:`, before `:~)`, before `:`, and after `;`. Note that the prompt between `:` and `;` is used as-is, including any spaces it contains. The semicolon after the last prompt may be omitted.
The block specification (block-spec above) is as follows. Generally, it is the same as for the X/Y plot. If ranges overlap, the later one takes precedence.
single block: IN00
You can use IN00, IN01, ..., IN11, M00, OUT00, OUT01 ..., OUT11.
multiple blocks: IN00, IN01, M00
Comma separated block names.
range: IN00-OUT11
Hyphen separated block names.
Edges are included in the range.
IN11, M00 and OUT00 are connected.
range with steps: IN00-OUT11(+2)
`(+digits)` after the range defines the step.
`+1` is the same as a normal range.
`+2` means "every other block".
For instance, `IN00-OUT11(+2)` means:
IN00, IN02, IN04, IN06, IN08, IN10,
M00,
OUT01, OUT03, OUT05, OUT07, OUT09, OUT11
otherwise: _ (underbar)
This is a special symbol and has the lowest precedence.
If no other block spec matches, the prompt defined here is used.
A few examples:
1: (~: IN00: A ; IN01: B :~)
2: (~: IN00: A ; IN01: B ; IN02: C :~)
3: (~: IN00: A ; IN01: B ; IN02: C ; _ : D :~)
4: (~: IN00,IN01: A ; M00 : B :~)
5: (~: IN00-OUT11: A ; M00 : B :~)
1: uses A in IN00, B in IN01, and nothing in the other blocks.
2: uses A in IN00, B in IN01, C in IN02, and nothing in the other blocks.
3: uses A in IN00, B in IN01, C in IN02, and D in all other blocks.
4: uses A in IN00 and IN01, B in M00, and nothing in the other blocks.
5: uses A in all blocks from IN00 to OUT11, but B in M00.
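As an illustration only (not the extension's parser), the following sketch resolves which prompt applies to each block, with later specs overwriting earlier ones and `_` acting as the lowest-precedence default. Range steps like `(+2)` are omitted here for brevity:

```python
def resolve(specs: list[tuple[str, str]]) -> dict[str, str]:
    # specs: [(block-spec, prompt), ...] in the order they were written
    blocks = [f"IN{i:02}" for i in range(12)] + ["M00"] + [f"OUT{i:02}" for i in range(12)]
    table: dict[str, str] = {}
    default = ""
    for spec, prompt in specs:
        for part in (p.strip() for p in spec.split(",")):
            if part == "_":
                default = prompt                      # lowest precedence
            elif "-" in part:
                a, b = part.split("-")                # range, both edges included
                for name in blocks[blocks.index(a):blocks.index(b) + 1]:
                    table[name] = prompt              # later specs overwrite earlier ones
            else:
                table[part] = prompt
    return {b: table.get(b, default) for b in blocks}

# Example 5 above: (~: IN00-OUT11: A ; M00 : B :~)
prompts = resolve([("IN00-OUT11", "A"), ("M00", "B")])
assert prompts["IN05"] == "A" and prompts["M00"] == "B"
```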
For experiments, Dynamic Prompts is useful.
For instance, if you want to see the effect of changing the prompt in only one block, enable Jinja Template in Dynamic Prompts and input the following prompt:
{% for layer in [ "IN00", "IN01", "IN02", "IN03", "IN04", "IN05", "IN06", "IN07", "IN08", "IN09", "IN10", "IN11", "M00", "OUT00", "OUT01", "OUT02", "OUT03", "OUT04", "OUT05", "OUT06", "OUT07", "OUT08", "OUT09", "OUT10", "OUT11" ] %}
{% prompt %}a cute school girl, pink hair, wide shot, (~:{{layer}}:bad anatomy:~){% endprompt %}
{% endfor %}
to check the effect of `bad anatomy` in each block.
Actual examples are available here (in Japanese).
Test adding prompts to one specific block with per-block prompts
- visualize self-attention layer