New Training Method : Masked Score Estimation #96

cloneofsimo · 2022-12-29T14:28:11Z

cloneofsimo
Dec 29, 2022
Maintainer

Basically copy-paste from my twitter thread, but sharing here as well:
Consider yourself fine-tuning on the following two images. As a neural network, you have no idea if blue cloth is something intrinsic to . But what if you modified the score estimation objective to only consider the region of interest : in our case, facial features?

Explicitly, we can define the region of interest as some appropriate mask function given image x_t. In my experiments, I've used result from face recognition model, and scaled them to make them have equal importance.

$$
\begin{align}
L &= \mathbb{E}{t~[1, \lambda T]}||(\epsilon_t - \epsilon\theta (x_t , t, c))\cdot M(x_t)||^2 \
M(x_t) &= \frac{S(x_t) + \delta}{s(x_t)} \
\lambda& \in [0,1]
\end{align}
$$

One way to make sense of this simple trick, is to see it as "identification" (or projection) with regards to equivalence relation : If the region of interest is same, they are the same. Here is the ablation study, Left 4 pictures contain "school-uniformness" and "blueness" on their cloth, whereas right 4 pictures have non of them. seed from 0 to 3.

Here are results with random prompts from lexica, NOT CHERRYPICKED.

Currently merged into #88 .

brian6091 · 2022-12-29T14:55:17Z

brian6091
Dec 29, 2022
Collaborator

That's a great idea. Would be nice to build this into a general augmentation pipeline, or even to allow user-defined masks. I wonder if we could apply it to finetune the inpainting model?

2 replies

cloneofsimo Dec 29, 2022
Maintainer Author

Yeah, "face recognition module" is currently attached to the dataset themselves, but I think we can simply make it to have their own pipeline. I think it will work with inpainting as well.

torealise Apr 6, 2023

Is it already implemented? Is it possible to train LoRA with mask? For example I have a woman in red t-shirt and blue jeans and I want to train LoRA so it remember only red t-shirt (using mask)?

brian6091 · 2022-12-29T21:32:37Z

brian6091
Dec 29, 2022
Collaborator

Also wonder how much of this could be acheived with captioning. E.g., "TOKEN wearing blue and black striped jacket" to get the model to dissociate those features from the target token.

1 reply

torealise Apr 6, 2023

So is there an answer?

GLivshits · 2023-01-10T15:52:25Z

GLivshits
Jan 10, 2023

Can you please describe in details the second formula (for M)? What are S and delta, and how to get them?

2 replies

cloneofsimo Jan 10, 2023
Maintainer Author

You can check out the code if you are interested, it is just very general formulation. In my case, S is the mask function + gaussian blur , so 1 where there is a face and 0 otherwise. s is the sum of the S, so it is basically normalizing term (sometimes head size is different for all images, so I put it there) delta is just constant to make sure that not all backgrounds are entirely ignored. If you read the code, it will make sense much faster.

andupotorac May 31, 2024

So if one wants to do the reverse, train on everything else but the face, they would swap the 0 and the 1? This would be useful for example when training on a style and not wanting to train on specific faces used in that style.

cloneofsimo · 2023-01-11T15:37:49Z

cloneofsimo
Jan 11, 2023
Maintainer Author

Throwing bunch of results I've shared on Twitter just for reference. All used face segmentation

1 reply

runner22k Feb 6, 2023

did anyone observe the difference in Bottom three faces? actual photo(1) and the output (3) right image doesn't look same. It looks odd for some reason. Can LoRA achieve perfection? How much training is needed to reach dreambooth quality? Sorry I haven't used LoRA much.

krahnikblis · 2023-01-14T14:08:03Z

krahnikblis
Jan 14, 2023

for a work project i messed around with this (which is JAX but based on this which is pytorch if you are sticking with that library) to demonstrate how attention models apply to product photography in digital advertising (we found a correlation with click-to-view rates and ROI vs the layout and concentration of these attention maps)...

i think it would probably be somewhat helpful here as well, both for visualizing what the attention layers are "looking at", as well as to auto-define them as masks (i bet you could combine the concept with CLIP, such that the self-attention layers "focus" on the image areas matching a text prompt)

3 replies

brian6091 Jan 14, 2023
Collaborator

That's cool. Completely agree about CLIP, especially when I saw the they got CLIPSeg in transformers now. I've been playing with a kind of soft-prompting, where instead of captioning an image with just one, I draw randomly from several (up to five depending on how much trouble I'm having using a single caption to describe an image), and I think it would be cool to use CLIPSeg to automask the images

cloneofsimo Jan 15, 2023
Maintainer Author

https://github.com/7eu7d7/DreamArtist-sd-webui-extension#attention-mask might be relevant!

krahnikblis Jan 21, 2023

actually, as soon as you think of a thing, someone else already published a paper on it. this is a LOT of fun, especially to help understand what the attention heads/layers are doing, word by word. https://github.com/castorini/daam - check the colab they link, it has a neat per-head viz option too, in addition to per-word. i altered that a little to get per-word, per-head outputs:

with daam.trace(model) as trc:
    output_image = model(**settings).images[0]
    heads = len(trc.all_heat_maps.heads())
    display(output_image)
    for word in prompt.split(" "):
        word = re.sub(r'[^\w]', '', word)
        fig, ax = make_im_subplots(math.ceil(heads / 4), 4)
        fig.suptitle(word)
        for head_idx in range(heads):
            ax_ = ax[head_idx // 4, head_idx % 4]
            global_heat_map = trc.compute_global_heat_map(head_idx=head_idx, layer_idx=2, normalize=True) # why layer_idx=2,? # cuz q=0, k=1, v=2, o=3 ?
            word_heat_map = global_heat_map.compute_word_heat_map(word)
            word_heat_map.plot_overlay(output_image, ax=ax_)
            ax_.set_title(f'Head {head_idx}')
        fig.show()

xuyang06 · 2023-01-30T06:46:44Z

xuyang06
Jan 30, 2023

Awesome idea! Do we have any plan to implement this into some stable diffusion webui plugin, e.g., https://github.com/d8ahazard/sd_dreambooth_extension ?

0 replies

runner22k · 2023-02-06T16:30:33Z

runner22k
Feb 6, 2023

Great. will this help in speeding up the training process or Memory usage or Quality of the training or all?

0 replies

ima9ine · 2023-02-14T07:38:48Z

ima9ine
Feb 14, 2023

Hi! I have a question regarding masking training,
I just checked a function loss_step of cli_lora_pti.py
I just want to check that you used same loss_step with masked loss calculation not only for Tuning but also for "PIT".
I agree idea using masking for PIT since it gonna help localizing.
Thank you for great work!

0 replies

landmann · 2024-04-17T14:40:54Z

landmann
Apr 17, 2024

Hi all! Anyone can point me at what process was used to do this? I love the idea behind it - I'd love to implement it.

Also, why do we have to add the mask latents onto the model_pred input? Haha, a lot to parse through...

#271 (comment)

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New Training Method : Masked Score Estimation #96

{{title}}

Replies: 9 comments 9 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

New Training Method : Masked Score Estimation #96

cloneofsimo Dec 29, 2022 Maintainer

Replies: 9 comments · 9 replies

brian6091 Dec 29, 2022 Collaborator

cloneofsimo Dec 29, 2022 Maintainer Author

brian6091 Dec 29, 2022 Collaborator

cloneofsimo Jan 10, 2023 Maintainer Author

cloneofsimo Jan 11, 2023 Maintainer Author

brian6091 Jan 14, 2023 Collaborator

cloneofsimo Jan 15, 2023 Maintainer Author

cloneofsimo
Dec 29, 2022
Maintainer

Replies: 9 comments 9 replies

brian6091
Dec 29, 2022
Collaborator

cloneofsimo Dec 29, 2022
Maintainer Author

brian6091
Dec 29, 2022
Collaborator

cloneofsimo Jan 10, 2023
Maintainer Author

cloneofsimo
Jan 11, 2023
Maintainer Author

brian6091 Jan 14, 2023
Collaborator

cloneofsimo Jan 15, 2023
Maintainer Author