
Add implementation of LyCORIS LoKr (KronA-like adapter) for SD&SDXL models #978

Merged
37 commits merged into huggingface:main on Oct 30, 2023

Conversation

@kovalexal (Contributor) commented Sep 29, 2023

This PR focuses on increasing compatibility of SD&SDXL adapters in PEFT with other open-source tools like LyCORIS. Feel free to learn more about LyCORIS adapters from resources like this.

This specific PR is aimed at adding proper compatibility of PEFT with LoKr adapters. LoKr adapters are described in detail in the LyCORIS paper and are inspired by the KronA paper.

Currently, it's a draft PR, so the following things are needed:

  • Actual implementation of LoKr
  • Unit tests
  • Conversion script for SD&SDXL for kohya_ss / LyCORIS trained LoKrs
  • Sample training script for SD / SDXL / SD&SDXL

It is worth taking into account that LoKr may be able to reuse a lot of code from LoHa (#956).

@pacman100 @BenjaminBossan @younesbelkada FYI
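
For context, here is a minimal usage sketch of what applying a LoKr adapter with PEFT could look like once this PR lands. This is hedged: parameter names follow the discussion in this PR, the final API may differ, and the toy model and target module names are purely illustrative.

import torch.nn as nn
from peft import LoKrConfig, get_peft_model

# Toy base model standing in for a real SD UNet / text encoder.
base_model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

config = LoKrConfig(
    r=32,
    alpha=32,
    target_modules=["0", "2"],  # apply LoKr to both Linear layers of the toy model
    rank_dropout=0.0,
    decompose_both=True,        # also low-rank-decompose the first Kronecker factor
    decompose_factor=-1,        # -1: split dimensions as close to their square root as possible
    use_effective_conv2d=True,  # Tucker-style handling of 3x3 convolutions (no-op here)
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()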

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@BenjaminBossan (Member)

Not quite sure what the status is: is the PR ready for review, or are you still working on it? Let us know when it is ready.

@kovalexal (Contributor, Author) commented Oct 4, 2023

@BenjaminBossan hi!

I've just finished the LoKr implementation. As expected, this adapter is quite similar to LoHa, but from my point of view it is quite hard to reuse the LoHa code for LoKr in a way that would be easier to maintain.

Currently, the code works fine: we can train / convert / run inference with existing LoKr checkpoints for SD from civitai.com.

I've also refactored the script for SD dreambooth training and incorporated subparsers to simplify training with different adapters (lora, loha, lokr); a small sketch of this subparser layout follows the training command below. Here are the results from my sample training, which show that this LoKr implementation is able to train (the output adapter weights are several megabytes in size with the requested rank of 32):

(Screenshot of the sample training results, 2023-10-04.)

I used the following settings for training:

python train_dreambooth.py \
--pretrained_model_name_or_path=... \
--instance_data_dir=... \
--instance_prompt="AlexanderKovalchuk" \
--output_dir=... \
--seed=42 \
--resolution=512 \
--train_text_encoder \
--train_batch_size=2 \
--max_train_steps=1000 \
--learning_rate=1e-4 \
--num_validation_images=4 \
--validation_steps=50 \
--validation_prompt="AlexanderKovalchuk" \
--logging_dir=./output_logs \
--report_to=tensorboard \
--lr_warmup_steps=300 \
--lr_scheduler=constant_with_warmup \
lokr --unet_r=32 --unet_alpha=32 --te_r=32 --te_alpha=32 --unet_use_effective_conv2d --unet_decompose_both --te_decompose_both
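
For reference, here is a minimal sketch of the subparser layout mentioned above (illustrative only, not the actual train_dreambooth.py code; argument names are assumptions based on the command above): shared training arguments live on the main parser, adapter-specific ones on per-adapter subcommands.

import argparse

# Shared args on the main parser, adapter-specific args on subcommands.
parser = argparse.ArgumentParser(description="Dreambooth training with a selectable adapter")
parser.add_argument("--pretrained_model_name_or_path", type=str)
parser.add_argument("--resolution", type=int, default=512)
parser.add_argument("--learning_rate", type=float, default=1e-4)

subparsers = parser.add_subparsers(dest="adapter", required=True)

lokr = subparsers.add_parser("lokr", help="train with a LoKr adapter")
lokr.add_argument("--unet_r", type=int, default=8)
lokr.add_argument("--unet_alpha", type=int, default=8)
lokr.add_argument("--unet_decompose_both", action="store_true")
lokr.add_argument("--unet_use_effective_conv2d", action="store_true")

loha = subparsers.add_parser("loha", help="train with a LoHa adapter")
loha.add_argument("--unet_r", type=int, default=8)
loha.add_argument("--unet_alpha", type=int, default=8)

args = parser.parse_args()
# e.g.: python train_dreambooth.py --resolution=512 lokr --unet_r=32 --unet_decompose_both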

@kovalexal kovalexal marked this pull request as ready for review October 4, 2023 16:16
@kovalexal (Contributor, Author) commented Oct 4, 2023

We may need to update the docs for LoHa and LoKr in follow-up PRs; from my point of view, LoKr is a bit tricky.

  1. First, LoKr decomposes $\Delta W$ into two matrices $\Delta W_1$ and $\Delta W_2$ so that:
    $\Delta W = \Delta W_1 \otimes \Delta W_2$
    $\Delta W \in \mathbb{R}^{N \times M}, \Delta W_1 \in \mathbb{R}^{N_1 \times M_1}, \Delta W_2 \in \mathbb{R}^{N_2 \times M_2}$
    $N = N_1 N_2, M = M_1 M_2, N_1 < N_2, M_1 < M_2$

    There may exist multiple such decompositions, and decompose_factor controls which one is used. If -1 is passed, it tries to split each dimension into factors close to its square root; otherwise it tries to find the decomposition closest to the number you've provided. In general, the first matrix is smaller than the second one (see the sketch after this list).

  2. Then the resulting matrices $\Delta W_1$ and $\Delta W_2$ are decomposed using the provided rank with the following logic:

    • In general, $\Delta W_1$ is not decomposed (because it often has a small size) unless requested with decompose_both
    • If the requested rank is lower than the smallest dimension of $\Delta W_{i}$, it is decomposed to two matrices $\Delta W_{ia}$ and $\Delta W_{ib}$, $\Delta W_{ia} \in \mathbb{R}^{X \times r}$, $\Delta W_{ib} \in \mathbb{R}^{r \times Y}$
    • If the requested rank is bigger than the smallest dimension of $\Delta W_{i}$, the matrix is not decomposed
    • For a 3x3 Conv2d layer, $\Delta W_2$ may additionally use a Tucker decomposition (as in LoHa, via use_effective_conv2d)
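
A hedged sketch of the two steps above for a plain Linear weight (illustrative only, not the exact PEFT / LyCORIS code, and it ignores the Conv2d / Tucker case):

import torch

def factorize(dim: int, factor: int = -1) -> tuple[int, int]:
    # Split `dim` into (m, n) with m * n == dim and m <= n.
    # factor == -1 means "as close to sqrt(dim) as possible".
    target = int(dim**0.5) if factor <= 0 else factor
    best = 1
    for m in range(1, int(dim**0.5) + 1):
        if dim % m == 0 and abs(m - target) < abs(best - target):
            best = m
    return best, dim // best

# Step 1: factor the dimensions of a (768, 3072) weight.
n1, n2 = factorize(768)   # (24, 32)
m1, m2 = factorize(3072)  # (48, 64)

# Step 2: W1 (24x48) stays undecomposed, W2 (32x64) is decomposed with r = 8 < min(32, 64).
r = 8
w1 = torch.randn(n1, m1)
w2_a, w2_b = torch.randn(n2, r), torch.randn(r, m2)

# delta_W = W1 ⊗ (W2_a @ W2_b) has shape (n1 * n2, m1 * m2) = (768, 3072).
delta_w = torch.kron(w1, w2_a @ w2_b)
assert delta_w.shape == (768, 3072)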

@kovalexal (Contributor, Author)

@BenjaminBossan I've also spotted several errors in the convert_sd_adapter_to_peft.py script, so I fixed them in this PR.

@kovalexal (Contributor, Author)

@BenjaminBossan, would you have some time to review this PR?

@BenjaminBossan (Member)

Sorry for the delay, the last few days were very busy. I'll try to give a review today or in the next few days.

@BenjaminBossan (Member)

I didn't have time for a full review yet (hopefully tomorrow), but did take a quick look. As always from you, it already looks quite excellent. Great attention to detail and thanks also for including tests.

I saw that the LoKrLayer does not yet have the changes from #979, could you please update it?

Regarding code duplication: in layer.py there are quite a few differences between this and LoHa, so I think there is no reasonable way to abstract there. In model.py, however, we might have a chance. The main differences I could spot were some changed variable names in code/docstrings. Do you think we could reuse a base model class here?

@kovalexal (Contributor, Author)

@BenjaminBossan, hi!

I updated LoKrLayer according to #979.

I've also heavily refactored LoHaModel and LoKrModel - feel free to share your thoughts about the current implementation.

@BenjaminBossan (Member) left a review:

Finally I got around to giving this a proper review. Thanks for your patience.

This looks already quite good. Big thanks also for working on the abstractions, it's really nice to see how much code duplication can be avoided that way. I wonder if we can extend it to encompass LoRA as well (of course that would be a separate PR).

When it comes to the actual implementation of the LoKr method, I honestly haven't checked the details, so I'd trust you and the pre-existing code here to do the right thing.

Regarding the name of LyCORISConfig etc.: I could see that this may be confusing to some, though I understand the choice. From a pure spelling perspective, I wonder, however, if we should switch to LycorisConfig etc. This is much easier to type and also more consistent with other parts of PEFT (LoraConfig, not LoRAConfig). IMHO, it's also easier to read (Lycoris-Layer, not LyCORI-SLayer).

Regarding the update to train_dreambooth.py, which can now also do LoHa, does it mean that train_dreambooth_loha.py can be deleted?

Inline review comments (resolved) on: examples/stable_diffusion/convert_sd_adapter_to_peft.py (3), src/peft/tuners/lokr/layer.py, src/peft/tuners/lycoris_utils.py
...

def forward(self, x: torch.Tensor) -> torch.Tensor:
    previous_dtype = x.dtype
@BenjaminBossan (Member):

This may need adjusting, depending on whether #1010 is merged.

Further inline comments on: src/peft/tuners/lycoris_utils.py (2), tests/test_custom_models.py
@kovalexal (Contributor, Author) commented Oct 13, 2023

@BenjaminBossan, thank you for giving a review on this PR!

I wonder if we can extend it to encompass LoRA as well (of course that would be a separate PR).

I suppose there is no problem with incorporating LoRA into this config; we can do it in a separate PR (taking into account that we'll have to add/modify additional dropout strategies from LyCORIS).

Regarding the name of LyCORISConfig etc.: I could see that this may be confusing to some, though I understand the choice.

No problem, I agree with that. Do you mind if I also refactor LoHa / LoKr using the same naming strategy (in this PR)? The LoHa adapter has not been released yet, so I don't think it will be a problem.

Regarding the update to train_dreambooth.py, which can now also do LoHa, does it mean that train_dreambooth_loha.py can be deleted?

Definitely, I'll remove it in the following commits. (Update: I've actually already folded it into train_dreambooth.py.)

@kovalexal (Contributor, Author)

@BenjaminBossan, I addressed most of the comments you've left.

The reason is that we aim at having a release soon-ish once we return from the PT conference. That's probably too soon for this PR to be finished

I have one idea, but I am not sure whether you will find it useful and appropriate. We could showcase, through a blog post on the Hugging Face blog, that PEFT is able to load most of the adapters for Stable Diffusion from civitai.com (as far as I know, PEFT is currently the only way to achieve this within the Hugging Face ecosystem). We could collaborate and co-author it, and I am sure it would help attract more SD/SDXL users to your library.

If you find it useful as well, it would probably be best to do it after the LoKr adapter finally makes it into a release.

@BenjaminBossan (Member)

We could showcase, through a blog post on the Hugging Face blog, that PEFT is able to load most of the adapters for Stable Diffusion from civitai.com […]

I'm personally not an expert on SD and the ecosystem, but I'm sure we can manage something. Let me discuss this with my colleagues.

Meanwhile, could you please fix the merge conflicts and then I can give a (hopefully) final review.

@kovalexal (Contributor, Author)

Meanwhile, could you please fix the merge conflicts and then I can give a (hopefully) final review.

Sure, I fixed all the merge conflicts.

@BenjaminBossan (Member) left a review:

So I did another review and it seems like we're almost done. I have some minor comments left, please take a look.

Inline comments (resolved) on: src/peft/tuners/lycoris_utils.py, src/peft/tuners/lokr/layer.py
r"""
A base config for LyCORIS like adapters
"""
rank_pattern: Optional[dict] = field(
@BenjaminBossan (Member):

In LohaConfig and LoKrConfig, the arguments rank_pattern and alpha_pattern are also defined. Does it make sense to have them in both the parent class and the child classes?

@kovalexal (Contributor, Author):

You are right. I just wanted to leave them explicit for anyone reading the code, but on the other hand it may lead to errors during refactoring or other modifications, so let's remove them from the child classes.
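
A hedged sketch of this point (field types and defaults simplified, not the exact PEFT code): rank_pattern and alpha_pattern get declared once on the LyCORIS base config and are inherited by the child configs.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LycorisConfig:
    # shared across all LyCORIS-like adapters
    rank_pattern: Optional[dict] = field(default_factory=dict)
    alpha_pattern: Optional[dict] = field(default_factory=dict)

@dataclass
class LoKrConfig(LycorisConfig):
    # LoKr-specific fields only; rank_pattern / alpha_pattern are inherited
    r: int = 8
    alpha: int = 8
    decompose_both: bool = False
    decompose_factor: int = -1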

(Inline comment on src/peft/tuners/lycoris_utils.py, resolved)
        self.weight.data += self.get_delta_weight(active_adapter)
        self.merged_adapters.append(active_adapter)

    def reset_adapter_parameters(self, adapter_name: str):
@BenjaminBossan (Member):

Is this really needed as an abstract method? The abstract class does not require it.

@kovalexal (Contributor, Author):

Yes, it should be an abstract method, so that all child adapters need to implement it in their own way.
I missed explicitly marking these methods as abstract, so thank you for bringing it up!

(Inline comment on src/peft/tuners/lycoris_utils.py, resolved)
        r = self.r[active_adapter]
        self.scaling[active_adapter] = alpha / r

    def update_layer(self, adapter_name: str, r: int, alpha: float, **kwargs):
@BenjaminBossan (Member):

Same as above: Is this really needed as an abstract method? The abstract class does not require it.

@kovalexal (Contributor, Author):

Answered above.
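
A hedged sketch of the resolution for both threads above (illustrative, not the exact lycoris_utils.py code): the base layer declares these methods as abstract so that each adapter has to provide its own implementation.

from abc import ABC, abstractmethod

class LycorisLayer(ABC):
    @abstractmethod
    def reset_adapter_parameters(self, adapter_name: str):
        # each adapter (LoHa, LoKr, ...) initializes its own parameters
        ...

    @abstractmethod
    def update_layer(self, adapter_name: str, r: int, alpha: float, **kwargs):
        # each adapter creates/updates its parameters for the given rank and alpha
        ...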

(Inline comment on tests/test_custom_models.py, resolved)
@BenjaminBossan (Member) left a review:

Great work, Alex, this looks stellar to me. Thanks for all the effort you put into this.

From my point of view, this is ready, I'll check with colleagues if we want to have a second review just to be sure.

I took a closer look at test coverage and found some small gaps, but they're not blockers for me. Up to you if you want to work on those:

  1. We can improve test coverage slightly by adding a LoKr config with rank dropout, e.g.:

("Vanilla MLP 7 LOKR", "MLP", LoKrConfig, {"target_modules": "lin0", "rank_dropout": 0.5}),

  2. There is currently no code path that leads to use_w1=False or to use_w2=False with conv2d. Honestly, I'm not sure what needs to be done to hit those.

@BenjaminBossan (Member)

Found a tiny area for improvement: in the docstring of check_target_module_exists in tuners_utils.py, the type for config is given as LoraConfig | LoHaConfig; it can be updated to LoraConfig | LycorisConfig.

@kovalexal (Contributor, Author)

@BenjaminBossan, I updated the docstring of check_target_module_exists and also added several tests to cover rank_dropout functionality for LoHa & LoKr + path that leads to use_w1=False and use_w2=False.

@BenjaminBossan (Member)

I updated the docstring of check_target_module_exists and also added several tests to cover rank_dropout functionality for LoHa & LoKr + path that leads to use_w1=False and use_w2=False.

Great, thanks a lot.

@BenjaminBossan (Member)

@kovalexal I just noticed that LoHa and LoKr are missing delete_adapter, or am I missing it? If true, it can be added in a later PR, just wanted to confirm.

@kovalexal (Contributor, Author)

@BenjaminBossan Yes, it slipped under my radar (added it in the recent commit), thank you for noticing!

@BenjaminBossan (Member)

Oh, noticed a small error in your last commit:

[key for key, _ in self.model.named_modules() if "lora" not in key]
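
Presumably the filter should use the tuner's own prefix rather than the hard-coded "lora" string; a hedged sketch of the likely fix (the prefix attribute name is an assumption):

# filter by the tuner's own prefix (e.g. "lokr_" for LoKr), not "lora";
# `self.prefix` is an assumed attribute name, used here for illustration
key_list = [key for key, _ in self.model.named_modules() if self.prefix not in key]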

@kovalexal (Contributor, Author)

@BenjaminBossan sorry, fixed that one.

@younesbelkada (Contributor) left a review:

Thanks!

@@ -43,10 +43,6 @@ def __init__(
self.out_features = out_features
self.is_feedforward = is_feedforward

@property
@younesbelkada (Contributor):

Why has this been removed?

    def merge(self, *args) -> None:
        raise NotImplementedError

    def unmerge(self, *args) -> None:
        raise NotImplementedError

    @property
@younesbelkada (Contributor):

I see, because you moved it here

@BenjaminBossan BenjaminBossan merged commit 884b1ac into huggingface:main Oct 30, 2023