Updated LoKr implementation with lycoris support #2133

yaswanth19 · 2024-10-06T20:11:32Z

For context refer this #1935

@BenjaminBossan Here is an early draft PR. Is this how you have envisioned when you meant by a separate implementation of Lokr.

BenjaminBossan · 2024-10-07T10:20:28Z

Thanks a lot for this PR. I haven't done an in-depth review yet, but from a first skim, this looks good already. A few points:

Let's ensure that we don't add any new dependencies. The dependency should be optional but it can be added as a dev dependency.
About the name, I wonder if we should go with LycorisLoKr rather than LoKrv2. It's more typing, true, but "v2" is just not very informative. If we stick with "v2", I'd go with LoKrV2Config rather than LoKrConfigv2, etc. I'm also open to other suggestions. WDYT?
Did you already have time to test how the two implementations compare to one another?
Please fix the merge conflict.

yaswanth19 · 2024-10-12T19:42:50Z

About the name, I wonder if we should go with LycorisLoKr rather than LoKrv2. It's more typing, true, but "v2" is just not very informative. If we stick with "v2", I'd go with LoKrV2Config rather than LoKrConfigv2, etc. I'm also open to other suggestions. WDYT?

@BenjaminBossan I feel we should go with LoKrV2 as this is the latest version and we providing support to use lycoris. If we rename it LycorisLoKr it kind of means we are implementing a API for lycoris even though it is a major functionality.

Did you already have time to test how the two implementations compare to one another?

I am confused on how to test it as the problem is lycoris version has too many flag for weights initialization, and I tried but couldn't find the combination for similar initialization of weight for both PEFT LoKr and lycoris Lokr. So tried training it on MNIST with a simple MLP but then the weights are differing as we are getting difference peformance .

Since I haven't implemented all the features in Lokrv2 so right now I haven't performed the sanity check of comparing results between upstream and LoKrV2.

BenjaminBossan · 2024-10-14T13:23:12Z

I feel we should go with LoKrV2 as this is the latest version and we providing support to use lycoris. If we rename it LycorisLoKr it kind of means we are implementing a API for lycoris even though it is a major functionality.

Well, we can decide later on the naming, but I think LycorisLokr would still be a more descriptive name.

I am confused on how to test it as the problem is lycoris version has too many flag for weights initialization, and I tried but couldn't find the combination for similar initialization of weight for both PEFT LoKr and lycoris Lokr. So tried training it on MNIST with a simple MLP but then the weights are differing as we are getting difference peformance .

Okay, probably we won't manage to get 100% of the same results. It would probably still be worth it to have a script that compares the performance between the two and checks if they roughly match. It could be one of the existing examples in PEFT or the example you came up with.

yaswanth19 · 2024-10-20T13:38:23Z

@BenjaminBossan Please do a first review of this PR. I have trained a simple MLP on MNIST dataset under same config and attached the loss curve plots below to compare different implementations.

Few things from my side:

I haven't implemented the weight decompose and can be done in a separate PR if majority of community want it.
Things done in this PR from the list (Issue 1935) - fixed rank dropout implementation ,full matrix tuning and fixed maths (not multiplying against the vector, but only the scalar) is WiP.
Training loss curve is a bit concerning and I need to investigate it.
Added LycorisLokr to existing testcases and will be writing some new test cases too.

BenjaminBossan · 2024-10-21T10:11:37Z

I have trained a simple MLP on MNIST dataset under same config

Training loss curve is a bit concerning and I need to investigate it.

Let's try to figure this out first. Did you make some progress on investigating this? If you could share your script, I can also take a look.

yaswanth19 · 2024-10-21T12:11:08Z

Let's try to figure this out first. Did you make some progress on investigating this? If you could share your script, I can also take a look.

This is the notebook I am using for testing: https://colab.research.google.com/drive/12T9CZvSAPcVPi5usXtkbY5G1pvNgDjH7?usp=sharing

EDIT: @BenjaminBossan The loss curves are looking much better now. Please have a look in the notebook.

Query: In the PEFT LoKr implementation we have init_weights which is used to initialize weights of lokr w1 matrix with zeros. In LYCORIS implementation we are initializing all w1 matrices randomly. The reason me asking this is that when I make the code changes for use_scalar parameter, I am getting the delta as 0 because according to PEFT implementation in this case both w1 and w2 would be 0 matrices.

BenjaminBossan

Thanks for the progress on this PR. I added some comments but haven't run an in-depth review yet.

fixed rank dropout implementation ,full matrix tuning and fixed maths (not multiplying against the vector, but only the scalar) is WiP.

Could you extend a little bit, just so I'm clear what is changed where? Thanks.

Regarding replication of the results for the different implementations: I could make good progress based on your notebook. Some of the issues where simply caused by not always correctly setting the seed right before initializing the model.

After doing some fixes, I could get PEFT v1 and v2 to return the identical results. There is still a difference to LyCORIS but I think I've found the cause. In PEFT, at the start, we initialize w1 to zeros and w2_a and w2_b randomly. LyCORIS, however, initializes w2_b to zeros and w1 and w2_a randomly. Let's change PEFT v2 to use the same approach and check if this will result in the same outputs.

I've uploaded the notebook here.

BenjaminBossan · 2024-10-22T14:03:28Z

src/peft/tuners/lokrv2/layer.py

+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from lycoris.functional import lokr


We should not assume that users have lycoris installed. Since lokr is not used very often, let's just import it locally where needed, but let's avoid importing it at the module level.

Also, something I wonder: Right now, only lokr.weight_gen is being used. Can we also use lokr.make_kron and lokr.make_kron? Regarding bypass_forward_diff, that could be difficult, but I'm not quite sure how it's used in LyCORIS so I'm not sure.

src/peft/tuners/lokrv2/__init__.py

yaswanth19 · 2024-10-22T15:25:50Z

Could you extend a little bit, just so I'm clear what is changed where?

In LoKr we were directly dividing the dropout and now I have added the rank_dropout_scale flag
https://github.com/yaswanth19/peft/blob/25f00554bfb259a9e9afbc0c1d8d5ff14203070e/src/peft/tuners/lokrv2/layer.py#L435
Using full matrix now to determine whether to use w1 and w2 https://github.com/yaswanth19/peft/blob/25f00554bfb259a9e9afbc0c1d8d5ff14203070e/src/peft/tuners/lokrv2/layer.py#L137
Mostly scale related corrections; In LoKr we are not calculating scale correctly and not using it in make_kron.

BenjaminBossan · 2024-10-23T14:54:27Z

In LoKr we were directly dividing the dropout and now I have added the rank_dropout_scale flag
https://github.com/yaswanth19/peft/blob/25f00554bfb259a9e9afbc0c1d8d5ff14203070e/src/peft/tuners/lokrv2/layer.py#L435

Using full matrix now to determine whether to use w1 and w2 https://github.com/yaswanth19/peft/blob/25f00554bfb259a9e9afbc0c1d8d5ff14203070e/src/peft/tuners/lokrv2/layer.py#L137

Mostly scale related corrections; In LoKr we are not calculating scale correctly and not using it in make_kron.

Thanks for explaining. So this is really in the PEFT code and not so much resolved because we're using lycoris, right? So we could make the same fixes to the existing LoKr implementation too (of course in a separate PR).

Did you check my notebook about reproducibility? I think we should adjust the initialization of parameters to be in line with lycoris and then hopefully see the same results.

yaswanth19 · 2024-10-23T16:08:11Z

So this is really in the PEFT code and not so much resolved because we're using lycoris, right? So we could make the same fixes to the existing LoKr implementation too (of course in a separate PR).

Yes @BenjaminBossan , This could be done. But now I am a bit hesitant on a separate implementation; We mainly have three functional APIs weight_gen which would be used to initialize weights which can be optimized in lycoris in future , make_kron and diff_weight which will not be changed mostly as they are doing fundamental things. I couldn't understand the bypass_forwad_diff method accurately but I think it is mostly calculating w*delta.

Most of the optimizations/changes are done at module level like weight_decompose, rs_lora, unbalanced_factorization. Even though some of these are minor ones we can't utilize them using the current approach of funcitonal API. WDYT?

In this case wouldn't it be better to rewrite the original LoKr and periodically(year or two) update the component. For us to effectively use use_upstream feature; author needs to have full feature parity between Functional and Module which is not the case currently.

Did you check my notebook about reproducibility? I think we should adjust the initialization of parameters to be in line with lycoris and then hopefully see the same results.

Yes, I have made the required changes and now we have similar initilalization to lycoris but not same random weights 😿 and I am having hard time reproducing same weights both locally and on colab even after setting seed extensicely.

Colab link: https://colab.research.google.com/drive/1YxlaT9G_jotkoTrEMQIQwNX87lehcMHu?usp=sharing

BenjaminBossan · 2024-10-24T09:55:31Z

Thanks a lot @yaswanth19 for sharing your thoughts on this. When reviewing the PR, I was a bit astonished how little of the lycoris package we're actually using.

As to the question of whether it's worth it to have this separate LoKr implementation or if we should rather fix the existing one: Depending on lycoris has the advantage that each time there is a fix or improvement there, we automatically benefit from it, which is nice. However, checking the lycoris/functional/lokr.py file, there was no change in the past 4 months (besides the docs), so I'm not sure how it plays out in practice. And as mentioned, we're not using a lot of it in the first place.

The disadvantage in my eyes is that we create more confusion for users, have a higher complexity in our code base, and to avoid breaking backwards compatibility, we would need to maintain the existing LoKr implementation anyway.

If your work on this PR has helped you identify the issues with the existing LoKr implementation, I would be super happy if you could work on fixing those. If you do that, there is even less necessity for the alternative implementation. But it could be that I'm missing some arguments, so if @bghira or @sayakpaul have different opinions, please let me know.

In case we decide to go with fixing the existing implementation instead of adding the new one, I would propose that we elaborate the test notebook, e.g. to be a standalone script. Then we can use it to compare the results from the lycoris implementation to PEFT and ensure that they're close. This could be run daily or weekly on CI. Of course, we would need to extend the script to cover a few more cases, like conv layers and some of the edge cases discussed in the initial issue.

Yes, I have made the required changes and now we have similar initilalization to lycoris but not same random weights 😿 and I am having hard time reproducing same weights both locally and on colab even after setting seed extensicely.

Don't worry too much about that. It would be exceedingly hard to get the exact same outcome using PEFT and lycoris, because we would need to ensure that each call involving randomness is executed exactly the same and in the same order.

yaswanth19 · 2024-10-26T14:00:33Z

@bghira @sayakpaul A gentle ping for your thoughts on the above discussion.

sayakpaul · 2024-10-26T23:38:30Z

If your work on this PR has helped you identify the issues with the existing LoKr implementation, I would be super happy if you could work on fixing those. If you do that, there is even less necessity for the alternative implementation.

I am not opposed to the idea, but I guess it depends on the amount of work and the effectiveness of the implementation. From what I understand we're already having to touch many lines of code in the existing implementation, so might as well just fix them at this level instead of a separate implementation?

yaswanth19 changed the title ~~LOKR latest implementation with lycoris support~~ Updated LoKr implementation with lycoris support Oct 6, 2024

BenjaminBossan mentioned this pull request Oct 7, 2024

Support of the LyCORIS (LoCon/LoHA) models huggingface/diffusers#3087

Closed

2 tasks

KohakuBlueleaf mentioned this pull request Oct 9, 2024

[Call for contributions] help us improve LoKr, LoHa, and other LyCORIS #1935

Open

6 tasks

yaswanth19 force-pushed the add-lycoris-lokr-support branch from eceb9f7 to 12819ce Compare October 12, 2024 06:05

yaswanth19 force-pushed the add-lycoris-lokr-support branch from 23fc583 to 4501e28 Compare October 20, 2024 04:52

yaswanth19 added 12 commits October 20, 2024 10:24

Added new implementation of LoKr

99a2d7c

Lokr refactored files

b05fbac

Added additional kwargs

39d9d7f

🐛minor refactoring and bug fixes

dd18045

Added LoKrv2 testcases

dfbc625

Added LoKrv2 utils

9ddec02

Reformatting changes

11d6b6c

Added lycoris package

94e2b15

Refactoring and Added lycoris functional API

cac801c

Updated variable name

223c968

lycoris as optional dependency

e01f6fd

Added scaling and full matrix

5d1977c

yaswanth19 force-pushed the add-lycoris-lokr-support branch from f7a4f5b to 5d1977c Compare October 20, 2024 05:02

📝 Minor naming fixes

25f0055

BenjaminBossan requested changes Oct 22, 2024

View reviewed changes

added use_scalar parameter

f617c01

yaswanth19 mentioned this pull request Oct 28, 2024

BUG🐛: Fixed scale related bugs in LoKr | Added rank_dropout_scale parameter #2180

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updated LoKr implementation with lycoris support #2133

Updated LoKr implementation with lycoris support #2133

yaswanth19 commented Oct 6, 2024 •

edited

Loading

BenjaminBossan commented Oct 7, 2024 •

edited

Loading

yaswanth19 commented Oct 12, 2024

BenjaminBossan commented Oct 14, 2024

yaswanth19 commented Oct 20, 2024

BenjaminBossan commented Oct 21, 2024

yaswanth19 commented Oct 21, 2024 •

edited

Loading

BenjaminBossan left a comment •

edited

Loading

BenjaminBossan Oct 22, 2024

yaswanth19 commented Oct 22, 2024

BenjaminBossan commented Oct 23, 2024

yaswanth19 commented Oct 23, 2024 •

edited

Loading

BenjaminBossan commented Oct 24, 2024

yaswanth19 commented Oct 26, 2024

sayakpaul commented Oct 26, 2024

Updated LoKr implementation with lycoris support #2133

Are you sure you want to change the base?

Updated LoKr implementation with lycoris support #2133

Conversation

yaswanth19 commented Oct 6, 2024 • edited Loading

BenjaminBossan commented Oct 7, 2024 • edited Loading

yaswanth19 commented Oct 12, 2024

BenjaminBossan commented Oct 14, 2024

yaswanth19 commented Oct 20, 2024

BenjaminBossan commented Oct 21, 2024

yaswanth19 commented Oct 21, 2024 • edited Loading

BenjaminBossan left a comment • edited Loading

Choose a reason for hiding this comment

BenjaminBossan Oct 22, 2024

Choose a reason for hiding this comment

yaswanth19 commented Oct 22, 2024

BenjaminBossan commented Oct 23, 2024

yaswanth19 commented Oct 23, 2024 • edited Loading

BenjaminBossan commented Oct 24, 2024

yaswanth19 commented Oct 26, 2024

sayakpaul commented Oct 26, 2024

yaswanth19 commented Oct 6, 2024 •

edited

Loading

BenjaminBossan commented Oct 7, 2024 •

edited

Loading

yaswanth19 commented Oct 21, 2024 •

edited

Loading

BenjaminBossan left a comment •

edited

Loading

yaswanth19 commented Oct 23, 2024 •

edited

Loading