Updated LoKr implementation with lycoris support #2133
base: main
Thanks a lot for this PR. I haven't done an in-depth review yet, but from a first skim, this looks good already. A few points:
eceb9f7 to 12819ce (force-push)
@BenjaminBossan I feel we should go with
I am unsure how to test it: the lycoris version has too many flags for weight initialization, and I tried but couldn't find a combination that gives a similar weight initialization for both
Since I haven't implemented all the features in
Well, we can decide later on the naming, but I think
Okay, probably we won't manage to get 100% of the same results. It would probably still be worth it to have a script that compares the performance between the two and checks if they roughly match. It could be one of the existing examples in PEFT or the example you came up with.
23fc583 to 4501e28 (force-push)
f7a4f5b to 5d1977c (force-push)
@BenjaminBossan Please do a first review of this PR. I have trained a simple MLP on the MNIST dataset under the same config and attached the loss curve plots below to compare the different implementations. A few things from my side:
Let's try to figure this out first. Did you make some progress on investigating this? If you could share your script, I can also take a look.
This is the notebook I am using for testing: https://colab.research.google.com/drive/12T9CZvSAPcVPi5usXtkbY5G1pvNgDjH7?usp=sharing
EDIT: @BenjaminBossan The loss curves are looking much better now. Please have a look in the notebook.
Query: In the
Thanks for the progress on this PR. I added some comments but haven't run an in-depth review yet.
fixed rank dropout implementation, full matrix tuning and fixed maths (not multiplying against the vector, but only the scalar) is WiP.
Could you elaborate a little bit, just so I'm clear what is changed where? Thanks.
Regarding replication of the results for the different implementations: I could make good progress based on your notebook. Some of the issues were simply caused by not always correctly setting the seed right before initializing the model.
After doing some fixes, I could get PEFT v1 and v2 to return the identical results. There is still a difference to LyCORIS but I think I've found the cause. In PEFT, at the start, we initialize w1 to zeros and w2_a and w2_b randomly. LyCORIS, however, initializes w2_b to zeros and w1 and w2_a randomly. Let's change PEFT v2 to use the same approach and check if this will result in the same outputs.
I've uploaded the notebook here.
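To make the initialization difference described above concrete, here is a minimal pure-Python sketch (no torch, so it runs anywhere). It assumes the LoKr update has the form kron(w1, w2_a @ w2_b), as the parameter names suggest. The helper names (`kron`, `matmul`, `lokr_delta`) and the shapes are made up for illustration and are not taken from PEFT or LyCORIS.

```python
import random


def zeros(rows, cols):
    """Matrix of zeros as nested lists."""
    return [[0.0] * cols for _ in range(rows)]


def rand_matrix(rows, cols, rng):
    """Matrix of standard-normal samples drawn from the given RNG."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product of two nested-list matrices."""
    return [[x * y for x in a_row for y in b_row]
            for a_row in a for b_row in b]


def lokr_delta(w1, w2_a, w2_b):
    # Assumed form of the LoKr update: kron(w1, w2_a @ w2_b).
    return kron(w1, matmul(w2_a, w2_b))


def is_all_zero(m):
    return all(v == 0 for row in m for v in row)


rng = random.Random(0)
rank = 2  # illustrative rank, not a recommended value

# Scheme described for PEFT: w1 = 0, w2_a and w2_b random.
peft_delta = lokr_delta(
    zeros(2, 2), rand_matrix(3, rank, rng), rand_matrix(rank, 3, rng)
)

# Scheme described for LyCORIS: w2_b = 0, w1 and w2_a random.
lycoris_delta = lokr_delta(
    rand_matrix(2, 2, rng), rand_matrix(3, rank, rng), zeros(rank, 3)
)

print(is_all_zero(peft_delta), is_all_zero(lycoris_delta))
```

Under this assumed form, both schemes start from a zero update, so the adapted model initially matches the base model either way. What differs is which factor receives a non-zero gradient on the first step (w1 in the first scheme, w2_b in the second), so the loss curves need not coincide even with identical seeds.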
src/peft/tuners/lokrv2/layer.py
Outdated
import torch
import torch.nn as nn
import torch.nn.functional as F
from lycoris.functional import lokr
We should not assume that users have lycoris installed. Since lokr is not used very often, let's just import it locally where needed, but let's avoid importing it at the module level.
Also, something I wonder: Right now, only lokr.weight_gen is being used. Can we also use lokr.make_kron and lokr.make_kron? Regarding bypass_forward_diff, that could be difficult, but I'm not quite sure how it's used in LyCORIS.
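A local import along the lines suggested above could look roughly like this. The helper name `load_lokr_functional` and the error message are hypothetical; only the import path `lycoris.functional.lokr` comes from the diff under review.

```python
def load_lokr_functional():
    """Import lycoris.functional.lokr only when it is needed.

    Hypothetical helper: the name and the error message are illustrative,
    not part of PEFT or LyCORIS.
    """
    try:
        # Deferred import: merely importing this module never touches lycoris.
        from lycoris.functional import lokr
    except ImportError as exc:
        raise ImportError(
            "This code path requires the optional `lycoris` package; "
            "please install it to use the LyCORIS-backed LoKr implementation."
        ) from exc
    return lokr
```

A layer method that needs lokr.weight_gen would call the helper at that point, so users without lycoris only see an error if they actually select this code path.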
Thanks for explaining. So this is really in the PEFT code and not so much resolved because we're using
Did you check my notebook about reproducibility? I think we should adjust the initialization of parameters to be in line with lycoris and then hopefully see the same results.
Yes @BenjaminBossan, this could be done. But now I am a bit hesitant about a separate implementation. We mainly have three functional APIs
Most of the optimizations/changes are done at the module level, like weight_decompose, rs_lora, unbalanced_factorization. Even though some of these are minor, we can't utilize them with the current approach of using the functional API. WDYT? In this case, wouldn't it be better to rewrite the original LoKr and periodically (every year or two) update the component? For us to effectively use
Yes, I have made the required changes and now we have an initialization similar to lycoris, but not the same random weights 😿. I am having a hard time reproducing the same weights both locally and on Colab, even after setting the seed extensively. Colab link: https://colab.research.google.com/drive/1YxlaT9G_jotkoTrEMQIQwNX87lehcMHu?usp=sharing
Thanks a lot @yaswanth19 for sharing your thoughts on this. When reviewing the PR, I was a bit astonished how little of the lycoris package we're actually using.
As to the question of whether it's worth it to have this separate LoKr implementation or if we should rather fix the existing one: Depending on lycoris has the advantage that each time there is a fix or improvement there, we automatically benefit from it, which is nice. However, checking the
The disadvantage in my eyes is that we create more confusion for users, have a higher complexity in our code base, and to avoid breaking backwards compatibility, we would need to maintain the existing LoKr implementation anyway.
If your work on this PR has helped you identify the issues with the existing LoKr implementation, I would be super happy if you could work on fixing those. If you do that, there is even less necessity for the alternative implementation.
But it could be that I'm missing some arguments, so if @bghira or @sayakpaul have different opinions, please let me know.
In case we decide to go with fixing the existing implementation instead of adding the new one, I would propose that we elaborate the test notebook, e.g. to be a standalone script. Then we can use it to compare the results from the lycoris implementation to PEFT and ensure that they're close. This could be run daily or weekly on CI. Of course, we would need to extend the script to cover a few more cases, like conv layers and some of the edge cases discussed in the initial issue.
Don't worry too much about that. It would be exceedingly hard to get the exact same outcome using PEFT and lycoris, because we would need to ensure that each call involving randomness is executed exactly the same and in the same order.
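The point about call order can be illustrated with a small sketch. It uses Python's stdlib `random` as a stand-in for the torch RNG (an assumption made so that the snippet runs without torch), and `init_factors` is a hypothetical helper rather than code from either library.

```python
import random


def init_factors(order, seed=0, size=4):
    """Draw one random vector per factor name, in the given order.

    Hypothetical stand-in for parameter initialization: every factor
    consumes numbers from the same seeded RNG stream.
    """
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 1.0) for _ in range(size)] for name in order}


# Same seed, same order: identical values.
first = init_factors(["w1", "w2_a"])
repeat = init_factors(["w1", "w2_a"])

# Same seed, different order: each factor gets a different slice of the stream.
swapped = init_factors(["w2_a", "w1"])

print(first == repeat)                # same order reproduces exactly
print(first["w1"] == swapped["w1"])   # differs once the order changes
print(first["w1"] == swapped["w2_a"]) # the first draws went to another factor
```

So even with the same seed, two implementations only produce the same weights if they draw the same number of random values, with the same shapes, in the same order.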
@bghira @sayakpaul A gentle ping for your thoughts on the above discussion.
I am not opposed to the idea, but I guess it depends on the amount of work and the effectiveness of the implementation. From what I understand we're already having to touch many lines of code in the existing implementation, so might as well just fix them at this level instead of a separate implementation?
For context, refer to #1935.
@BenjaminBossan Here is an early draft PR. Is this what you envisioned when you mentioned a separate implementation of LoKr?