
Enable ability to resize lora dim based off sv ratios #243

Merged (5 commits) on Mar 10, 2023

Conversation

@mgz-dev (Contributor) commented Feb 28, 2023

I had the idea after looking at KohakuBlueleaf's implementation, which uses an SV threshold to dynamically determine layer dimension while extracting dreambooth models to LoRA.

This uses a maximum SV ratio instead of a threshold: all singular value ranks below the ratio are dropped. For example, if the ratio is 10 and the largest singular value is 5, all singular values below 0.5 are removed, and the dim is calculated dynamically per layer.

LoRAs resized this way currently do not work with sd-webui-additional-networks, since it gives size mismatch warnings, but they do function with the built-in LoRA support in the webui.

The same logic can be applied to extract_lora_from_models.
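
Roughly, the ratio rule looks like this (a minimal sketch with a hypothetical helper name; the actual script operates on the LoRA up/down weight pairs rather than a single raw weight matrix):

```python
import torch

def rank_from_sv_ratio(weight: torch.Tensor, sv_ratio: float) -> int:
    """Smallest rank that keeps every singular value within
    `sv_ratio` of the largest one."""
    S = torch.linalg.svdvals(weight.float())  # sorted in descending order
    cutoff = S[0] / sv_ratio                  # ratio 10, max sv 5 -> cutoff 0.5
    return max(int((S > cutoff).sum().item()), 1)  # never return an empty layer
```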

@TingTingin (Contributor) commented:

Any comparisons?

@mgz-dev (Contributor, Author) commented Feb 28, 2023

Sure. I grabbed the Arcane, Amber, and Makima LoRAs from CivitAI, since they are ranked as popular, and processed them at an sv_ratio of 5 (you can likely be more aggressive if you want). I'm listing the file size reduction, since dim is dynamic in the new LoRAs. The prompt was mostly just copied from a preview image.

  • Arcane (295 MB to 55 MB)
    [image: 1851130503 cfg7 steps20 1950 (Custom)]

  • Amber (144 MB to 23 MB)
    [image: 1851130503 cfg7 steps20 1619 (Custom)]

  • Makima (148 MB to 49 MB)
    [image: 1851130503 cfg7 steps20 1336 (Custom)]

@kohya-ss (Owner) commented:

Thank you for this! It looks good!

The extension needs to be updated first. The modification is not so difficult, and I will make the change as soon as I have time. I'd like to support variable alpha and Conv2d at the same time.

@thojmr commented Mar 1, 2023

I was just about to mention Conv2d, nice! Looking forward to testing dreambooth extraction after Conv2d is implemented. It should be way more accurate.

@ashen-sensored commented Mar 1, 2023

Using a k-percentile cutoff (cumulative sum) instead of an SV threshold, with the Frobenius norm as a precision-error check, may be a better way of doing it.
We can either use k directly as the controllable input, or use the precision error as the controllable input and do a sweep.
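
Roughly what I mean (a sketch; the helper names are mine, not from any existing code):

```python
import torch

def rank_from_cumsum(S: torch.Tensor, k: float) -> int:
    """Smallest rank whose singular values cover fraction `k` of the
    total sum of singular values (S sorted in descending order)."""
    cum = torch.cumsum(S, dim=0) / S.sum()
    return int(torch.searchsorted(cum, torch.tensor(k)).item()) + 1

def fro_error(S: torch.Tensor, rank: int) -> float:
    """Relative Frobenius-norm error of the rank-`rank` truncation:
    ||A - A_r||_F / ||A||_F = sqrt(sum of dropped sv^2 / sum of all sv^2)."""
    return float(torch.sqrt(S[rank:].pow(2).sum() / S.pow(2).sum()))

def rank_for_error(S: torch.Tensor, max_err: float) -> int:
    """The sweep: smallest rank whose truncation error fits the budget."""
    for r in range(1, len(S) + 1):
        if fro_error(S, r) <= max_err:
            return r
    return len(S)
```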

The LoCon author's comment about this, just FYI:

[screenshot of the LoCon author's comment]


@mgz-dev (Contributor, Author) commented Mar 1, 2023

Thanks for the suggestion. It is easy to implement resizing based on any of these methods (cumulative sum, Frobenius norm, or SV ratio), so I went ahead and added them all as options. I think they each have their strengths and weaknesses, though, and I don't necessarily agree that cumulative sum or Frobenius norm is strictly better (see the sketch after this list):

  • Singular value ratio should work well if the goal is to denoise lower-ranking data. It will likely also perform best at extremely high levels of compression (though this should be checked on a case-by-case basis).
  • Cumulative sum is similar to SV ratio, though the meaning of the value does not translate as well between different starting dims. It is important for people to understand that a "lower" cumulative sum is not necessarily bad. I originally avoided this because I was afraid people would misread 0.6 as "only 60% as good as the original".
  • Frobenius norm is likely the most intuitive number for most users. It tries to represent the original matrix as accurately as possible, so it is a good choice for recreating a LoRA with the highest accuracy at the cost of compression, but it is not a good option for users trying to salvage an overfit LoRA. It will have lower compression on average but the least variance per layer.
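
Here is how each criterion maps the same set of singular values to a rank (a sketch for illustration, not the script's actual code; the method names just mirror the option names above):

```python
import torch

def dynamic_rank(S: torch.Tensor, method: str, param: float) -> int:
    """S: singular values in descending order, e.g. torch.linalg.svdvals(W)."""
    if method == "sv_ratio":
        # keep every sv within `param` of the largest
        rank = int((S > S[0] / param).sum().item())
    elif method == "sv_cumulative":
        # smallest rank covering fraction `param` of the sv sum
        cum = torch.cumsum(S, dim=0) / S.sum()
        rank = int(torch.searchsorted(cum, torch.tensor(param)).item()) + 1
    elif method == "sv_fro":
        # smallest rank whose kept Frobenius norm reaches `param` of the total
        energy = torch.cumsum(S.pow(2), dim=0) / S.pow(2).sum()
        rank = int(torch.searchsorted(energy, torch.tensor(param ** 2)).item()) + 1
    else:
        raise ValueError(method)
    return max(1, min(rank, len(S)))
```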

@TingTingin (Contributor) commented Mar 1, 2023

What value ranges is each of those operating in? Especially cumulative sum, since, as you say, the values are unintuitive.
Also, is there any explanation of how resizing helps with overfitting?

@bmaltais (Contributor) commented Mar 3, 2023

I discovered an interesting thing... clamping is sort of degrading the output. To my surprise, changing CLAMP_QUANTILE from 0.99 to 1 improves the results drastically... here is an example:

Original rank 256 model:

[image: grid-0205]

Resized with sv 5 clamp 0.99:

[image: grid-0207]

Resized with sv 5 clamp 1:

[image: grid-0206]

The file size reduction for both was the same... from 295 MB to 19.4 MB... but the output of clamp 1 is much closer to the original... perhaps the clamp value needs to be exposed as a parameter?

@mgz-dev (Contributor, Author) commented Mar 3, 2023

@bmaltais good observation. I actually think clamping may not be required at all, since SVD should always return norm-1 vectors within U and Vh, but I wasn't sure whether some unique exceptions in the algorithm used in torch caused problems that warranted its inclusion in the original dreambooth extraction code. It may be best to just remove the quantile/clamp processing entirely.
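
For context, the clamp step in question looks roughly like this (paraphrased from the extraction code from memory, so treat it as a sketch rather than an exact quote):

```python
import torch

CLAMP_QUANTILE = 0.99  # the constant bmaltais changed to 1.0

def truncate_and_clamp(weight: torch.Tensor, rank: int):
    U, S, Vh = torch.linalg.svd(weight.float())
    U = U[:, :rank] @ torch.diag(S[:rank])  # fold singular values into U
    Vh = Vh[:rank, :]
    # the step in question: clip the most extreme ~1% of values, which
    # after scaling by S tend to belong to the most significant components
    dist = torch.cat([U.flatten(), Vh.flatten()])
    hi_val = torch.quantile(dist, CLAMP_QUANTILE)
    return U.clamp(-hi_val, hi_val), Vh.clamp(-hi_val, hi_val)
```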

@TingTingin as a brief explanation, it is somewhat similar to the theory behind PCA.
[image: PCA example]
In our case, the assumption is that dimensions with lower variance (you can think of smaller singular values as a proxy for less information) are less likely to contain useful data and more likely to contain noise from the training set that does not reflect what was actually meant to be learned during training. You can also think of it as similar to signal denoising.
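
As a toy illustration (synthetic data, nothing from the actual scripts): build a low-rank "signal" plus noise, and truncating the small singular values recovers the signal better than keeping everything:

```python
import torch

torch.manual_seed(0)
signal = torch.randn(64, 4) @ torch.randn(4, 64)   # a rank-4 "signal"
noisy = signal + 0.3 * torch.randn(64, 64)         # plus training noise

U, S, Vh = torch.linalg.svd(noisy)
r = 4
denoised = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]

print((noisy - signal).norm())     # error if we keep everything
print((denoised - signal).norm())  # smaller: the truncation removed mostly noise
```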

As far as valid ranges,

  • sv_ratio should be greater than 1 (I'd recommend staying at 2 or greater at a minimum)
  • sv_cumulative is somewhat dependent on the original rank of a matrix, but values are restricted to the interval (0, 1)
  • sv_fro is also restricted to (0, 1), but in practice it should be set much higher than sv_cumulative

You can use the --verbose flag and compare the numbers. I'd recommend doing some amount of iteration to see what gives the best results for whichever LoRA you are trying to reduce, since there is some variation on a case-by-case basis.
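
For example, a quick way to eyeball the tradeoff before committing to a value (a sketch using a random matrix as a stand-in for a real layer):

```python
import torch

# stand-in for a real layer's weight delta
S = torch.linalg.svdvals(torch.randn(320, 768))

for ratio in (2, 5, 10, 20):
    rank = max(int((S > S[0] / ratio).sum().item()), 1)
    kept = float(S[:rank].pow(2).sum() / S.pow(2).sum())
    print(f"sv_ratio={ratio:>3}: rank={rank:>3}, fro retention={kept:.3f}")
```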

@bmaltais (Contributor) commented Mar 3, 2023

Based on testing, it is better to create a LoRA in two steps: train at the highest rank possible to get the most out of the training, then resize it to a much smaller size with minimal loss. This provides way better results than trying to train a small-rank LoRA in one step.

@TingTingin (Contributor) commented:

Thanks for the explanation.

- code refactor to be able to reuse the same function for dynamic LoRA extraction
- remove clamp
- fix issue where the index goes out of bounds if sv_ratio is too high

Modified resize script to support different types of LoRA networks (refer to Kohaku-Blueleaf's module implementation for structure).
- --new_rank arg changed to limit the max rank of any layer
- added logic to make sure zeroed layers do not create a large lora dim
@mgz-dev (Contributor, Author) commented Mar 4, 2023

Apologies for so many additional changes. I was working to make the script modular so I could recycle code to handle dreambooth extraction, but it looks like Kohaku-Blueleaf pushed an update to their repo which overlaps with that task. I'll table dreambooth extraction for now unless there is some reason to bring it back up.

The latest version should also be able to handle the conv2d layers.

Important: --new_rank will now also limit the max rank of any individual LoRA layer during dynamic resizing. I added this because I noticed some layers could be disproportionately expensive space-wise. This will let people see whether restricting those layers further results in quality loss (users can test on a model-by-model basis if they're trying to maximize space savings). A sketch of the behavior follows.
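
(A hypothetical helper illustrating the new behavior, not the script's actual code:)

```python
import torch

def capped_dynamic_rank(S: torch.Tensor, sv_ratio: float, new_rank: int) -> int:
    """Rank from the sv_ratio rule, capped at new_rank, with a guard so a
    zeroed-out layer cannot blow up to a huge dim."""
    if S[0] == 0:  # layer contributed nothing; keep it minimal
        return 1
    rank = int((S > S[0] / sv_ratio).sum().item())
    return max(1, min(rank, new_rank))
```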

@bmaltais I have not done enough testing to say whether the 2-step LoRA is the best method, but I have also seen examples of trimming a LoRA even improving quality. As a caveat, though, I would think that trimming too many ranks runs the risk of losing important information in the model. There is probably some range that is best to stay within when resizing.

@bmaltais (Contributor) commented Mar 4, 2023

@mgz-dev I will refer to your repo for updated code until you are ready to update the PR, so as to keep up with the improvements. Nice work. I like that it now supports LoCon too.

I have a question for you... if one uses SD1.5 as the model, what would you say is the highest rank one should use when creating a LoRA? Since the model uses 768-dimensional vectors, would rank 768 be the limit after which there is no return?

I am asking because I think LoRA can be used as a nice "frozen" model training method. Once the LoRA has been created at the model's full potential rank, one simply uses your tool to reduce it down to what really constitutes the model. In my tests this is providing the best LoRAs... but I wonder what the practical rank limit should be for a model...

@kohya-ss (Owner) commented Mar 6, 2023

Thank you all! I am not familiar with the mathematical theory of LoRA and svd, so this is very helpful.

The support for dynamic rank in sd-webui-additional-networks has been completed in the dev branch.

I will test and merge this today after work.
