
Add lora+ implentation #1509

Closed · wants to merge 19 commits

Conversation

@moghadas76 (Contributor) opened this pull request.

@BenjaminBossan (Member) commented Feb 26, 2024:

Duplicate of #1504 :)

Sorry about closing (wrong button).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@moghadas76 (Contributor, author):

What was the conclusion in that issue?

@BenjaminBossan (Member):

No conclusion yet, we want to wait and see if the performance gains are indeed robust. Regarding your code, it's basically just a giant string with the code, right? Was that the intent?

@moghadas76 (Contributor, author):

I was waiting for your answer on whether I should implement a new Trainer object or not.

@BenjaminBossan (Member):

Hey, after some discussion, I think we can proceed with this project. Let's add the create_loraplus_optimizer function but not the custom trainer class. We can put the function inside of peft/helpers.py.

Some considerations:

  • Add a reference to the original repo
  • Update the docs
  • Remove the logger code
  • If you feel up for the task, let's add some unit tests.
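
For orientation, a minimal usage sketch of the proposed helper, based on the signature and kwargs that appear in the tests later in this thread; `peft_model`, the hyperparameter values, and the import path are placeholders (the file location moves from peft/helpers.py to src/peft/optimizers/loraplus.py as the thread progresses):

    import bitsandbytes as bnb  # any torch.optim.Optimizer subclass works the same way

    from peft.optimizers.loraplus import create_loraplus_optimizer

    optimizer = create_loraplus_optimizer(
        model=peft_model,  # a PeftModel with LoRA adapters applied
        optimizer_cls=bnb.optim.Adam8bit,
        optimizer_kwargs={
            "lr": 5e-5,
            "eps": 1e-6,
            "betas": (0.9, 0.999),
            "weight_decay": 0.0,
            "loraplus_lr_ratio": 16.0,  # at this stage of the PR the ratio travels inside the kwargs and is popped out
        },
        loraplus_lr_embedding=1e-6,
    )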

@BenjaminBossan (Member):

@moghadas76 do you still plan on working on this?

@moghadas76 (Contributor, author) commented Mar 12, 2024 via email.

@BenjaminBossan (Member):

Great, thanks. On top of what I mentioned, let's also move this to a new file. I'm thinking src/peft/optimizers/loraplus.py. The idea here is that we want to add more optimizer-related methods in the future, so it makes sense to choose a proper file structure right away.

@moghadas76 (Contributor, author):

Please review my code

@BenjaminBossan (Member) left a review:

Thanks for working on this. It is a good start but there are a few issues, please check my comments. On top of that, could you please move the function out of helpers.py into a separate module, as I mentioned above?

> Great, thanks. On top of what I mentioned, let's also move this to a new file. I'm thinking src/peft/optimizers/loraplus.py. The idea here is that we want to add more optimizer-related methods in the future, so it makes sense to choose a proper file structure right away.

Moreover, it would be great to document this function in our PEFT docs, but it would be fine to do that in a follow-up PR.

Finally, please run make style on your changes.

Inline review comments were left on src/peft/tuners/lora/config.py, src/peft/helpers.py, and tests/test_loraplus_helper.py.
@BenjaminBossan (Member):

@moghadas76 Do you still plan on working on this?

@moghadas76 (Contributor, author) commented Mar 25, 2024 via email.

@BenjaminBossan (Member):

> Yes, I'll fix the comments tonight

Thanks. No need to rush, I just wanted to inquire if you're still on it :)

@BenjaminBossan (Member):

@moghadas76 LMK once you're finished with your changes and want me to do another review.

@BenjaminBossan (Member):

Gentle ping @moghadas76

@moghadas76 (Contributor, author):

Hi, I fixed the comments. Please review again.

@moghadas76 (Contributor, author):

@BenjaminBossan

@BenjaminBossan (Member):

Sorry for the delay, I was at a conference, will review soon.

@BenjaminBossan (Member) left a review:

Thanks for making the adjustments, this already looks quite good. I still found a few minor areas for improvements, which I commented. Also, as mentioned in my earlier comment, could you please move the code to a different file?

Inline review comments were left on src/peft/utils/peft_types.py, src/peft/helpers.py, and tests/test_loraplus_helper.py.
@BenjaminBossan (Member):

Hmm, code quality checks are still failing with:

tests/test_loraplus_helper.py:1:1: I001 [*] Import block is un-sorted or un-formatted

Is it possible that your local ruff version differs? CI uses v0.2.2.

@moghadas76 (Contributor, author):

You were right. My ruff version was old.

@BenjaminBossan (Member) left a review:

Thanks for the updates. Our code style check still fails though, not sure what the reason is if you use the same ruff version. Here is the diff that I get when running ruff locally on your branch:

modified   src/peft/optimizers/__init__.py
@@ -17,4 +17,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from .loraplus import create_loraplus_optimizer
\ No newline at end of file
+from .loraplus import create_loraplus_optimizer
modified   src/peft/optimizers/loraplus.py
@@ -8,20 +8,24 @@ from transformers.trainer_pt_utils import get_parameter_names
 from ..peft_model import PeftModel
 
 
-def create_loraplus_optimizer(model: PeftModel, optimizer_cls: type[Optimizer], optimizer_kwargs: dict, loraplus_lr_embedding: float=1e-6) -> Optimizer:
+def create_loraplus_optimizer(
+    model: PeftModel, optimizer_cls: type[Optimizer], optimizer_kwargs: dict, loraplus_lr_embedding: float = 1e-6
+) -> Optimizer:
     """
-    Creates a LoraPlus optimizer.
-    Implementing LoRA+ https://arxiv.org/abs/2402.12354
-    Reference: https://github.com/nikhil-ghosh-berkeley/loraplus/
+    Creates a LoraPlus optimizer. Implementing LoRA+ https://arxiv.org/abs/2402.12354 Reference:
+    https://github.com/nikhil-ghosh-berkeley/loraplus/
 
     Args:
         model (`torch.nn.Module`): The model to be optimized.
         optimizer_cls (`torch.optim.Optimizer`): The optimizer class to be used.
         optimizer_kwargs (`dict`): Additional keyword arguments to be passed to the optimizer.
-            - **loraplus_lr_ratio** (`float`): The ratio of the learning rate to be used for the embedding layer. Defaults to loraplus_lr_ratio
-            - loraplus_lr_embedding (`float`): The learning rate to be used for the embedding layer. Defaults to loraplus_lr_embedding
+            - **loraplus_lr_ratio** (`float`): The ratio of the learning rate to be used for the embedding layer.
+              Defaults to loraplus_lr_ratio
+            - loraplus_lr_embedding (`float`): The learning rate to be used for the embedding layer. Defaults to
+              loraplus_lr_embedding
     """
     from ..tuners.lora.layer import Embedding
+
     loraplus_lr_ratio = optimizer_kwargs.pop("loraplus_lr_ratio")
 
     decay_parameters = get_parameter_names(model, ALL_LAYERNORM_LAYERS)
@@ -81,6 +85,7 @@ def create_loraplus_optimizer(model: PeftModel, optimizer_cls: type[Optimizer],
     optimizer = optimizer_cls(optimizer_grouped_parameters, **optimizer_kwargs)
     if optimizer_cls.__name__ == "Adam8bit":
         import bitsandbytes
+
         manager = bitsandbytes.optim.GlobalOptimManager.get_instance()
         for module in model.modules():
             if isinstance(module, nn.Embedding):
modified   tests/test_loraplus_helper.py
@@ -25,32 +25,37 @@ def test_lora_plus_helper_sucess():
     model = SimpleNet()
     optimizer_cls = bnb.optim.Adam8bit
     optim_config = {
-        'lr': 5e-5,
-        'eps': 1e-6,
-        'betas': (0.9, 0.999),
-        'weight_decay': 0.0,
+        "lr": 5e-5,
+        "eps": 1e-6,
+        "betas": (0.9, 0.999),
+        "weight_decay": 0.0,
         "loraplus_lr_ratio": 0.2,
     }
-    optim = create_loraplus_optimizer(model=model, optimizer_cls=optimizer_cls, optimizer_kwargs=optim_config, loraplus_lr_embedding=1e-6)
+    optim = create_loraplus_optimizer(
+        model=model, optimizer_cls=optimizer_cls, optimizer_kwargs=optim_config, loraplus_lr_embedding=1e-6
+    )
     assert optim is not None
     assert len(optim.param_groups) == 4
 
+
 def test_lora_plus_optimizer_sucess():
     optimizer_cls = bnb.optim.Adam8bit
     optim_config = {
-        'lr': 5e-5,
-        'eps': 1e-6,
-        'betas': (0.9, 0.999),
-        'weight_decay': 0.0,
+        "lr": 5e-5,
+        "eps": 1e-6,
+        "betas": (0.9, 0.999),
+        "weight_decay": 0.0,
         "loraplus_lr_ratio": 0.2,
     }
     model: SimpleNet = SimpleNet().cuda()
-    optim = create_loraplus_optimizer(model=model, optimizer_cls=optimizer_cls, optimizer_kwargs=optim_config, loraplus_lr_embedding=1e-6)
+    optim = create_loraplus_optimizer(
+        model=model, optimizer_cls=optimizer_cls, optimizer_kwargs=optim_config, loraplus_lr_embedding=1e-6
+    )
     loss = torch.nn.CrossEntropyLoss()
     bnb.optim.GlobalOptimManager.get_instance().register_parameters(model.parameters())
     x = torch.randint(100, (2, 4, 10)).cuda()
     output = model(x).permute(0, 3, 1, 2)
-    label = torch.randint(16, (2,4,10,)).cuda()
+    label = torch.randint(16, (2, 4, 10)).cuda()
     loss_value = loss(output, label)
     loss_value.backward()
     optim.step()

Inline review comments were left on tests/test_loraplus_helper.py.
@moghadas76 (Contributor, author):

Could you determine what the problem is with

    doc-builder style src/peft tests docs/source --max_len 119 --check_only

It fails with:

    Traceback (most recent call last):
      File "/home/moghadas/miniconda3/bin/doc-builder", line 8, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/home/moghadas/miniconda3/lib/python3.11/site-packages/doc_builder/commands/doc_builder_cli.py", line 47, in main
        args.func(args)
      File "/home/moghadas/miniconda3/lib/python3.11/site-packages/doc_builder/commands/style.py", line 28, in style_command
        raise ValueError(f"{len(changed)} files should be restyled!")
    ValueError: 2 files should be restyled!

@BenjaminBossan (Member):

> Could you determine what the problem is with
> doc-builder style src/peft tests docs/source --max_len 119 --check_only

Where did you see that? The code quality check only spits out this:

tests/test_loraplus_helper.py:52:56: W291 Trailing whitespace

Also, these lines could probably be reduced to a single line:

https://github.com/huggingface/peft/pull/1509/files#diff-4d3762210943c647e8ba391b5261730402551c7fa8f3903686fdabde648cea89R71-R78

I guess what happened here is that your editor added those line breaks because it is configured with a lower line limit than what we have in PEFT.

@shubhamjain0594:
@moghadas76 @BenjaminBossan I can take this up and make the necessary changes if you are short on time. We can aim to get this PR merged this week, let me know if it's okay.

@BenjaminBossan (Member):

> I can take this up and make the necessary changes if you are short on time. We can aim to get this PR merged this week, let me know if it's okay.

That's fine from my point of view. A separate PR with credits given would also work for me.

For my understanding: Who is "we" in this case, are you collaborating with moghadas76?

@shubhamjain0594 commented Jun 24, 2024:

> That's fine from my point of view. A separate PR with credits given would also work for me.
>
> For my understanding: Who is "we" in this case, are you collaborating with moghadas76?

I am doing it by myself. By "we" I just meant you and me, and @moghadas76 if they are available.

Also, can you advise on how to provide the credit?

@BenjaminBossan (Member):

I see. Sure, please go ahead. As you probably can't push on top of this PR, feel free to create a new one. If we don't hear back from moghadas76 by the time the new PR is ready to be merged, we can add them as a co-author.

@stillmatic (Contributor):

I am happy to clean this up too.

IMO the API is not the clearest as currently presented: the embedding LR and the ratio should be either both optimizer_kwargs or both named args. It makes more sense to me that the optimizer kwargs are purely passed to the optimizer rather than used for the LR adjustments.

Finally, should the 8-bit -> 32-bit upcast be applied to all the 8-bit optimizers?

    eight_bit_names = ["Adam8bit", "AdamW8bit", "PagedAdam8bit", "PagedAdamW8bit"]
    if optimizer_cls.__name__ in eight_bit_names:
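
A sketch of what the signature could look like under that suggestion, with both LoRA+ knobs as named arguments and optimizer_kwargs forwarded to the optimizer untouched (parameter names follow the existing code; the default ratio shown here is illustrative, not from the thread):

    def create_loraplus_optimizer(
        model,
        optimizer_cls,
        optimizer_kwargs,            # passed straight through to the optimizer
        loraplus_lr_ratio=16.0,      # named argument instead of an optimizer_kwargs entry
        loraplus_lr_embedding=1e-6,
    ):
        ...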

@BenjaminBossan (Member):

@stillmatic Thanks, that would also be fine, just pinging @shubhamjain0594 to ensure that there won't be any duplicate work.

@moghadas76 (Contributor, author):

This is very disrespectful. He stole this branch.

@shubhamjain0594 commented Jul 1, 2024:

> This is very disrespectful. He stole this branch.

@BenjaminBossan if @stillmatic has time then sure please go for it. I was going to raise a PR today, but was mainly looking to fix some documentation and other small bugs you had raised.

@moghadas76 not stealing anyone's work here. Just want to get this PR merged so that I can start using it in my repo without doing weird installation. I have not yet raised a PR, and can wait if you have time to get this done.

@stillmatic (Contributor):

Happy to make my suggestions as comments on this branch if you have the time to address them here. I appreciate the work - I used the implementation here in my training, but ran into some problems, hence seeing what needs improvement.

@kallewoof (Contributor) commented Jul 2, 2024:

> This is very disrespectful. He stole this branch.

Assuming proper credit is given, there's nothing disrespectful about picking up someone's work if they are unable to complete it in a timely manner; there are month+ gaps between you receiving a review and actually addressing it. The PR is 4 months old, for no particular reason at all. Let's get this polished up and merged.

@kallewoof (Contributor):

I have rebased and done some fixes on top of this pull request here: https://github.com/kallewoof/peft/tree/202407-loraplus

@moghadas76 You can either base your work off of my fixes or redo it yourself. Whatever gets this merged the fastest. I specifically did not make a pull request out of this, as it sounds like you really want to do this yourself.

@stillmatic Did I get your suggestions in there correctly?

@BenjaminBossan (Member):

I agree that this is not about "stealing" work. All of this is a big collaboration, after all the initial PR was heavily based on https://github.com/nikhil-ghosh-berkeley/loraplus/blob/main/lora_plus.py.

It would be best if we can get this PR over the finish line, as we're not missing a lot. @moghadas76 if you are still interested, let's try to finish this in the next two weeks. There has been some good feedback in this thread, so I'm sure we have everything we need to get this ready.

If there is no progress here, I'm happy to merge other PRs that implement the same idea, with proper references being given. I'll ensure that co-authorship is respected when merging.

    ]

    optimizer = optimizer_cls(optimizer_grouped_parameters, **optimizer_kwargs)
    if optimizer_cls.__name__ == "Adam8bit":
Comment from a Contributor:

should this support the other 8-bit Adam implementations?

Comment from a Contributor:

@kallewoof this is the only important comment I had, simplest would be

    eight_bit_names = ["Adam8bit", "AdamW8bit", "PagedAdam8bit", "PagedAdamW8bit"]
    if optimizer_cls.__name__ in eight_bit_names:
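
Folding that broadened check into the upcast logic from the diff earlier in the thread would give roughly the following; the register_module_override call is the usual bitsandbytes pattern for keeping the embedding parameters' optimizer state in 32 bit, shown here as an assumption rather than code copied from the branch:

    optimizer = optimizer_cls(optimizer_grouped_parameters, **optimizer_kwargs)
    eight_bit_names = ["Adam8bit", "AdamW8bit", "PagedAdam8bit", "PagedAdamW8bit"]
    if optimizer_cls.__name__ in eight_bit_names:
        import bitsandbytes

        # nn is torch.nn; keep the optimizer state of embedding weights in 32 bit
        manager = bitsandbytes.optim.GlobalOptimManager.get_instance()
        for module in model.modules():
            if isinstance(module, nn.Embedding):
                manager.register_module_override(module, "weight", {"optim_bits": 32})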

Comment from a Contributor:

Thanks. Added to proposed branch.

"""
from ..tuners.lora.layer import Embedding

loraplus_lr_ratio = optimizer_kwargs.pop("loraplus_lr_ratio")
Comment from a Contributor:

it's confusing to me that loraplus_lr_ratio is an optimizer_kwarg while loraplus_lr_embedding is a function argument. IMO both should be function arguments, while optimizer_kwarg should reflect the arguments passed to the optimizer.

Comment from a Contributor:

I made both of them args, outside of optimizer_kwargs, in https://github.com/kallewoof/peft/tree/202407-loraplus FWIW, based on your comment.

    Args:
        model (`torch.nn.Module`): The model to be optimized.
        optimizer_cls (`torch.optim.Optimizer`): The optimizer class to be used.
        optimizer_kwargs (`dict`): Additional keyword arguments to be passed to the optimizer.
Comment from a Contributor:

should note explicitly that lr and weight_decay are expected.
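
A docstring fragment along those lines (the wording is illustrative, not the merged text):

        optimizer_kwargs (`dict`): Additional keyword arguments to be passed to the optimizer.
            `lr` and `weight_decay` are expected to be present; `loraplus_lr_ratio` is popped
            from this dict before the optimizer is created.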

Comment from a Contributor:

Addressed in the proposed changes in https://github.com/kallewoof/peft/tree/202407-loraplus

@kallewoof (Contributor):

It's been a week, so I opened the above branch as a pull req. Close it if that's not OK, @BenjaminBossan.

@BenjaminBossan (Member):

Thanks @kallewoof. Starting tomorrow until the end of the week, I'll be at EuroPython Prague, so I will have little time for reviews etc. If by then, there is no progress on this PR, we can close it and continue with yours. As mentioned earlier, I'll make sure to assign proper credit before merging.

@BenjaminBossan (Member):

Superseded by #1915.
