Optimize DoRA in eval and no dropout #2122

Merged (17 commits) on Oct 16, 2024
src/peft/tuners/lora/layer.py (5 additions, 0 deletions)
@@ -585,6 +585,11 @@ def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
if not self.use_dora[active_adapter]:
    result = result + lora_B(lora_A(dropout(x))) * scaling
else:
    if isinstance(dropout, nn.Identity):
        print("no dropout, optimize here")
    else:
        print("dropout, same ops")

Contributor Author


@BenjaminBossan did you envision something like this?

My intuition was:

  1. Figure out whether there is dropout or not
  2. Use a flag for dropout
  3. Pass the flag to the forward of the DoRA layers, where I would need to skip the alignment step and reuse x (the base model outputs)

Let me know if I am on the right track.

Note: I could not figure out a way to detect whether the model is in eval mode. How would you have done it?
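
For what it's worth, a minimal standalone sketch of the check in points 1 and 2; it relies only on the fact (which the isinstance check in the diff above also assumes) that the per-adapter dropout module is an nn.Identity when no dropout is configured:

```python
import torch.nn as nn

# When lora_dropout == 0, the "dropout" module is nn.Identity, so it can
# never perturb x; a plain isinstance check is enough to set the flag.
no_dropout = nn.Identity()
with_dropout = nn.Dropout(p=0.1)

print(isinstance(no_dropout, nn.Identity))    # True  -> safe to reuse the base output
print(isinstance(with_dropout, nn.Identity))  # False -> dropout may change x during training
```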

Member


Yes, I think the dropout check is valid as is. Regarding eval mode, I think that checking self.training should work.

Regarding how to proceed, my thinking was: if we find that we can make this optimization, we pass the base result as an additional argument to the DoRA forward (with None as the default for that argument), and there we use this base result if it is given; if not, we calculate it as we currently do. Could be that I'm missing something, but that's my idea.

The good news is that since we already have a working implementation, we can compare the results of the two approaches, and they should be identical (of course not when there is dropout, but apart from that).
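
To make that concrete, here is a minimal self-contained sketch of the argument-passing idea; `dora_forward` and `base_result` are placeholder names for illustration, not the actual PEFT signatures:

```python
import torch
import torch.nn as nn
from typing import Optional

def dora_forward(
    x: torch.Tensor,
    base_layer: nn.Module,
    base_result: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    # Stand-in for the DoRA path: reuse the precomputed base output if the
    # caller provides one, otherwise compute it here as the code does today.
    if base_result is None:
        base_result = base_layer(x)
    # ... the magnitude/direction math would follow, working on base_result ...
    return base_result

# Caller side: only hand over the already-computed base output when dropout
# cannot have changed x (no dropout configured, or the module is in eval mode).
base_layer = nn.Linear(8, 8).eval()
dropout = nn.Identity()
x = torch.randn(2, 8)
result = base_layer(x)

reuse = isinstance(dropout, nn.Identity) or not base_layer.training
out = dora_forward(dropout(x), base_layer, base_result=result if reuse else None)
```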


Neat solution!

    x = dropout(x)
    result = result + self.lora_magnitude_vector[active_adapter](
        x,