
support multiple LoRAs in batched inference scenario #903

Closed
wants to merge 13 commits

Conversation

@pacman100 (Contributor) commented Sep 5, 2023

What does this PR do?

  1. Support multiple LoRAs in a batched inference setting. Here, we create sub-batches via groupby(adapter_name)-style logic; see the sketch below.
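
A minimal sketch of the sub-batching idea (the names and signature here are illustrative, not the PR's actual API): group sample indices by adapter name, run each sub-batch through its adapter, then scatter the results back into the original batch order.

```python
from collections import defaultdict

import torch

def grouped_forward(x, adapter_names, adapter_layers):
    """Hypothetical helper: x is a (batch, ...) tensor, adapter_names holds
    one adapter name per sample, adapter_layers maps names to modules."""
    # Group sample indices by the adapter that should process them.
    groups = defaultdict(list)
    for idx, name in enumerate(adapter_names):
        groups[name].append(idx)

    output = None
    for name, indices in groups.items():
        idx = torch.tensor(indices)
        sub_out = adapter_layers[name](x[idx])  # forward the sub-batch
        if output is None:
            output = sub_out.new_zeros(x.shape[0], *sub_out.shape[1:])
        output[idx] = sub_out  # scatter back to the original positions
    return output

# e.g. grouped_forward(torch.randn(3, 8), ["a", "b", "a"],
#                      {"a": torch.nn.Linear(8, 8), "b": torch.nn.Linear(8, 8)})
```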

How to use:

[To Do]

ToDos:

  • Tests
  • Example Notebook
  • Documentation
  • Support all layer types, such as Conv, Embedding, and quantized 8-bit and 4-bit layers.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@BenjaminBossan (Member)

Thank you for tackling this, Sourab. I think this is a feature that many users would find useful, and adding support for it should be a high priority.

I'm just leaving some early comments for now, so that we can discuss the design:

I was also thinking about how to solve this problem, in particular how to pass the information about which adapter to use for which sample to the forward method of the respective adapters. The solution presented here is that this information is part of the kwargs when calling the PeftModel/LoraModel; it is then immediately removed from the kwargs and saved as an attribute on the adapter layer, where it is retrieved during the forward call.

I'm not a big fan of this approach, as it works via side effects. That makes it hard to reason about and more difficult to debug. Furthermore, it requires a lot of care to handle correctly. As an example, say a user calls generate with adapter_indices; they are then set as an attribute. Let's assume that somewhere in the call, there is an error. The adapter_indices won't be cleaned up. The next time the user calls without adapter_indices, the attribute is still set, so they get incorrect outputs without any indication that something went wrong. So at the very least, we have to take more care to do the cleanup correctly.
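
To illustrate the pitfall (the names here are hypothetical, not the PR's actual code): wrapping the call in try/finally guarantees the cleanup runs even when an exception interrupts the pass, so no stale state leaks into the next call.

```python
def generate_with_adapter_indices(model, inputs, adapter_indices):
    # Stash the per-sample adapter assignment on every adapter layer
    # (model.adapter_layers is a hypothetical attribute).
    for layer in model.adapter_layers:
        layer.adapter_indices = adapter_indices
    try:
        return model.generate(**inputs)
    finally:
        # Runs even if generate() raises, so the next call cannot pick up
        # a stale adapter_indices value.
        for layer in model.adapter_layers:
            layer.adapter_indices = None
```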

Another potential issue is that at the moment, adapter_indices is cleaned up at the end of the forward call of the adapter layer. But what happens if a single forward or generate call on the LoraModel/PeftModel requires multiple forward calls for each layer? Then only the first call is correct and the subsequent ones ignore the information.

Overall, this approach looks very brittle to me and if possible, I would like to find a better approach.

I assume that you also considered just passing down the kwargs with the adapter_indices but that this wouldn't work, because transformers/some model architectures cannot handle arbitrary arguments being added. So this more straightforward way is not possible.

An alternative idea could be to work with pre-forward hooks using with_kwargs=True, as they should allow us to add more arguments to the kwargs before they are passed to forward. This is still a side-effect with all the associated problems, and it would still require us to ensure that the hooks are properly cleaned up, but at least we get the handles, which only require a .remove() call to be cleaned up.
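
A rough sketch of the hook-based alternative (assuming PyTorch >= 2.0, where register_forward_pre_hook accepts with_kwargs=True; the helper names are illustrative): the hook injects adapter_indices into each layer's kwargs, and cleanup is just a .remove() on each returned handle.

```python
from functools import partial

def _inject_adapter_indices(module, args, kwargs, adapter_indices):
    # Pre-forward hook: add adapter_indices to the kwargs that will be
    # passed on to the layer's forward.
    kwargs["adapter_indices"] = adapter_indices
    return args, kwargs

def register_adapter_indices(layers, adapter_indices):
    hook = partial(_inject_adapter_indices, adapter_indices=adapter_indices)
    return [layer.register_forward_pre_hook(hook, with_kwargs=True)
            for layer in layers]

# Usage: register, run the forward pass, then clean up deterministically.
# handles = register_adapter_indices(model.adapter_layers, adapter_indices)
# try:
#     output = model(**inputs)
# finally:
#     for handle in handles:
#         handle.remove()
```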

Other than that, I'm open to ideas and discussion; maybe there is a better solution we just haven't considered yet.


@pacman100 pacman100 marked this pull request as ready for review September 5, 2023 14:09
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this Oct 22, 2023
@github-actions github-actions bot closed this Oct 31, 2023
@github-actions github-actions bot closed this Nov 9, 2023
@BenjaminBossan (Member)

No, bad bot!


github-actions bot commented Dec 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan (Member)

not stale...

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this Jan 7, 2024
@BenjaminBossan BenjaminBossan reopened this Jan 8, 2024
@github-actions github-actions bot closed this Jan 16, 2024
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Mar 13, 2024
This PR tries to revive the work by Sourab in huggingface#903. The core logic is
the same between the two PRs. This one should be more complete.

The main idea is to allow the user to mix different LoRA adapters in the
same batch. This is useful when the user wants to perform inference with a
batch that uses different LoRA adapters. Without this, each batch would
have to be restricted to the same LoRA adapter(s).

This PR should encompass:

- all task types
- all LoRA layer types
- bnb layers

Extensive tests were added, as well as documentation.
BenjaminBossan added a commit that referenced this pull request Mar 18, 2024
This PR revives the work by Sourab in #903. The core logic is
the same between the two PRs. This one should be more complete.

The main idea is to allow the user to mix different LoRA adapters in the
same batch. This is useful when the user wants to perform inference with a
batch that uses different LoRA adapters. Without this, each batch would
have to be restricted to the same LoRA adapter(s).

This PR should encompass:

- all task types
- all LoRA layer types
- bnb layers

Extensive tests were added, as well as documentation.

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
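
For reference, a sketch of how the merged follow-up feature can be used, based on the PEFT documentation (model paths and adapter names are placeholders; check your PEFT version for the exact API). Each entry in adapter_names selects the adapter for the corresponding sample, and "__base__" selects the base model without any adapter.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

model = PeftModel.from_pretrained(base, "path/to/adapter_a",
                                  adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

inputs = tokenizer(["prompt 1", "prompt 2", "prompt 3"],
                   return_tensors="pt", padding=True)
# One adapter name per sample; "__base__" means "no adapter" for that sample.
adapter_names = ["adapter_a", "adapter_b", "__base__"]
output = model.generate(**inputs, adapter_names=adapter_names,
                        max_new_tokens=20)
```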