support multiple LoRAs in batched inference scenario #903
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thank you for tackling this, Sourab. I think this is a feature that many users would find useful, and adding support for it should be high priority. I'm just leaving some early comments for now, so that we can discuss the design.

I was also thinking about how to solve this problem, especially how to pass the information about which adapter to use for which sample down to the individual LoRA layers. I'm not a big fan of this approach, as it works via side effects: the adapter assignment is stored as state on the model rather than passed explicitly. This makes it hard to reason about and more difficult to debug. Furthermore, it requires a lot of care to handle correctly. As an example, say a user calls forward once with per-sample adapters set and then calls it again without resetting them; the stale assignment from the first call would silently carry over. I see further potential issues with the implementation as it stands. Overall, this approach looks very brittle to me, and if possible, I would like to find a better approach.

I assume that you also considered just passing the adapter assignment down through the forward call as an explicit argument. An alternative idea could be to work with pre-forward hooks using `register_forward_pre_hook`, which would inject the adapter information for a single call and be removed afterwards. Other than that, I'm open to ideas and discussions; maybe there is a better solution we just haven't considered yet.
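To make the hook-based alternative concrete, here is a minimal sketch (not code from this PR) of how per-sample adapter information could be injected through PyTorch pre-forward hooks and cleaned up afterwards. Treating `adapter_names` as a forward kwarg and wrapping everything in a context manager are illustrative assumptions, not the PR's actual interface.

```python
from contextlib import contextmanager

from peft.tuners.lora import LoraLayer  # real PEFT class; its use here is illustrative


@contextmanager
def mixed_adapter_batch(model, adapter_names):
    """Temporarily pass per-sample adapter names into every LoRA layer's
    forward call via pre-forward hooks, removing the hooks afterwards.

    Sketch only: the `adapter_names` kwarg is an assumption for
    illustration, not necessarily this PR's actual interface.
    """
    handles = []

    def pre_hook(module, args, kwargs):
        # Inject the per-sample adapter names into this forward call.
        kwargs["adapter_names"] = adapter_names
        return args, kwargs

    for module in model.modules():
        if isinstance(module, LoraLayer):
            handles.append(
                module.register_forward_pre_hook(pre_hook, with_kwargs=True)
            )
    try:
        yield
    finally:
        # Hooks are removed even if forward raises, so no stale state
        # leaks into later calls (the side-effect concern above).
        for handle in handles:
            handle.remove()


# Hypothetical usage:
# with mixed_adapter_batch(model, ["adapter1", "adapter2", "adapter1"]):
#     outputs = model(**inputs)
```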
Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
No, bad bot!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
not stale... |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
This PR revives the work by Sourab in #903. The core logic is the same between the two PRs, but this one should be more complete.

The main idea is to allow the user to mix different LoRA adapters in the same batch. This is useful when the user wants to perform inference with a batch that uses different LoRA adapters. Without this, each batch would have to be restricted to the same LoRA adapter(s).

This PR should encompass:

- all task types
- all LoRA layer types
- bnb layers

Extensive tests were added, as well as documentation.

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
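For reference, here is a minimal usage sketch of the batched multi-adapter feature, assuming the `adapter_names` keyword argument this PR adds to `forward`/`generate`; the base model and adapter paths are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "facebook/opt-350m"  # placeholder base model
base = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load two different LoRA adapters onto the same base model
# (adapter paths are placeholders).
model = PeftModel.from_pretrained(base, "path/to/adapter1", adapter_name="adapter1")
model.load_adapter("path/to/adapter2", adapter_name="adapter2")

inputs = tokenizer(
    ["Hello, my name is", "The capital of France is", "def add(a, b):"],
    return_tensors="pt",
    padding=True,
)

# One adapter name per sample in the batch; "__base__" requests the
# base model (no adapter) for that sample.
adapter_names = ["adapter1", "adapter2", "__base__"]
output = model.generate(**inputs, adapter_names=adapter_names, max_new_tokens=20)
print(tokenizer.batch_decode(output, skip_special_tokens=True))
```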
What does this PR do?

Adds support for using multiple LoRA adapters within the same batch at inference time. Internally, the samples in a batch are grouped by their assigned adapter (`groupby(adapter_name)`), and each sub-batch is run through the corresponding adapter; see the sketch below.
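To illustrate the grouping step referenced above, here is a small, self-contained sketch of splitting a batch into per-adapter sub-batches; it mirrors the `groupby(adapter_name)` idea in plain PyTorch and is not the PR's actual implementation.

```python
from collections import defaultdict

import torch


def split_batch_by_adapter(hidden_states, adapter_names):
    """Group row indices by adapter name so that each sub-batch can be
    run through its own LoRA weights and scattered back afterwards.

    Illustrative only; not the PR's actual implementation.
    """
    groups = defaultdict(list)
    for idx, name in enumerate(adapter_names):
        groups[name].append(idx)

    # One sub-batch per adapter, gathered by row index.
    sub_batches = {
        name: hidden_states[torch.tensor(indices)]
        for name, indices in groups.items()
    }
    return groups, sub_batches


# Example: a batch of four samples served by two adapters.
x = torch.randn(4, 8)
groups, sub_batches = split_batch_by_adapter(x, ["a1", "a2", "a1", "a1"])
assert groups == {"a1": [0, 2, 3], "a2": [1]}
assert sub_batches["a1"].shape == (3, 8)
```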
How to use:
[To Do]
ToDos: