ENH: Faster adapter loading if there are a lot of target modules #2045
Conversation
This is an optimization to reduce the number of entries in the target_modules list. The reason is that in some circumstances, target_modules can contain hundreds of entries. Since each target module is checked against each module of the net (which can be thousands), this can become quite expensive when many adapters are being added. Often, the target_modules can be condensed in such a case, which speeds up the process. A context in which this can happen is when diffusers loads non-PEFT LoRAs. As there is no meta info on target_modules in that case, they are just inferred by listing all keys from the state_dict, which can be quite a lot. See: huggingface/diffusers#9297 As there is a small chance for undiscovered bugs, we apply this optimization only if the list of target_modules is sufficiently big.
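To make the condensation idea concrete, here is a hypothetical illustration (module names and sizes invented, not PEFT's actual code) of how hundreds of inferred entries can collapse to a handful of suffixes when no non-target module would be caught by them:

```python
# What diffusers-style inference might produce: one target entry per
# state_dict key, i.e. one fully qualified name per targeted module.
target_modules = [
    f"transformer_blocks.{i}.attn.{proj}"
    for i in range(20)
    for proj in ("to_q", "to_k", "to_v")
]
print(len(target_modules))  # 60 entries

# Modules that must NOT be matched (everything else in the network).
other_module_names = [f"transformer_blocks.{i}.ff.net" for i in range(20)]

# If a short suffix like "to_q" matches no non-target module, the 60 full
# names can be condensed to just {"to_k", "to_q", "to_v"}.
suffixes = {name.rsplit(".", 1)[-1] for name in target_modules}
assert not any(
    other == s or other.endswith("." + s)
    for other in other_module_names
    for s in suffixes
)
print(sorted(suffixes))  # ['to_k', 'to_q', 'to_v']
```

Since PEFT matches `target_modules` entries by suffix, the condensed set selects exactly the same modules while being checked far fewer times per key.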
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Excellent stuff! Would be nice to also update the original description of this PR to hint at the speedup achievable with this change.
src/peft/tuners/tuners_utils.py
Outdated
# quite a lot. See: https://github.com/huggingface/diffusers/issues/9297
# As there is a small chance for undiscovered bugs, we apply this optimization only if the list of
# target_modules is sufficiently big.
if isinstance(peft_config.target_modules, (list, set)) and len(peft_config.target_modules) >= 20:
`20` could be assigned to a variable.
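The suggestion could look like the sketch below (the constant name is hypothetical, and the check is pulled into a small helper here purely for illustration):

```python
# Hypothetical named constant replacing the magic number 20.
MIN_TARGET_MODULES_FOR_OPTIMIZATION = 20

def should_condense(target_modules) -> bool:
    # Mirrors the threshold check from the PR: only apply the (slightly
    # experimental) condensation when the list/set is sufficiently big.
    return (
        isinstance(target_modules, (list, set))
        and len(target_modules) >= MIN_TARGET_MODULES_FOR_OPTIMIZATION
    )

print(should_condense([f"layer{i}" for i in range(25)]))  # True
print(should_condense(["q_proj", "v_proj"]))  # False
```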
src/peft/tuners/tuners_utils.py
Outdated
# target_modules is sufficiently big.
if isinstance(peft_config.target_modules, (list, set)) and len(peft_config.target_modules) >= 20:
    names_not_match = [n for n in key_list if n not in peft_config.target_modules]
    new_target_modules = find_minimal_target_modules(peft_config.target_modules, names_not_match)
Maybe we keep `find_minimal_target_modules()` as a pseudo-private method to denote the experimental nature of this feature?
@@ -781,6 +796,86 @@ def _move_adapter_to_device_of_base_layer(self, adapter_name: str, device: Optio
        adapter_layer[adapter_name] = adapter_layer[adapter_name].to(device)


def find_minimal_target_modules(
    target_modules: list[str] | set[str], other_module_names: list[str] | set[str]
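A simplified sketch of what a helper with this signature could do (this is not PEFT's exact implementation, just the underlying idea): for each target module, keep the shortest dot-separated suffix that matches no non-target module name.

```python
def find_minimal_target_modules_sketch(target_modules, other_module_names):
    # Illustrative only: PEFT's real function may differ in details and
    # edge-case handling.
    other = set(other_module_names)

    def matches(name, suffix):
        # Suffix matching as used for target_modules: exact name or ".suffix".
        return name == suffix or name.endswith("." + suffix)

    minimal = set()
    for name in target_modules:
        parts = name.split(".")
        # Try suffixes from shortest (last component) to longest (full name).
        for i in range(1, len(parts) + 1):
            suffix = ".".join(parts[-i:])
            if not any(matches(o, suffix) for o in other):
                minimal.add(suffix)
                break
        else:
            # No shorter suffix is safe; keep the fully qualified name.
            minimal.add(name)
    return minimal

targets = ["blocks.0.attn.to_q", "blocks.1.attn.to_q"]
others = ["blocks.0.ff", "blocks.1.ff"]
print(find_minimal_target_modules_sketch(targets, others))  # {'to_q'}
```

Because every returned suffix still matches its original full name, the condensed set selects the same modules; the safety check against `other_module_names` is what guarantees no extra module gets targeted.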
I would prefer this (`list[str] | set[str]`) over `Union[List[str], Set[str]]`. Nice.
Personally, I prefer the new style that requires no `from typing import List, Set, Union` etc. I think this will be adopted more and more going forward, as old Python versions that don't support it are phased out, so I'd rather keep it like this.
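For reference, the two annotation styles side by side (a minimal sketch; function names are invented). The `list[str] | set[str]` form (PEP 585 built-in generics plus PEP 604 unions) needs Python >= 3.10 at runtime, unless `from __future__ import annotations` turns annotations into strings, which makes it safe on 3.7+:

```python
from __future__ import annotations  # makes the new syntax safe on older Pythons

from typing import List, Set, Union

def new_style(target_modules: list[str] | set[str]) -> set[str]:
    # Built-in generics and the | union operator; no typing imports needed.
    return set(target_modules)

def old_style(target_modules: Union[List[str], Set[str]]) -> Set[str]:
    # Pre-3.10 style with explicit typing imports.
    return set(target_modules)

print(new_style(["to_q", "to_k"]) == old_style(["to_q", "to_k"]))  # True
```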
Yeah same here.
Thanks for the review, your points should be addressed, please check again. Note that I made some small changes on top, like better variable names and adding another test.
This is an optimization to reduce the number of entries in the `target_modules` set. The reason is that in some circumstances, `target_modules` can contain hundreds of entries. Since each target module is checked against each module of the net (which can be thousands), this can become quite expensive when many adapters are being added. Often, the `target_modules` can be condensed in such a case, which speeds up the process.

A context in which this can happen is when diffusers loads non-PEFT LoRAs. As there is no meta info on `target_modules` in that case, they are just inferred by listing all keys from the `state_dict`, which can be quite a lot. See: huggingface/diffusers#9297

As there is a small chance for undiscovered bugs, we apply this optimization only if the list of `target_modules` is sufficiently big. Therefore, for normal PEFT users, this should not have any effect.

Example: As shown in huggingface/diffusers#9297, the speed improvements for loading many diffusers LoRAs can be substantial. When loading 30 adapters, the time would go up from 0.6 sec per adapter to 3 sec per adapter without this fix. With this fix, the time goes up from 0.6 sec per adapter to 1 sec per adapter.
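The shape of that speedup can be reproduced with a toy benchmark (module names, sizes, and the matching helper are all invented for illustration; absolute timings will differ by machine). Matching every module key against every target entry is O(keys x targets), so condensing the target list shrinks the inner loop:

```python
import time

# A fake network: 1000 attention projections (targeted) and 1000 FF modules.
module_names = [f"blocks.{i}.attn.to_q" for i in range(1000)] + [
    f"blocks.{i}.ff.net" for i in range(1000)
]
long_targets = [f"blocks.{i}.attn.to_q" for i in range(1000)]  # one entry per key
short_targets = ["to_q"]  # condensed form

def match_all(names, targets):
    # Suffix matching as used for target_modules: exact name or ".suffix".
    return [n for n in names if any(n == t or n.endswith("." + t) for t in targets)]

t0 = time.perf_counter()
long_hits = match_all(module_names, long_targets)
t1 = time.perf_counter()
short_hits = match_all(module_names, short_targets)
t2 = time.perf_counter()

# Both target lists select exactly the same modules.
assert long_hits == short_hits
print(f"long list: {t1 - t0:.4f}s, condensed: {t2 - t1:.5f}s")
```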