ENH: Faster adapter loading if there are a lot of target modules #2045
Conversation
This is an optimization to reduce the number of entries in the target_modules list. The reason is that in some circumstances, target_modules can contain hundreds of entries. Since each target module is checked against each module of the net (which can be thousands), this can become quite expensive when many adapters are being added. Often, the target_modules can be condensed in such a case, which speeds up the process. A context in which this can happen is when diffusers loads non-PEFT LoRAs. As there is no meta info on target_modules in that case, they are just inferred by listing all keys from the state_dict, which can be quite a lot. See: huggingface/diffusers#9297 As there is a small chance for undiscovered bugs, we apply this optimization only if the list of target_modules is sufficiently big.
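To make the condensation idea concrete, here is a hypothetical illustration (module names and sizes invented, not PEFT's actual code) of how hundreds of inferred entries can collapse to a handful of suffixes when no non-target module would be caught by them:

```python
# What diffusers-style inference might produce: one target entry per
# state_dict key, i.e. one fully qualified name per targeted module.
target_modules = [
    f"transformer_blocks.{i}.attn.{proj}"
    for i in range(20)
    for proj in ("to_q", "to_k", "to_v")
]
print(len(target_modules))  # 60 entries

# Modules that must NOT be matched (everything else in the network).
other_module_names = [f"transformer_blocks.{i}.ff.net" for i in range(20)]

# If a short suffix like "to_q" matches no non-target module, the 60 full
# names can be condensed to just {"to_k", "to_q", "to_v"}.
suffixes = {name.rsplit(".", 1)[-1] for name in target_modules}
assert not any(
    other == s or other.endswith("." + s)
    for other in other_module_names
    for s in suffixes
)
print(sorted(suffixes))  # ['to_k', 'to_q', 'to_v']
```

Since PEFT matches `target_modules` entries by suffix, the condensed set selects exactly the same modules while being checked far fewer times per key.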
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Excellent stuff! Would be nice to also update the original description of this PR to hint at the speedup achievable with this change.
src/peft/tuners/tuners_utils.py
Outdated
# quite a lot. See: https://github.com/huggingface/diffusers/issues/9297
# As there is a small chance for undiscovered bugs, we apply this optimization only if the list of
# target_modules is sufficiently big.
if isinstance(peft_config.target_modules, (list, set)) and len(peft_config.target_modules) >= 20:
`20` could be assigned to a variable.
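The suggestion could look like the sketch below (the constant name is hypothetical, and the check is pulled into a small helper here purely for illustration):

```python
# Hypothetical named constant replacing the magic number 20.
MIN_TARGET_MODULES_FOR_OPTIMIZATION = 20

def should_condense(target_modules) -> bool:
    # Mirrors the threshold check from the PR: only apply the (slightly
    # experimental) condensation when the list/set is sufficiently big.
    return (
        isinstance(target_modules, (list, set))
        and len(target_modules) >= MIN_TARGET_MODULES_FOR_OPTIMIZATION
    )

print(should_condense([f"layer{i}" for i in range(25)]))  # True
print(should_condense(["q_proj", "v_proj"]))  # False
```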
src/peft/tuners/tuners_utils.py
Outdated
# target_modules is sufficiently big.
if isinstance(peft_config.target_modules, (list, set)) and len(peft_config.target_modules) >= 20:
    names_not_match = [n for n in key_list if n not in peft_config.target_modules]
    new_target_modules = find_minimal_target_modules(peft_config.target_modules, names_not_match)
Maybe we keep `find_minimal_target_modules()` as a pseudo-private method to denote the experimental nature of this feature?
@@ -781,6 +796,86 @@ def _move_adapter_to_device_of_base_layer(self, adapter_name: str, device: Optio
        adapter_layer[adapter_name] = adapter_layer[adapter_name].to(device)


def find_minimal_target_modules(
    target_modules: list[str] | set[str], other_module_names: list[str] | set[str]
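A simplified sketch of what a helper with this signature could do (this is not PEFT's exact implementation, just the underlying idea): for each target module, keep the shortest dot-separated suffix that matches no non-target module name.

```python
def find_minimal_target_modules_sketch(target_modules, other_module_names):
    # Illustrative only: PEFT's real function may differ in details and
    # edge-case handling.
    other = set(other_module_names)

    def matches(name, suffix):
        # Suffix matching as used for target_modules: exact name or ".suffix".
        return name == suffix or name.endswith("." + suffix)

    minimal = set()
    for name in target_modules:
        parts = name.split(".")
        # Try suffixes from shortest (last component) to longest (full name).
        for i in range(1, len(parts) + 1):
            suffix = ".".join(parts[-i:])
            if not any(matches(o, suffix) for o in other):
                minimal.add(suffix)
                break
        else:
            # No shorter suffix is safe; keep the fully qualified name.
            minimal.add(name)
    return minimal

targets = ["blocks.0.attn.to_q", "blocks.1.attn.to_q"]
others = ["blocks.0.ff", "blocks.1.ff"]
print(find_minimal_target_modules_sketch(targets, others))  # {'to_q'}
```

Because every returned suffix still matches its original full name, the condensed set selects the same modules; the safety check against `other_module_names` is what guarantees no extra module gets targeted.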
I would prefer this (`list[str] | set[str]`) over `Union[List[str], Set[str]]`. Nice.
Personally, I prefer the new style that requires no `from typing import List, Set, Union` etc. I think this will be adopted more and more going forward, as old Python versions that don't support it are phased out, so I'd rather keep it like this.
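For reference, the two annotation styles side by side (a minimal sketch; function names are invented). The `list[str] | set[str]` form (PEP 585 built-in generics plus PEP 604 unions) needs Python >= 3.10 at runtime, unless `from __future__ import annotations` turns annotations into strings, which makes it safe on 3.7+:

```python
from __future__ import annotations  # makes the new syntax safe on older Pythons

from typing import List, Set, Union

def new_style(target_modules: list[str] | set[str]) -> set[str]:
    # Built-in generics and the | union operator; no typing imports needed.
    return set(target_modules)

def old_style(target_modules: Union[List[str], Set[str]]) -> Set[str]:
    # Pre-3.10 style with explicit typing imports.
    return set(target_modules)

print(new_style(["to_q", "to_k"]) == old_style(["to_q", "to_k"]))  # True
```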
Yeah same here.
Thanks for the review, your points should be addressed, please check again. Note that I made some small changes on top, like better variable names and adding another test.
This is an optimization to reduce the number of entries in the `target_modules` set. The reason is that in some circumstances, `target_modules` can contain hundreds of entries. Since each target module is checked against each module of the net (which can be thousands), this can become quite expensive when many adapters are being added. Often, the `target_modules` can be condensed in such a case, which speeds up the process.

A context in which this can happen is when diffusers loads non-PEFT LoRAs. As there is no meta info on `target_modules` in that case, they are just inferred by listing all keys from the `state_dict`, which can be quite a lot. See: huggingface/diffusers#9297

As there is a small chance for undiscovered bugs, we apply this optimization only if the list of `target_modules` is sufficiently big. Therefore, for normal PEFT users, this should not have any effect.

Example: As shown in huggingface/diffusers#9297, the speed improvements for loading many diffusers LoRAs can be substantial. When loading 30 adapters, the time would go up from 0.6 sec per adapter to 3 sec per adapter without this fix. With this fix, the time goes up from 0.6 sec per adapter to 1 sec per adapter.
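The shape of that speedup can be reproduced with a toy benchmark (module names, sizes, and the matching helper are all invented for illustration; absolute timings will differ by machine). Matching every module key against every target entry is O(keys x targets), so condensing the target list shrinks the inner loop:

```python
import time

# A fake network: 1000 attention projections (targeted) and 1000 FF modules.
module_names = [f"blocks.{i}.attn.to_q" for i in range(1000)] + [
    f"blocks.{i}.ff.net" for i in range(1000)
]
long_targets = [f"blocks.{i}.attn.to_q" for i in range(1000)]  # one entry per key
short_targets = ["to_q"]  # condensed form

def match_all(names, targets):
    # Suffix matching as used for target_modules: exact name or ".suffix".
    return [n for n in names if any(n == t or n.endswith("." + t) for t in targets)]

t0 = time.perf_counter()
long_hits = match_all(module_names, long_targets)
t1 = time.perf_counter()
short_hits = match_all(module_names, short_targets)
t2 = time.perf_counter()

# Both target lists select exactly the same modules.
assert long_hits == short_hits
print(f"long list: {t1 - t0:.4f}s, condensed: {t2 - t1:.5f}s")
```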