Issues setting trainable parameters in LoRA adapter #2217

anferico · 2024-11-17T12:28:56Z

anferico
Nov 17, 2024

Hi all, consider the following model:

import torch
from peft import LoraConfig
from transformers import PretrainedConfig, PreTrainedModel


class FooConfig(PretrainedConfig):
    model_type: str = "foo"

    def __init__(self, input_size: int, **kwargs):
        super().__init__(**kwargs)
        self.input_size = input_size

class FooPreTrainedModel(PreTrainedModel):
    config_class = FooConfig
    base_model_prefix = "model"

    def _init_weights(self, module):
        if isinstance(module, torch.nn.Linear):
            module.weight.data.normal_(mean=0.0, std=0.02)
        elif isinstance(module, torch.nn.Parameter):
            module.data.normal_(mean=0.0, std=0.01)

class FooComposite(torch.nn.Module):
    def __init__(self, config: FooConfig):
        super().__init__()
        self.linear1 = torch.nn.Linear(config.input_size, config.hidden_size)
        self.linear2 = torch.nn.Linear(config.hidden_size, config.hidden_size)
        self.other_param = torch.nn.Parameter(torch.empty(config.hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear2(self.linear1(x) + self.other_param)

class FooModel(FooPreTrainedModel):
    def __init__(self, config: FooConfig):
        super().__init__(config)
        self.composite = FooComposite(config)
        self.linear3 = torch.nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear3(self.composite(x))

Suppose I want to create a LoRA adapter with the following properties:

targets FooModel.linear3 only
includes FooModel.composite.linear2 in modules_to_save (i.e. leaves it trainable)
does NOT include FooModel.composite.linear1 in modules_to_save (i.e. freezes it)
leaves FooModel.composite.other_param as trainable

how am I supposed to define my LoraConfig to achieve that? I tried different things, but every one has issues:

 lora_config = LoraConfig(
     ...,
     target_modules=["linear3"],
     modules_to_save=["composite"],
     exclude_modules=["composite.linear1"]
 )

issue: "composite.linear1" is still included in modules_to_save

 lora_config = LoraConfig(
     ...,
     target_modules=["linear3"],
     modules_to_save=["composite.linear2", "composite.other_param"]
 )

issue: "composite.other_param" is NOT trainable, probably because it's a nn.Parameter and not a nn.Module

NOTE: I know one solution would be to wrap "other_param" in a nn.Module, but please assume FooModel is an existing implementation which I cannot change.

this is the script I'm using to check which parameters are trainable after creating the adapter:

config = FooConfig(input_size=8, hidden_size=16)
model = FooModel(config)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["linear3"],
    modules_to_save=...,
    exclude_modules=...
)
peft_model = get_peft_model(
    model,
    peft_config=lora_config,
    adapter_name="dummy",
)
peft_model.set_adapter("dummy")

_ = [print(p.requires_grad, name) for name, p in peft_model.named_parameters()]

Answered by BenjaminBossan

Nov 18, 2024

Okay, so IIUC, the core of the problem is that you would like to treat other_param the same as the other modules_to_save but it doesn't work as modules_to_save can only target modules but not parameters.

In theory, you can just manually set the requires_grad argument on this parameter, e.g. model.base_model.model.other_param.requires_grad = True. Then this parameter should train. However, when you then call model.save_pretrained, it will not be included in the checkpoint. Therefore, you would have to save it separately and also load it separately.

The reason why we cannot simply allow modules_to_save to work on nn.Parameters is that we cannot control how they're being used by the model. W…

View full answer

BenjaminBossan · 2024-11-18T15:23:43Z

BenjaminBossan
Nov 18, 2024
Maintainer

Okay, so IIUC, the core of the problem is that you would like to treat other_param the same as the other modules_to_save but it doesn't work as modules_to_save can only target modules but not parameters.

In theory, you can just manually set the requires_grad argument on this parameter, e.g. model.base_model.model.other_param.requires_grad = True. Then this parameter should train. However, when you then call model.save_pretrained, it will not be included in the checkpoint. Therefore, you would have to save it separately and also load it separately.

The reason why we cannot simply allow modules_to_save to work on nn.Parameters is that we cannot control how they're being used by the model. With modules, we can change how they're used by overriding forward, but with parameters it doesn't work. Therefore, our hands are bound here. The only thing we could potentially do is to add support for FooComposite explicitly, thus taking control over how other_param is used. But of course this depends on what FooComposite is exactly.

3 replies

anferico Nov 19, 2024
Author

just for me to understand, why does forward need to be overridden for modules in modules_to_save? isn't it enough to just freeze the module's parameters and call the original forward?

BenjaminBossan Nov 19, 2024
Maintainer

modules_to_save is a bit more complicated than that. We create a copy of the original module and update that copy. The original weights stay frozen. The wrapper class needs to handle the original and the copy, switch between them if the user asks, etc. Therefore, we need to adjust the forward method.

The code can be found here:

peft/src/peft/utils/other.py

Lines 351 to 359 in 8874ab5

    
           def forward(self, x: torch.Tensor, *args, **kwargs): 
        
               self._check_forward_args(x, *args, **kwargs) 
        
               adapter_names = kwargs.pop("adapter_names", None) 
        
               if self.disable_adapters or (self.active_adapter not in self.modules_to_save): 
        
                   return self.original_module(x, *args, **kwargs) 
        
               if adapter_names is None: 
        
                   return self.modules_to_save[self.active_adapter](x, *args, **kwargs) 
        
               return self._mixed_batch_forward(x, *args, adapter_names=adapter_names, **kwargs)

anferico Dec 4, 2024
Author

@BenjaminBossan I see, thanks for the clarification!

My solution for now is to include the whole composite submodule in modules_to_save if and only if it has "loose" trainable parameters (=not associated with any submodule, like other_param in this case), even if this may cause some frozen parameters (linear1's parameters in this case) to be included in the final checkpoint.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issues setting trainable parameters in LoRA adapter #2217

{{title}}

Replies: 1 comment 3 replies

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Issues setting trainable parameters in LoRA adapter #2217

anferico Nov 17, 2024

Replies: 1 comment · 3 replies

BenjaminBossan Nov 18, 2024 Maintainer

anferico Nov 19, 2024 Author

BenjaminBossan Nov 19, 2024 Maintainer

anferico Dec 4, 2024 Author

anferico
Nov 17, 2024

Replies: 1 comment 3 replies

BenjaminBossan
Nov 18, 2024
Maintainer

anferico Nov 19, 2024
Author

BenjaminBossan Nov 19, 2024
Maintainer

anferico Dec 4, 2024
Author