Remove all usage of AttnProcsLayers #4699

Open
williamberman opened this issue Aug 21, 2023 · 8 comments
@williamberman
Contributor

Describe the bug

All training scripts which use AttnProcsLayers will not work properly with accelerate for any accelerate feature that requires calling into the wrapped class returned by accelerate.prepare, i.e. features that monkey patch the forward method (such as DDP and mixed precision).

All lora training scripts should instead do the following (a minimal sketch follows the list):

  1. The top level module must be directly passed to accelerate.prepare.
  2. The parameters passed to the optimizer must be pulled out of the top level module.
  3. The saving callbacks must pull the lora weights out of the top level module.
  4. The load callbacks must load weights into the top level module.
  5. The clipped parameters must be pulled out of the top level module.
  6. Validation and other forward code must properly handle potentially changed dtypes, since mixed precision is now actually enabled.
  7. The save of the final trained model must pull the lora weights out of the top level module.

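Here is a minimal, runnable sketch of that checklist against accelerate's public API. ToyUNet, the lora layer names, and the hyperparameters are illustrative assumptions standing in for the real UNet and training loop, not the actual diffusers code:

import torch
from accelerate import Accelerator

class ToyUNet(torch.nn.Module):
    # stand-in for the real UNet2DConditionModel with lora layers attached
    def __init__(self):
        super().__init__()
        self.base = torch.nn.Linear(4, 4)                   # frozen base layer
        self.lora_down = torch.nn.Linear(4, 1, bias=False)  # trainable adapter
        self.lora_up = torch.nn.Linear(1, 4, bias=False)

    def forward(self, x):
        return self.base(x) + self.lora_up(self.lora_down(x))

accelerator = Accelerator()
unet = ToyUNet()
unet.base.requires_grad_(False)

# (2) the optimizer parameters are pulled out of the top level module
lora_params = [p for p in unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(lora_params, lr=1e-4)

# (1) the top level module itself goes through accelerate.prepare, so DDP/AMP
# can monkey patch the forward method that is actually called during training
unet, optimizer = accelerator.prepare(unet, optimizer)

loss = unet(torch.randn(2, 4)).mean()
accelerator.backward(loss)

# (5) gradient clipping operates on the same parameters pulled out of the
# top level module
accelerator.clip_grad_norm_(lora_params, max_norm=1.0)
optimizer.step()
optimizer.zero_grad()

# (3)/(7) saving pulls the lora weights back out of the (unwrapped) top level
# module instead of out of an AttnProcsLayers wrapper
unwrapped = accelerator.unwrap_model(unet)
lora_state = {k: v for k, v in unwrapped.state_dict().items() if "lora" in k}

# (4)/(6) loading and validation follow the same pattern: load into the top
# level module and mind the dtype, since mixed precision is actually active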
This PR fixed these bugs in the dreambooth lora script: https://github.com/huggingface/diffusers/pull/3778/files. However, there are still 4 lora training scripts which use AttnProcsLayers.

Other relevant GitHub links: #4046, #4046 (comment)

Reproduction

n/a

Logs

No response

System Info

n/a

Who can help?

No response

williamberman added the "bug (Something isn't working)" and "Good second issue" labels on Aug 21, 2023
@eliphatfs
Contributor

In my code, I only save parameters that have requires_grad set. I think it is a general and elegant way of selecting trained weights from among all parameters.

@sayakpaul
Member

@eliphatfs could you maybe point @williamberman to a full-fledged script you're talking about?

@eliphatfs
Contributor

from collections import OrderedDict

import torch


def is_module_wrapper(module):
    # minimal stand-in for mmcv's is_module_wrapper: unwrap (distributed)
    # data parallel containers
    return isinstance(
        module,
        (torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel))


def _save_to_state_dict(module, destination, prefix, keep_vars, trainable_only=False):
    # same as torch.nn.Module._save_to_state_dict, plus a trainable_only flag
    # that keeps only parameters with requires_grad set
    for name, param in module._parameters.items():
        if param is not None and (not trainable_only or param.requires_grad):
            destination[prefix + name] = param if keep_vars else param.detach()
    for name, buf in module._buffers.items():
        # remove check of _non_persistent_buffers_set to allow nn.BatchNorm2d
        if buf is not None:
            destination[prefix + name] = buf if keep_vars else buf.detach()


def get_state_dict(module,
                   destination=None,
                   prefix='',
                   keep_vars=False,
                   trainable_only=False):
    # recursively check parallel module in case that the model has a
    # complicated structure, e.g., nn.Module(nn.Module(DDP))
    if is_module_wrapper(module):
        module = module.module

    # below is the same as torch.nn.Module.state_dict()
    if destination is None:
        destination = OrderedDict()
        destination._metadata = OrderedDict()  # type: ignore
    destination._metadata[prefix[:-1]] = local_metadata = dict(  # type: ignore
        version=module._version)
    _save_to_state_dict(module, destination, prefix, keep_vars,
                        trainable_only=trainable_only)
    for name, child in module._modules.items():
        if child is not None:
            get_state_dict(child, destination, prefix + name + '.',
                           keep_vars=keep_vars, trainable_only=trainable_only)
    for hook in module._state_dict_hooks.values():
        hook_result = hook(module, destination, prefix, local_metadata)
        if hook_result is not None:
            destination = hook_result
    return destination
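
A hypothetical call would look like this (`accelerator` and a prepared `unet` are assumptions for illustration, not part of the snippet above):

# hypothetical usage: keep only the trainable (lora) weights when saving
lora_state = get_state_dict(accelerator.unwrap_model(unet), trainable_only=True)
torch.save(lora_state, "lora_weights.pt")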

@hkunzhe

hkunzhe commented Aug 23, 2023


Is wrapping AttnProcsLayers like in this comment a workaround to use accelerate DDP and AMP in train_text_to_image.py?

@williamberman
Contributor Author

@hkunzhe maybe it would, but I really wouldn't recommend doing that

@patrickvonplaten
Contributor

Slightly related: #4765

@pedrogengo
Contributor

Hey! Can I work on this?

@williamberman
Contributor Author

hey @pedrogengo yes, feel free to, though it might be a bit involved :)
