Remove all usage of AttnProcsLayers #4699

Open
williamberman opened this issue Aug 21, 2023 · 8 comments
@williamberman
Contributor

Describe the bug

All training scripts which use AttnProcsLayers will not work properly with accelerate for any accelerate feature that requires calling into the wrapped class returned by accelerate.prepare, i.e. features that monkey patch the forward method (such as DDP and mixed precision).

All lora training scripts should instead do the following (a minimal sketch follows the list):

  1. The top level module must be directly passed to accelerate.prepare.
  2. The parameters passed to the optimizer must be pulled out of the top level module.
  3. The saving callbacks must pull the lora weights out of the top level module.
  4. The load callbacks must load weights into the top level module.
  5. The clipped parameters must be pulled out of the top level module.
  6. Validation and other forward code must properly handle potentially changed dtypes, since mixed precision is now actually enabled.
  7. The save of the final trained model must pull the lora weights out of the top level module.

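Here is a minimal, runnable sketch of that checklist against accelerate's public API. ToyUNet, the lora layer names, and the hyperparameters are illustrative assumptions standing in for the real UNet and training loop, not the actual diffusers code:

import torch
from accelerate import Accelerator

class ToyUNet(torch.nn.Module):
    # stand-in for the real UNet2DConditionModel with lora layers attached
    def __init__(self):
        super().__init__()
        self.base = torch.nn.Linear(4, 4)                   # frozen base layer
        self.lora_down = torch.nn.Linear(4, 1, bias=False)  # trainable adapter
        self.lora_up = torch.nn.Linear(1, 4, bias=False)

    def forward(self, x):
        return self.base(x) + self.lora_up(self.lora_down(x))

accelerator = Accelerator()
unet = ToyUNet()
unet.base.requires_grad_(False)

# (2) the optimizer parameters are pulled out of the top level module
lora_params = [p for p in unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(lora_params, lr=1e-4)

# (1) the top level module itself goes through accelerate.prepare, so DDP/AMP
# can monkey patch the forward method that is actually called during training
unet, optimizer = accelerator.prepare(unet, optimizer)

loss = unet(torch.randn(2, 4)).mean()
accelerator.backward(loss)

# (5) gradient clipping operates on the same parameters pulled out of the
# top level module
accelerator.clip_grad_norm_(lora_params, max_norm=1.0)
optimizer.step()
optimizer.zero_grad()

# (3)/(7) saving pulls the lora weights back out of the (unwrapped) top level
# module instead of out of an AttnProcsLayers wrapper
unwrapped = accelerator.unwrap_model(unet)
lora_state = {k: v for k, v in unwrapped.state_dict().items() if "lora" in k}

# (4)/(6) loading and validation follow the same pattern: load into the top
# level module and mind the dtype, since mixed precision is actually active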
This PR fixed these bugs in the dreambooth lora script: https://github.com/huggingface/diffusers/pull/3778/files. However, there are still 4 lora training scripts which use AttnProcsLayers.

Other relevant GitHub links: #4046, #4046 (comment)

Reproduction

n/a

Logs

No response

System Info

n/a

Who can help?

No response

williamberman added the "bug (Something isn't working)" and "Good second issue" labels on Aug 21, 2023
@eliphatfs
Contributor

In my code, I only save parameters that have requires_grad set. I think it is a general and elegant way of selecting trained weights from among all parameters.

@sayakpaul
Member

@eliphatfs could you maybe point @williamberman to a full-fledged script you're talking about?

@eliphatfs
Contributor

from collections import OrderedDict

import torch


def is_module_wrapper(module):
    # minimal stand-in for mmcv's is_module_wrapper: unwrap (distributed)
    # data parallel containers
    return isinstance(
        module,
        (torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel))


def _save_to_state_dict(module, destination, prefix, keep_vars, trainable_only=False):
    # same as torch.nn.Module._save_to_state_dict, plus a trainable_only flag
    # that keeps only parameters with requires_grad set
    for name, param in module._parameters.items():
        if param is not None and (not trainable_only or param.requires_grad):
            destination[prefix + name] = param if keep_vars else param.detach()
    for name, buf in module._buffers.items():
        # remove check of _non_persistent_buffers_set to allow nn.BatchNorm2d
        if buf is not None:
            destination[prefix + name] = buf if keep_vars else buf.detach()


def get_state_dict(module,
                   destination=None,
                   prefix='',
                   keep_vars=False,
                   trainable_only=False):
    # recursively check parallel module in case that the model has a
    # complicated structure, e.g., nn.Module(nn.Module(DDP))
    if is_module_wrapper(module):
        module = module.module

    # below is the same as torch.nn.Module.state_dict()
    if destination is None:
        destination = OrderedDict()
        destination._metadata = OrderedDict()  # type: ignore
    destination._metadata[prefix[:-1]] = local_metadata = dict(  # type: ignore
        version=module._version)
    _save_to_state_dict(module, destination, prefix, keep_vars,
                        trainable_only=trainable_only)
    for name, child in module._modules.items():
        if child is not None:
            get_state_dict(child, destination, prefix + name + '.',
                           keep_vars=keep_vars, trainable_only=trainable_only)
    for hook in module._state_dict_hooks.values():
        hook_result = hook(module, destination, prefix, local_metadata)
        if hook_result is not None:
            destination = hook_result
    return destination
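
A hypothetical call would look like this (`accelerator` and a prepared `unet` are assumptions for illustration, not part of the snippet above):

# hypothetical usage: keep only the trainable (lora) weights when saving
lora_state = get_state_dict(accelerator.unwrap_model(unet), trainable_only=True)
torch.save(lora_state, "lora_weights.pt")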

@hkunzhe

hkunzhe commented Aug 23, 2023


Is wrapping AttnProcsLayers like in this comment a workaround to use accelerate DDP and AMP in train_text_to_image.py?

@williamberman
Contributor Author

@hkunzhe maybe it would, but I really wouldn't recommend doing that

@patrickvonplaten
Contributor

Slightly related: #4765

@pedrogengo
Contributor

Hey! Can I work on this?

@williamberman
Contributor Author

hey @pedrogengo yes, feel free to, though it might be a bit involved :)
