
DistributedDataParallel error - uninitialized parameters #644

Closed
OhioT opened this issue Aug 5, 2024 · 13 comments
Labels
upstream-bug: We can't do anything but wait.
wontfix: This will not be worked on

Comments

@OhioT

OhioT commented Aug 5, 2024

I'm using Flux quickstart settings with fp8 quantization on 4x3090s. The same settings work on 1x3090.

TRAINING_NUM_PROCESSES=2
export ACCELERATE_EXTRA_ARGS="--multi_gpu"

The error is raised on this line: results = accelerator.prepare(primary_model
RuntimeError: Modules with uninitialized parameters can't be used with DistributedDataParallel. Run a dummy forward pass to correctly initialize the modules

I have tried the DDP argument find_unused_parameters=True, and I printed any modules with requires_grad = True and grad = None, but there aren't any.
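For reference, find_unused_parameters is normally routed through Accelerate's DistributedDataParallelKwargs handler; a minimal sketch of that standard Accelerate API (not necessarily SimpleTuner's exact wiring):

from accelerate import Accelerator, DistributedDataParallelKwargs

# prepare() forwards these kwargs to torch.nn.parallel.DistributedDataParallel
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])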

@bghira
Owner

bghira commented Aug 5, 2024

oh... well.. actually, i haven't tried multigpu quantised training yet. i assumed it would just work, since we're not really messing with a whole lot other than dtypes. @sayakpaul cc

bghira added the bug (Something isn't working), help wanted (Extra attention is needed), good first issue (Good for newcomers), and regression (This bug has regressed behaviour that previously worked) labels Aug 5, 2024
@bghira
Owner

bghira commented Aug 5, 2024

i am guessing you can't test without quantisation to see if that's what breaks it?

@sayakpaul
Contributor

"Run a dummy forward pass to correctly initialize the modules"

Did this help, or isn't it possible at all?

@sayakpaul
Contributor

There is a multi-GPU training example with FP8, but it uses ao:
https://github.com/pytorch/ao/blob/main/benchmarks/float8/bench_multi_gpu.py
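The linked benchmark is the authoritative example; a rough sketch of the general shape it builds on, assuming torchao's float8 training entry point (convert_to_float8_training), a hypothetical MyTransformer model, and a torchrun launch:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torchao.float8 import convert_to_float8_training  # assumed torchao API

# assumes launch via torchrun, which sets the usual env vars
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = MyTransformer().to(torch.bfloat16).cuda()  # hypothetical model
convert_to_float8_training(model)                  # swap nn.Linear for float8 training linears
model = DDP(model, device_ids=[local_rank])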

@bghira
Owner

bghira commented Aug 5, 2024

everything torch does has such a worse interface than everything hugging face does - ao looks like it will work but jesus lord why is it so ugly lol

@OhioT
Author

OhioT commented Aug 5, 2024

"Run a dummy forward pass to correctly initialize the modules"

"Did this help, or isn't it possible at all?"

I tried the following, and the same error happened at prepare():

# Dummy inputs shaped like packed Flux latents and conditioning tensors
tpacked_noisy_latents = torch.randn(1, 4320, 64, dtype=weight_dtype, device=accelerator.device)
tpooled_projections = torch.randn(1, 768, dtype=weight_dtype, device=accelerator.device)
ttimesteps = torch.randn(1, dtype=weight_dtype, device=accelerator.device)
tguidance = torch.randn(1, dtype=weight_dtype, device=accelerator.device)
tencoder_hidden_states = torch.randn(1, 512, 4096, dtype=weight_dtype, device=accelerator.device)
ttxt_ids = torch.randn(1, 512, 3, dtype=weight_dtype, device=accelerator.device)
timg_ids = torch.randn(1, 4320, 3, dtype=weight_dtype, device=accelerator.device)

# Dummy forward pass, as suggested by the DDP error message
with torch.no_grad():
    model_pred = transformer(
        hidden_states=tpacked_noisy_latents,
        timestep=ttimesteps,
        guidance=tguidance,
        pooled_projections=tpooled_projections,
        encoder_hidden_states=tencoder_hidden_states,
        txt_ids=ttxt_ids,
        img_ids=timg_ids,
        joint_attention_kwargs=None,
        return_dict=False,
    )

transformer = accelerator.prepare(transformer)

@sayakpaul
Contributor

Okay. This is helpful. Would you be able to turn the above into a fuller reproducer and provide your accelerate config and launch command?

Will try to look into it tomorrow.

@bghira
Owner

bghira commented Aug 18, 2024

@sayakpaul any luck?

@matabear-wyx

Same error here. Could you please share some possible ideas about multi-GPU quantised training? Maybe I can try to work on it.

@bghira
Owner

bghira commented Aug 21, 2024

this doesn't happen with LORA_TYPE=lycoris and fp8-quanto on 2x 3090
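For SimpleTuner configs, that working combination corresponds roughly to this config.env fragment (only variables already quoted in this thread are shown; the quickstart's own quantisation setting is not reproduced here):

export LORA_TYPE='lycoris'                  # avoids the DDP error, per the comment above
export TRAINING_NUM_PROCESSES=2
export ACCELERATE_EXTRA_ARGS="--multi_gpu"
# plus the fp8-quanto quantisation option from the Flux quickstart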

bghira added the upstream-bug label Aug 21, 2024
@bghira
Owner

bghira commented Aug 21, 2024

@sayakpaul i got u fam

accelerate launch --multi_gpu test.py

import torch, accelerate
from diffusers import FluxTransformer2DModel
from optimum.quanto import quantize, qint8, freeze
weight_dtype = torch.bfloat16

accelerator = accelerate.Accelerator()

bfl_model = 'black-forest-labs/FLUX.1-dev'
transformer = FluxTransformer2DModel.from_pretrained(bfl_model, torch_dtype=torch.bfloat16, subfolder="transformer")

# you might need 'with accelerator.main_process_first()' if your server lacks system mem
print('quantizing')
quantize(transformer, qint8)
print('freezing')
freeze(transformer)

tpacked_noisy_latents = torch.randn(1, 1024, 64, dtype=weight_dtype, device=accelerator.device)
tpooled_projections = torch.randn(1, 768, dtype=weight_dtype, device=accelerator.device)
ttimesteps = torch.randn(1, dtype=weight_dtype, device=accelerator.device)
tguidance = torch.randn(1, dtype=weight_dtype, device=accelerator.device)
tencoder_hidden_states = torch.randn(1, 512, 4096, dtype=weight_dtype, device=accelerator.device)
ttxt_ids = torch.randn(1, 512, 3, dtype=weight_dtype, device=accelerator.device)
timg_ids = torch.randn(1, 4320, 3, dtype=weight_dtype, device=accelerator.device)

#with torch.no_grad():
#    model_pred = transformer(
#        hidden_states=tpacked_noisy_latents,
#        timestep=ttimesteps,
#        guidance=tguidance,
#        pooled_projections=tpooled_projections,
#        encoder_hidden_states=tencoder_hidden_states,
#        txt_ids=ttxt_ids,
#        img_ids=timg_ids,
#        joint_attention_kwargs=None,
#        return_dict=False,
#    )
transformer = accelerator.prepare(transformer)

@bghira
Owner

bghira commented Aug 21, 2024

same issue here, even when quantising only on the main process and syncing afterwards:

transformer = FluxTransformer2DModel.from_pretrained(bfl_model, torch_dtype=torch.bfloat16, subfolder="transformer")
if accelerator.is_main_process:
    print('quantizing')
    quantize(transformer, qint8)
    print('freezing')
    freeze(transformer)
print('waiting..')
accelerator.wait_for_everyone()
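Note that quantize() only rewrites the copy of the model held by the local process, so quantising on the main process alone leaves the other ranks unquantised; if the goal is just to stagger host-memory use, the usual Accelerate pattern is closer to the following sketch (which does not, by itself, resolve the DDP error):

# Sketch: quantise on every rank, letting rank 0 go first to limit peak host memory.
with accelerator.main_process_first():
    quantize(transformer, qint8)
    freeze(transformer)
accelerator.wait_for_everyone()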

bghira pushed a commit that referenced this issue Aug 21, 2024
(#644) temporarily block training on multi-gpu setup with quanto + PEFT, inform user to go with lycoris instead
bghira added a commit that referenced this issue Aug 21, 2024
(#644) temporarily block training on multi-gpu setup with quanto + PEFT, inform user to go with lycoris instead
@bghira
Owner

bghira commented Aug 27, 2024

for now, DDP works with Lycoris. i will close this and eventually we will receive an upstream fix when there is time for them to focus on it again.

bghira closed this as not planned Aug 27, 2024
bghira added the wontfix label and removed the bug, help wanted, good first issue, and regression labels Aug 27, 2024