```
Traceback (most recent call last):
  File "/workspace/flashflex/llama_train.py", line 241, in <module>
    train(model, loss_func, optimizer, args)
  File "/workspace/flashflex/llama_train.py", line 229, in train
    train_step(model, loss_func, optimizer, trainloader)
  File "/workspace/flashflex/llama_train.py", line 181, in train_step
    optimizer.step()
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 391, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/adam.py", line 165, in step
    adam(
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/adam.py", line 314, in adam
    func(params,
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/adam.py", line 520, in _multi_tensor_adam
    device_grads = torch._foreach_add(device_grads, device_params, alpha=weight_decay)
RuntimeError: The size of tensor a (147849216) must match the size of tensor b (73924608) at non-singleton dimension 0
```
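Note that tensor a is exactly twice the size of tensor b (147849216 = 2 × 73924608), so one gradient and its parameter disagree in flattened size at the moment Adam applies weight decay. A minimal diagnostic sketch (my own, not part of FlashFlex; `check_grad_shapes` is a hypothetical helper name) that can be called right before `optimizer.step()` to locate the offending pair:

```python
# Hypothetical helper, not from the FlashFlex codebase: report every
# parameter whose gradient shape disagrees with its own shape.
def check_grad_shapes(model):
    for name, p in model.named_parameters():
        if p.grad is not None and p.grad.shape != p.shape:
            print(f"mismatch in {name}: param {tuple(p.shape)} "
                  f"vs grad {tuple(p.grad.shape)}")

# Usage: check_grad_shapes(model) just before optimizer.step().
```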
I noticed that we should modify `_pre_forward` to fix this. But when I used the container you provided, the torch version seems to be mismatched.
For example, the module `_utils` does not exist in `torch.distributed.fsdp`.
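As an illustration (my own sketch, not an official API: these FSDP modules are private and their layout moves between torch releases), one can probe which layout the installed torch actually ships before attempting any patch:

```python
# Sketch, assuming only that FSDP internals differ across torch releases:
# check which private submodules this torch build exposes.
import importlib
import torch

print("torch version:", torch.__version__)
for mod_name in ("torch.distributed.fsdp._utils",
                 "torch.distributed.fsdp._runtime_utils"):
    try:
        mod = importlib.import_module(mod_name)
        print(f"{mod_name}: available "
              f"(has _pre_forward: {hasattr(mod, '_pre_forward')})")
    except ImportError:
        print(f"{mod_name}: not present in this torch build")
```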
So how could I modify the `_pre_forward` function?
Thanks.