
Fixed optim update error with non-contiguous grads/params #1187

Merged

Conversation

@Edenzzzz (Contributor) commented Apr 23, 2024

Fixes #1185
Non-contiguous params/gradients resulting from torch.chunk, all_gather, etc. are ubiquitous in distributed training frameworks such as ZeRO. This PR avoids update errors with such tensors, since the C++ optimizer kernels assume row-major (contiguous) inputs.

cc @Titus-von-Koeller
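A minimal sketch of the general approach described above, not the literal diff merged here: before params, grads, and optimizer state are handed to the C++ kernels, tensors are normalized to contiguous, row-major memory. The helper name `_as_contiguous` and the call site are illustrative assumptions.

```python
import torch

def _as_contiguous(t: torch.Tensor) -> torch.Tensor:
    """Return `t` in row-major memory, copying only when necessary.

    The C++ optimizer kernels index memory as if it were row-major, so
    strided views (e.g. outputs of torch.chunk or narrow) must be
    materialized before the kernel launch.
    """
    return t if t.is_contiguous() else t.contiguous()

# Hypothetical call site inside an optimizer update step:
# grad = _as_contiguous(p.grad)
# state1 = _as_contiguous(state["state1"])
# ...the kernel is then launched on row-major buffers only.
```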

@matthewdouglas (Member)
@Edenzzzz Thanks for the PR! This looks good to me!
cc: @Titus-von-Koeller

@Titus-von-Koeller (Collaborator)
Just ran the optimizer test suite as well; everything passes.

Thanks @Edenzzzz and @matthewdouglas for tracking this down and providing a fix 🙌🏻 🚀, this was really helpful!
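For reference, a minimal repro sketch in the spirit of the linked issue, an assumption on my part rather than a test from this PR: the same values and gradients, stored once contiguously and once as a torch.chunk view, should now receive identical updates.

```python
import torch
import bitsandbytes as bnb

torch.manual_seed(0)

# Two parameters with identical values: one stored contiguously, one a
# non-contiguous view produced by torch.chunk (as described in the PR).
full = torch.randn(4, 8, device="cuda")
p_contig = torch.nn.Parameter(full[:, :4].contiguous())
p_strided = torch.nn.Parameter(full.chunk(2, dim=1)[0])
assert not p_strided.is_contiguous()

# Identical gradients for both parameters.
grad = torch.randn(4, 4, device="cuda")
p_contig.grad = grad.clone()
p_strided.grad = grad.clone()

# bnb.optim.Adam keeps 32-bit optimizer state by default.
opt_a = bnb.optim.Adam([p_contig], lr=1e-3)
opt_b = bnb.optim.Adam([p_strided], lr=1e-3)
opt_a.step()
opt_b.step()

# With the contiguity handling in place, both updates should agree.
torch.testing.assert_close(p_contig, p_strided)
```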

@Titus-von-Koeller Titus-von-Koeller merged commit a3f55ce into bitsandbytes-foundation:main Jul 22, 2024
22 checks passed
Merging this pull request may close: 32 bit optimizer update error despite gradients being the same