bf16+pipeline parallelism #1801

Merged: 52 commits, Apr 19, 2022

Conversation

@tjruwase (Contributor) commented on Mar 1, 2022:

- bf16_optimizer implementing optimizer state sharding (a.k.a. ZeRO stage 1)
- Integration with pipeline parallelism
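
For orientation, here is a minimal sketch of how the bf16 + pipeline-parallel path might be driven from user code. The config keys follow DeepSpeed's public JSON schema, but the layer split, stage count, batch sizes, and optimizer settings are assumptions chosen for illustration, not values taken from this PR.

```python
# Hypothetical usage sketch (not from this PR): a toy pipeline model trained
# with bfloat16 enabled. All concrete values below are illustrative assumptions.
import torch
import deepspeed
from deepspeed.pipe import PipelineModule

# Express the model as a list of layers so DeepSpeed can partition it into
# pipeline stages.
layers = [torch.nn.Linear(16, 16) for _ in range(4)]
model = PipelineModule(layers=layers, num_stages=2)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "bf16": {"enabled": True},  # turn on bfloat16 training
}

# Under the deepspeed launcher this returns a pipeline engine configured for
# bfloat16 training.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```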

@tjruwase (Contributor, Author) commented on Mar 1, 2022:

@stas00, FYI

Review threads on deepspeed/runtime/bf16_optimizer.py (resolved; one outdated).

```diff
@@ -981,6 +969,10 @@ def _configure_distributed_model(self, model):
                         hasattr(param,
                                 'ds_id') for param in self.module.parameters()):
                 self.__check_params(self.module, torch.bfloat16)
+            if self.zero_optimization_stage() == 0 and not self.pipeline_parallelism:
+                raise NotImplementedError(
+                    "When not running ZeRO, BF16 training support is only supported for Pipeline parallelism"
+                )
```
kisseternity (Contributor) commented:

Hello, I wonder why BF16 is only supported for pipeline parallelism or ZeRO 1 to ZeRO 3, since there was no such limit in prior versions.

@tjruwase (Contributor, Author) replied:

@kisseternity, apologies for the confusion here. This is a new bf16+pipeline parallelism code path that was written at the last minute for BLOOM model training. The existing restrictions on combining it with ZeRO are temporary; we plan to harmonize these combinations and eliminate the confusion.

kisseternity (Contributor) replied:

Thanks for replying. In that case, I think bf16 can still be used without the bf16 optimizer or ZeRO, as before.

@tjruwase (Contributor, Author) replied:

Yes, these changes do not affect the previous support for bf16+ZeRO.
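
For completeness, a short sketch of the unaffected bf16+ZeRO combination; the keys follow DeepSpeed's config schema, and the stage value is an illustrative assumption rather than something prescribed by this PR.

```python
# Hypothetical config: bf16 together with ZeRO, which the comment above says
# remains supported as before. Any stage >= 1 also satisfies the new guard.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}
```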

@mrwyattii deleted the olruwase/bf16-updates branch on July 7, 2023.