[Deepspeed] add support for bf16 mode #14569

Merged: 34 commits, Mar 12, 2022


@stas00 (Contributor) commented Nov 30, 2021

This PR:

  • adds bfloat16 support for ZeRO-1, ZeRO-2 and ZeRO-3 to the HF/DS integration.
  • runs most functional tests for bf16 as well, so this PR almost doubles the number of tests. Model zoo tests are left on fp16 for now, as both should work the same.
  • updates the docs to document the new feature.

Requirements:

  1. Support for Training with BF16 #13207 (merged)
  2. ZeRO-3 support: Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) microsoft/DeepSpeed#1453 (merged)
  3. a new deepspeed release after the above is merged
  4. an update to the version dependency table

for users who want to try this early:

  1. add --bf16 to your previous fp16 or fp32 deepspeed command line

done ;)
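
For anyone who keeps an explicit DeepSpeed config file rather than relying only on the command-line flag, the switch can be sketched roughly as below. This is a minimal illustration, not code from this PR: `switch_to_bf16` is a hypothetical helper, and it assumes the DeepSpeed config uses a `bf16` section with an `enabled` key alongside the usual `fp16` section.

```python
import copy
import json


def switch_to_bf16(ds_config: dict) -> dict:
    """Hypothetical helper (not part of transformers or DeepSpeed).

    Returns a copy of a DeepSpeed config dict with the fp16 section
    removed and a bf16 section enabled. Assumes the config schema has
    a top-level "bf16" section with an "enabled" key.
    """
    cfg = copy.deepcopy(ds_config)   # leave the caller's config untouched
    cfg.pop("fp16", None)            # drop fp16 settings if present
    cfg["bf16"] = {"enabled": True}  # assumed bf16 section
    return cfg


# Example fp16 config; the values are illustrative only.
fp16_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": 1,
}

bf16_config = switch_to_bf16(fp16_config)
print(json.dumps(bf16_config, indent=2))
```

The helper copies the config instead of editing it in place, so the same base config can still be used for fp16 runs.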

@sgugger

@stas00 changed the title from "[WIP] add support for bf16 mode" to "[WIP] [Deepspeed] add support for bf16 mode" on Dec 6, 2021
@huggingface deleted a comment from the github-actions bot on Jan 14, 2022
@stas00 added the WIP label on Jan 14, 2022
@stas00 changed the title from "[WIP] [Deepspeed] add support for bf16 mode" to "[Deepspeed] add support for bf16 mode" on Feb 11, 2022
@stas00 marked this pull request as ready for review on February 11, 2022 00:27

@sgugger (Collaborator) left a comment

Thanks for working on this!

Review threads (all outdated and resolved):

  • docs/source/main_classes/deepspeed.mdx (2 threads)
  • src/transformers/deepspeed.py (1 thread)
  • tests/deepspeed/test_deepspeed.py (4 threads)

@stas00 (Contributor, Author) commented Feb 12, 2022

Thanks a lot for catching those few development-process leftovers, Sylvain! Much appreciated. All cleaned up now.

@sgugger (Collaborator) left a comment

Looks all good to me!

@stas00 merged commit 580dd87 into huggingface:master on Mar 12, 2022
@stas00 deleted the ds-bf16 branch on March 12, 2022 01:53