
Include bf16 support for TPUs and CPUs, and a better check for if a CUDA device supports BF16 #462

Merged
muellerzr merged 9 commits into main from bf16 on Jun 22, 2022

Conversation

muellerzr
Collaborator

Add BF16 support for TPUs and CPUs

What does this add?

This PR improves BF16 support on TPUs, introduces BF16 support for CPUs, and adds a helper to check whether your GPU also supports bf16 when it is requested.

Who is it for?

Users of Accelerate who want to train on BF16

Why is it needed?

After fixing the conditional for testing fp16 in the test script, I realized that CPUs can support bfloat16 (though whether it is advisable is debatable), and that to enable bf16 on TPUs in Accelerate you need to set an environment variable beforehand. This PR addresses both.

What parts of the API does this impact?

User-facing:

The user can now pass mixed_precision="bf16" and train in bf16 on modern CPUs and GPUs, as well as on TPUs.

Internal structure:

Adds an is_bf16_available function that checks for a torch version >= 1.10 when on CPU, runs torch.cuda.is_bf16_supported() when on GPU, and, for TPU, returns whether bf16 should be used as a parameter (useful for testing or for opting out of bf16). A sketch of such a helper follows below.

Internally sets the XLA_USE_BF16 environment variable in AcceleratorState based on whether BF16 is being used.
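
For illustration, here is a minimal sketch of what such a helper could look like. This is an approximation of the behavior described above, not the code added in the PR; the torch_xla import probe, the 1.10 threshold, and the use of torch.cuda.is_bf16_supported() are assumptions based on this thread.

import importlib.util
import torch
from packaging import version

def is_bf16_available(ignore_tpu: bool = False) -> bool:
    # TPU: bf16 is usable whenever torch_xla is present; whether to actually
    # use it is left to the caller (handy for tests or for opting out).
    if importlib.util.find_spec("torch_xla") is not None:
        return not ignore_tpu
    # bf16 autocast support landed around torch 1.10, so older versions are out.
    if version.parse(torch.__version__) < version.parse("1.10"):
        return False
    # GPU: defer to PyTorch's own hardware capability check.
    if torch.cuda.is_available():
        return torch.cuda.is_bf16_supported()
    # CPU: a recent enough torch is all that is needed.
    return True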

Basic Usage Example(s):

# When training on the CPU, TPU, or GPU
from accelerate import Accelerator
accelerator = Accelerator(mixed_precision="bf16")

If your GPU is not supported, it will raise an error stating so.
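
For a fuller picture, a typical training loop with this flag might look like the sketch below. It assumes model, optimizer, and dataloader are already defined, and follows the standard Accelerate pattern rather than quoting code from this PR.

from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="bf16")
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for batch in dataloader:
    outputs = model(**batch)        # forward pass runs under bf16 autocast
    loss = outputs.loss
    accelerator.backward(loss)      # use instead of loss.backward()
    optimizer.step()
    optimizer.zero_grad()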

When would I use it, and when wouldn't I?

When wanting to train on bf16 on CPU, GPU, and TPU.

Does a similar feature exist? If so, why is this better?

For TPU, to use bf16 you would normally set XLA_USE_BF16=1 yourself. We now do this automatically for you.
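
Concretely, the manual route amounts to setting that flag yourself, which Accelerate now handles via AcceleratorState. A rough sketch (the note about timing is an assumption, since XLA generally reads this before tensors are created):

import os

# What you previously had to do by hand on TPU: ask torch_xla to run
# float32 work in bfloat16. Set this before XLA is initialized.
os.environ["XLA_USE_BF16"] = "1"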

@muellerzr muellerzr added labels: enhancement (New feature or request), CPU (Bug or feature on CPU or MultiCPU platforms), GPU (Bug or feature on GPU or MultiGPU platforms), TPU (Bug or feature on TPU platforms) on Jun 22, 2022
@muellerzr muellerzr requested a review from sgugger June 22, 2022 15:49
HuggingFaceDocBuilderDev commented Jun 22, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger sgugger (Collaborator) left a comment

Thanks for all the fixes in this PR, plus the new BF16 support!

elif self.mixed_precision == "bf16" and is_bf16_available():
    if self.distributed_type in [DistributedType.NO, DistributedType.MULTI_CPU, DistributedType.MULTI_GPU]:
        device_type = "cpu" if not torch.cuda.is_available() else "cuda"
        autocast_context = torch.autocast(dtype=torch.bfloat16, device_type=device_type)
Collaborator

Need to be extra sure that this always exists for the PyTorch versions for which is_bf16_available() returns True.

Collaborator Author

Fixed by adding a torch check for >= 1.10
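
In spirit, that guard looks something like the sketch below. It is an illustration of the check being discussed, not the exact diff; the use of packaging.version for the comparison is an assumption.

import torch
from packaging import version

# torch.autocast with bfloat16 on CPU/CUDA is only available on torch >= 1.10,
# so gate the bf16 path on the version as well as on hardware support.
if version.parse(torch.__version__) >= version.parse("1.10"):
    device_type = "cpu" if not torch.cuda.is_available() else "cuda"
    autocast_context = torch.autocast(dtype=torch.bfloat16, device_type=device_type)
else:
    autocast_context = None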

src/accelerate/launchers.py (outdated review comment, resolved)
src/accelerate/state.py (outdated review comment, resolved)
Comment on lines +279 to +280
# TEST that previous fp16 flag still works
print("Legacy FP16 training check.")
Collaborator

Not sure we need to keep this. We have done a couple of releases since we deprecated it, so it's okay if we stop testing it IMO.

Collaborator Author

I'd feel more comfortable dropping the test once we've entirely removed the legacy param (whenever that may be).
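
For context, the legacy parameter under discussion is presumably the old constructor flag, along the lines of the sketch below (a hedged reconstruction, not quoted from the codebase):

from accelerate import Accelerator

# Deprecated spelling that the test above keeps exercising:
accelerator = Accelerator(fp16=True)

# Current equivalent:
accelerator = Accelerator(mixed_precision="fp16")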

@muellerzr muellerzr merged commit f13c59f into main Jun 22, 2022
@muellerzr muellerzr deleted the bf16 branch June 22, 2022 21:53