
test_autocast_torch_bf16 fails if PyTorch is compiled with CUDA support. #6085

Closed · ysiraichi opened this issue Dec 9, 2023 · 2 comments

@ysiraichi (Collaborator)

🐛 Bug

Running the test_autocast_torch_bf16 test produces the following error if PyTorch is compiled with CUDA support:

$ python test/test_autocast.py -v -k test_autocast_torch_bf16
test_autocast_torch_bf16 (__main__.TestAutocastCuda) ... ERROR

======================================================================
ERROR: test_autocast_torch_bf16 (__main__.TestAutocastCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_autocast.py", line 391, in test_autocast_torch_bf16
    self._run_autocast_outofplace(
  File "test/test_autocast.py", line 278, in _run_autocast_outofplace
    with autocast(xm.xla_device(), dtype=autocast_dtype):
  File "xla/torch_xla/amp/autocast_mode.py", line 45, in __init__
    super().__init__(
  File "torch/amp/autocast_mode.py", line 306, in __init__
    raise RuntimeError(
RuntimeError: Current CUDA Device does not support bfloat16. Please switch dtype to float16.

----------------------------------------------------------------------
Ran 1 test in 0.140s

FAILED (errors=1)
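
For context, the failure can be triggered without the test harness. A minimal sketch of the failing path, assuming a torch_xla install and a CUDA build of PyTorch on a GPU without native BF16 support (e.g. an RTX 2060):

```python
import torch
import torch_xla.core.xla_model as xm
from torch_xla.amp import autocast

# torch_xla's autocast delegates the dtype check to torch/amp/autocast_mode.py
# (see the traceback above), which raises:
#   RuntimeError: Current CUDA Device does not support bfloat16. ...
with autocast(xm.xla_device(), dtype=torch.bfloat16):
    a = torch.rand(4, 4, device=xm.xla_device())
    b = torch.rand(4, 4, device=xm.xla_device())
    print(a @ b)
```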

Environment

Additional Context

Blocking: #6070

@ysiraichi (Collaborator, Author)

The main problem here is that `torch.cuda.is_bf16_supported()` returns `False`, while `torch.tensor([1.], dtype=torch.bfloat16, device=xm.xla_device())` works.
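
Side by side, the two checks mentioned above look like this (a small sketch; run on the same machine that fails the test):

```python
import torch
import torch_xla.core.xla_model as xm

# The static capability query claims BF16 is unsupported on this GPU...
print(torch.cuda.is_bf16_supported())  # False on e.g. an RTX 2060

# ...yet materializing a BF16 tensor on the XLA device works fine.
t = torch.tensor([1.], dtype=torch.bfloat16, device=xm.xla_device())
print(t.dtype)  # torch.bfloat16
```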

@JackCaoG (Collaborator)

@yeounoh FYI

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this issue Jan 1, 2024
…#115924)

Fix: #115900 pytorch/xla#6085

This PR adds a last resort for testing BF16 support on CUDA. This is necessary on GPUs such as the RTX 2060, where `torch.cuda.is_bf16_supported()` returns False, but we can successfully create a BF16 tensor on CUDA.

Before this PR:

```python
>>> torch.cuda.is_bf16_supported()
False
>>> torch.tensor([1.], dtype=torch.bfloat16, device="cuda")
tensor([...], device='cuda:0', dtype=torch.bfloat16)
```

After this PR:

```python
>>> torch.cuda.is_bf16_supported()
True
>>> torch.tensor([1.], dtype=torch.bfloat16, device="cuda")
tensor([...], device='cuda:0', dtype=torch.bfloat16)
```
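
The "last resort" described above boils down to attempting the tensor creation when the static capability check says no. A rough sketch of that idea (not the actual PR diff; `bf16_creation_works` is a hypothetical helper name):

```python
import torch

def bf16_creation_works() -> bool:
    # Hypothetical helper sketching the fallback: probe BF16 support by
    # actually creating a tiny BF16 tensor on the current CUDA device.
    # The real change lives inside torch.cuda.is_bf16_supported() in
    # pytorch/pytorch#115924.
    try:
        torch.tensor([1.], dtype=torch.bfloat16, device="cuda")
        return True
    except RuntimeError:
        return False
```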

Pull Request resolved: #115924
Approved by: https://github.com/jansel