Fix `norm` data-type when using AMP. #7878

ysiraichi · 2024-08-19T14:21:01Z

This PR fixes the data-type of the result of the `norm` operation when using AMP (automatic mixed precision).

The cast policy defined in autocast_mode.cpp for norm.ScalarOpt_dim is fp32_append_dtype, which means that the call is forwarded to norm.ScalarOpt_dim_dtype with at::kFloat (i.e. float32) appended as its last argument. That argument specifies the data-type of the result tensor.
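To make the policy concrete, here is a minimal pure-Python model of what fp32_append_dtype does as described above. It is only a sketch: `FakeTensor`, `norm_with_dtype`, and the Python `fp32_append_dtype` wrapper are hypothetical stand-ins invented for illustration, not the actual autocast implementation or any PyTorch API.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor: it only tracks a dtype name."""
    dtype: str


def norm_with_dtype(x, p, dim, keepdim, dtype):
    """Models norm.ScalarOpt_dim_dtype: the last argument picks the result dtype."""
    return FakeTensor(dtype=dtype)


def fp32_append_dtype(dtype_overload):
    """Models the cast policy: forward the call, appending float32 as the last argument."""
    @wraps(dtype_overload)
    def wrapper(*args):
        return dtype_overload(*args, "float32")
    return wrapper


# Models norm.ScalarOpt_dim as it behaves under autocast.
norm_autocast = fp32_append_dtype(norm_with_dtype)

x = FakeTensor(dtype="float16")
r = norm_autocast(x, 2, [1], False)
print(r.dtype)
```

In this model the caller never passes a dtype; the policy alone decides that the result is float32, regardless of the input's data-type.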

Even though we were correctly lowering the operation so that it returned a float32 tensor, the result tensor actually inherited its data-type from the input. The solution was to call XLATensor::CreateFrom with an explicit at::kFloat argument. The example below illustrates the problem:

>>> import torch
>>> import torch_xla  # needed for the "xla" device
>>> x = torch.rand((10, 10), dtype=torch.float16, device="xla")
>>> with torch.cuda.amp.autocast(dtype=torch.float16):
...     r = torch.norm(x, p=2, dim=1)
...
>>> # The HLO representation shows an f32 result, yet the reported dtype is float16.
>>> r
tensor(..., dtype=torch.float16)
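The mismatch between the lowered graph and the reported data-type can be pictured with a toy model. This is a sketch under stated assumptions: `FakeXLATensor` and `create_from` are hypothetical names that only mimic the behavior described in this PR (a result tensor inheriting its data-type from the input unless one is given explicitly); they are not the real XLATensor::CreateFrom signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeXLATensor:
    """Hypothetical stand-in: ir_dtype is what the lowered graph computes,
    logical_dtype is what the tensor reports back to PyTorch."""
    ir_dtype: str
    logical_dtype: str


def create_from(src: FakeXLATensor, ir_dtype: str,
                logical_dtype: Optional[str] = None) -> FakeXLATensor:
    """Toy model of creating a result tensor from an input tensor.
    Without an explicit dtype, the reported dtype is inherited from src."""
    if logical_dtype is None:
        logical_dtype = src.logical_dtype
    return FakeXLATensor(ir_dtype=ir_dtype, logical_dtype=logical_dtype)


x = FakeXLATensor(ir_dtype="f16", logical_dtype="float16")

# Before the fix: the graph yields f32, but the reported dtype follows the input.
before_fix = create_from(x, ir_dtype="f32")

# After the fix: the result dtype is passed explicitly.
after_fix = create_from(x, ir_dtype="f32", logical_dtype="float32")

print(before_fix.logical_dtype, after_fix.logical_dtype)
```

In both cases the modeled graph output is f32; only the explicitly specified data-type makes the reported dtype agree with it.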

cc @miladm @JackCaoG

ysiraichi added 2 commits August 19, 2024 11:08

Add test.

48f256b

Explicitly specify the dtype on XLATensor creation.

5e2ad5a

ysiraichi added the xla:gpu label Aug 19, 2024

ysiraichi requested a review from JackCaoG August 19, 2024 14:21

ysiraichi mentioned this pull request Aug 19, 2024

Failing Torchbench Models: tracking issue #5932

Open

JackCaoG approved these changes Aug 19, 2024


JackCaoG merged commit ac13bf2 into master Aug 19, 2024
23 checks passed

JackCaoG deleted the ysiraichi/fix-norm-amp-dtype branch August 19, 2024 18:15
