[benchmarks] Fix AMP setup for torchbench models. #7067

Merged: 4 commits into master on May 16, 2024

Conversation

ysiraichi
Collaborator

Fix: #6556 (and possibly #6833)

This PR fixes the benchmarks script when running with AMP. Previously, we were calling torch.amp.autocast(..., device_type="xla") for both XLA:CUDA and XLA:TPU. However, we should be using torch.cuda.amp.autocast for XLA:CUDA (see this for more details).
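A minimal sketch of the corrected selection, assuming a runner with the is_accelerator_cuda() helper shown in the merged diff below (the pick_autocast wrapper is purely illustrative, not part of the PR):

import functools
import torch

def pick_autocast(is_xla_cuda: bool):
    # XLA:CUDA (like inductor on CUDA) goes through CUDA autocast.
    if is_xla_cuda:
        return torch.cuda.amp.autocast
    # XLA:TPU keeps the generic autocast pinned to the XLA backend.
    return functools.partial(torch.amp.autocast, device_type="xla")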

Context: after #6518, Super_Slomo inference started running with AMP. However, due to #6511, that PR tried to mimic the torch_xla.amp.autocast behavior by calling torch.amp.autocast directly.

cc @miladm @JackCaoG @vanbasten23 @zpcore

@ysiraichi
Collaborator Author

Confirmed it also fixes #6833.

@JackCaoG
Collaborator

Hmm, according to https://github.com/pytorch/xla/blob/master/docs/amp.md we should be able to use autocast for both TPU and GPU. Is that no longer the case?

@ysiraichi
Collaborator Author

That document is correct. The problem is that I didn't notice XLA:CUDA is supposed to run with CUDA autocast, i.e. torch.amp.autocast("cuda"). Instead, I was running both XLA:CUDA and XLA:TPU with XLA autocast, i.e. torch.amp.autocast("xla"). This behavior is implemented in torch_xla.amp.autocast. However, since that currently doesn't work with dynamo (#6511), I was using torch.amp.autocast directly.
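For reference, a rough paraphrase of the dispatch torch_xla.amp.autocast is described as performing (not its actual source; xm.xla_device_hw may report "GPU" instead of "CUDA" on older releases), which the benchmark script now replicates by hand because that context manager currently breaks under dynamo (#6511):

import torch
import torch_xla.core.xla_model as xm

def xla_autocast(device, **kwargs):
    # XLA:CUDA devices are routed to CUDA autocast...
    if xm.xla_device_hw(device) in ("CUDA", "GPU"):
        return torch.amp.autocast(device_type="cuda", **kwargs)
    # ...while XLA:TPU uses the XLA autocast backend.
    return torch.amp.autocast(device_type="xla", **kwargs)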

@JackCaoG JackCaoG merged commit aeed89e into master May 16, 2024
20 checks passed
# https://github.com/pytorch/xla/issues/6511
if self.is_accelerator_cuda():
    # For inductor and XLA:CUDA, we use CUDA autocast.
    autocast = torch.cuda.amp.autocast
Collaborator

I guess torch.cuda.amp.autocast is the same as torch.amp.autocast("cuda")?

Collaborator

Do you need to set kwargs["device_type"] = "xla" for the XLA:GPU case?

Collaborator Author

Not really. torch.cuda.amp.autocast already does that.
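For the two review questions above: torch.cuda.amp.autocast is just the generic context manager with device_type pinned to "cuda", so no extra device_type kwarg is needed. A quick illustration (not part of the PR):

import torch

# Equivalent context managers: torch.cuda.amp.autocast is a convenience
# wrapper that fixes device_type="cuda".
with torch.cuda.amp.autocast(dtype=torch.float16):
    pass

with torch.amp.autocast(device_type="cuda", dtype=torch.float16):
    pass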
