[benchmarks] PyTorch HUD runs on different data-types. #6483
Comments
I think defaulting to the PyTorch HUD behaviour and moving the current behaviour under a flag makes sense. cc @miladm

Perhaps we may not even want to keep the previous behaviour. I'll defer to Google engineers to decide this.

Agreed, resembling PyTorch HUD behaviour as closely as possible makes sense.

Hi, @lsy323, is it ok to assign this ticket to you?

@ManfeiBai @lsy323 Ah, sorry. I should have mentioned it. I'm working on this one.
Problem
Currently, we only change the data-type of the models on CUDA, and only if they have `DEFAULT_CUDA_<test>_PRECISION` specified. Otherwise, the models run on `float32` precision. Meanwhile, PyTorch HUD runs inference on `bfloat16` and training on AMP.

The data-type of the models is relevant not only for performance, but also for coverage: depending on the data-type, the amount of memory a model uses differs, so a model may fit in memory under one policy but run out of memory under another.
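To make the gap concrete, here is a minimal sketch of the two HUD-style policies in plain PyTorch: whole-model `bfloat16` for inference and autocast (AMP) for training. The `model`, `inputs`, `optimizer`, and `loss_fn` names are hypothetical stand-ins, not code from the actual harness, and the `bfloat16` autocast dtype is an assumption (it avoids needing a `GradScaler`, which `float16` AMP would require).

```python
import torch

def run_inference_bf16(model, inputs):
    # HUD-style inference policy (sketch): cast the whole model and
    # its inputs to bfloat16, then run a no-grad forward pass.
    model = model.to(torch.bfloat16).eval()
    inputs = tuple(t.to(torch.bfloat16) for t in inputs)
    with torch.no_grad():
        return model(*inputs)

def train_step_amp(model, inputs, target, optimizer, loss_fn):
    # HUD-style training policy (sketch): keep float32 weights and run
    # the forward/loss computation under autocast (AMP). bfloat16 is an
    # assumed dtype here; no GradScaler is needed for bf16 autocast.
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(*inputs), target)
    loss.backward()
    optimizer.step()
    return loss
```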
Possible Solutions
In order to better compare the results on PyTorch HUD with the results we get from running the scripts that live in the PyTorch/XLA repository, I think there are a couple of options:

- `--bfloat16` and `--amp` arguments for forcing the models to run in a specific precision
- `--pytorch-hud` argument for the behavior described above
- `--default-data-type` argument for the old behavior

cc @miladm @golechwierowicz @cota @frgossen @zpcore @vanbasten23
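For concreteness, a hypothetical sketch of how these flags could be wired and resolved into a single precision policy per run; none of the function names below come from the repository's actual benchmark runner:

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="benchmark runner (sketch)")
    # The flags are mutually exclusive: each run uses exactly one policy.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--bfloat16", action="store_true",
                       help="force models to run in bfloat16")
    group.add_argument("--amp", action="store_true",
                       help="force models to run under autocast (AMP)")
    group.add_argument("--pytorch-hud", action="store_true",
                       help="match PyTorch HUD: bfloat16 inference, AMP training")
    group.add_argument("--default-data-type", action="store_true",
                       help="keep the old behavior (float32 unless "
                            "DEFAULT_CUDA_<test>_PRECISION is set)")
    return parser.parse_args()

def pick_precision(args, is_training):
    # Resolve the flags into one precision policy for this run.
    if args.bfloat16:
        return "bfloat16"
    if args.amp:
        return "amp"
    if args.pytorch_hud:
        return "amp" if is_training else "bfloat16"
    # Falls through to the old behavior; the thread above leans toward
    # making the HUD policy the default instead.
    return "default"
```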