[benchmarks] PyTorch HUD runs on different data-types. #6483
Comments
I think defaulting to the PyTorch HUD behaviour and moving the current behaviour under a flag makes sense. cc @miladm

Perhaps we may not even want to keep the previous behaviour. I'll defer to Google engineers to decide this.

Agreed, resembling PyTorch HUD behaviour as closely as possible makes sense.

Hi, @lsy323, is it ok to assign this ticket to you?

@ManfeiBai @lsy323 Ah, sorry. I should have mentioned it. I'm working on this one.
Problem
Currently, we only change the data-type of the models on CUDA, and only if they have `DEFAULT_CUDA_<test>_PRECISION` specified. Otherwise, the models run on `float32` precision. Meanwhile, PyTorch HUD runs inference on `bfloat16` and training on AMP.

The data-type of the models is relevant not only for performance, but also for coverage: depending on the data-type, the amount of memory a model uses differs, so a model may fit in memory under one policy but run out of memory under another.
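To make the gap concrete, here is a minimal sketch of the two HUD-style policies in plain PyTorch: whole-model `bfloat16` for inference and autocast (AMP) for training. The `model`, `inputs`, `optimizer`, and `loss_fn` names are hypothetical stand-ins, not code from the actual harness, and the `bfloat16` autocast dtype is an assumption (it avoids needing a `GradScaler`, which `float16` AMP would require).

```python
import torch

def run_inference_bf16(model, inputs):
    # HUD-style inference policy (sketch): cast the whole model and
    # its inputs to bfloat16, then run a no-grad forward pass.
    model = model.to(torch.bfloat16).eval()
    inputs = tuple(t.to(torch.bfloat16) for t in inputs)
    with torch.no_grad():
        return model(*inputs)

def train_step_amp(model, inputs, target, optimizer, loss_fn):
    # HUD-style training policy (sketch): keep float32 weights and run
    # the forward/loss computation under autocast (AMP). bfloat16 is an
    # assumed dtype here; no GradScaler is needed for bf16 autocast.
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(*inputs), target)
    loss.backward()
    optimizer.step()
    return loss
```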
Possible Solutions
In order to better compare the results on PyTorch HUD with the results we get from running the scripts that live in the PyTorch/XLA repository, I think there are a couple of options:

- `--bfloat16` and `--amp` arguments for forcing the models to run in a specific precision
- `--pytorch-hud` argument for the behavior described above
- `--default-data-type` argument for the old behavior

cc @miladm @golechwierowicz @cota @frgossen @zpcore @vanbasten23
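For concreteness, a hypothetical sketch of how these flags could be wired and resolved into a single precision policy per run; none of the function names below come from the repository's actual benchmark runner:

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="benchmark runner (sketch)")
    # The flags are mutually exclusive: each run uses exactly one policy.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--bfloat16", action="store_true",
                       help="force models to run in bfloat16")
    group.add_argument("--amp", action="store_true",
                       help="force models to run under autocast (AMP)")
    group.add_argument("--pytorch-hud", action="store_true",
                       help="match PyTorch HUD: bfloat16 inference, AMP training")
    group.add_argument("--default-data-type", action="store_true",
                       help="keep the old behavior (float32 unless "
                            "DEFAULT_CUDA_<test>_PRECISION is set)")
    return parser.parse_args()

def pick_precision(args, is_training):
    # Resolve the flags into one precision policy for this run.
    if args.bfloat16:
        return "bfloat16"
    if args.amp:
        return "amp"
    if args.pytorch_hud:
        return "amp" if is_training else "bfloat16"
    # Falls through to the old behavior; the thread above leans toward
    # making the HUD policy the default instead.
    return "default"
```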