RFC: `torch==1.12` will toggle `torch.backends.matmul.allow_tf32` to `False` - what should we do? #16588
Comments
> You don't have to condition the …
The main reason for the conditional suggestion was to be self-documenting, but w/o the conditional this code will fail in older pytorch, for example:
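A sketch of the kind of guard being discussed (the `1.9` threshold is an assumption, and the fully qualified flag path `torch.backends.cuda.matmul.allow_tf32` follows PyTorch's documentation):

```python
import torch
from packaging import version

# Unguarded, assigning to the flag raises AttributeError on releases
# that don't expose it yet, hence the version conditional.
if version.parse(torch.__version__) >= version.parse("1.9"):
    torch.backends.cuda.matmul.allow_tf32 = True
```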
@mruberry shared on slack that jax has a similar flag (jax-ml/jax#6143), should you want to make this behavior consistent across all 3 frameworks and/or to make it configurable. And they too have a default that is not appreciated by all who expect fp32 to be fp32: jax-ml/jax#7010
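For reference, a sketch of the JAX counterpart (flag and context-manager names as added in jax-ml/jax#6143; treat the exact spellings as assumptions):

```python
import jax
import jax.numpy as jnp

# Global override: make fp32 matmuls really use float32 precision.
jax.config.update("jax_default_matmul_precision", "float32")

# Scoped override via the context manager:
with jax.default_matmul_precision("float32"):
    c = jnp.ones((128, 128)) @ jnp.ones((128, 128))
```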
Just to understand: I think I'm in favor of not overwriting the user's setting. E.g. if a user does:

```python
import torch
torch.backends.matmul.allow_tf32 = False
import transformers
...
```

Also I think it's a good rule of thumb that in PyTorch, by default, the highest precision / lowest speed is always enabled. I think we don't have to, or shouldn't, care about JAX here, as its default precision / device behavior is already very different (e.g. JAX uses lowest precision on TPU by default, and uses GPU/TPU by default, in contrast to PyTorch).
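If transformers were to set the flag at all, one way to avoid clobbering an explicit user choice is sketched below (the helper name and assumed default are hypothetical, and an explicit user assignment of the default value is indistinguishable from "untouched"):

```python
import torch

_PT_DEFAULT_ALLOW_TF32 = False  # assumed PyTorch default once 1.12 lands

def _maybe_enable_tf32() -> None:
    # Hypothetical helper, run once at import time: only flip the flag
    # if it still holds PyTorch's default, i.e. the user hasn't changed
    # it before importing transformers.
    if torch.backends.cuda.matmul.allow_tf32 == _PT_DEFAULT_ALLOW_TF32:
        torch.backends.cuda.matmul.allow_tf32 = True
```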
Tensorflow has it active by default and has a flag to control it (docs). I'd say we don't need to touch it in TF, but happy to go with a solution that minimizes PT-TF interface differences.
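For reference, a small sketch of the TensorFlow switch mentioned above:

```python
import tensorflow as tf

# Check whether TF32 execution is currently enabled (True by default
# on supported hardware)...
print(tf.config.experimental.tensor_float_32_execution_enabled())

# ...and turn it off to force full fp32 matmul precision.
tf.config.experimental.enable_tensor_float_32_execution(False)
```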
This is very complicated: on the one hand, we don't want to change the PyTorch default and surprise the user; on the other hand, we don't want most of our beginner users to experience degraded training performance on most GPUs without knowing why (as this change will be hidden in the PyTorch release notes). I'm also in favor of not touching PyTorch's default (the same way we don't turn on things like …).
Small point of clarification: we have not changed the default to False at this time, but expect to do so in the future.
Agreed! This is the principle that motivated this change. We will also have user-facing documentation beyond the release notes when this change is part of a PyTorch release, because we agree this change has the potential to be surprising and disruptive to current Ampere users. We'll also provide a recommendation for developers when making this change in the nightlies.
I think it was added in pt-1.9, since 1.8 doesn't have this flag, see #16588 (comment). And the plan is to revert the default to `False` in pt-1.12. So it has been set to `True` since it was added.
I forgot that I added it already when we added bf16 support: transformers/src/transformers/training_args.py, lines 249 to 251 in d57da99.
Except it has no default setting. I guess we keep it that way, w/o a default?
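A sketch of how such a defaultless (tri-state) argument can be applied (the helper name is hypothetical, not the actual training_args.py code):

```python
from typing import Optional

import torch

def apply_tf32_setting(tf32: Optional[bool]) -> None:
    # Hypothetical helper: tf32=None leaves PyTorch's default (or the
    # user's own setting) untouched; only an explicit True/False is applied.
    if tf32 is not None:
        torch.backends.cuda.matmul.allow_tf32 = tf32
```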
Please review the current doc (https://huggingface.co/docs/transformers/performance#tf32) and suggest if anything needs to be changed. Thank you!
Yes, that doc is great. We should also expand a bit the documentation of the flag in `TrainingArguments`.
Thank you for reviewing and for the feedback, Sylvain. Here is a PR: #16674
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
FYI pytorch/pytorch#76509 has landed, and while it may not be perfect we think it achieves the goal of giving users device agnostic control over fp32 matmul precision. Please don't hesitate to reach out if you have additional questions, I'll also be producing additional documentation on this change ahead of the PyTorch 1.12 release. |
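The landed PR exposes `torch.set_float32_matmul_precision`; a quick sketch of its use (value semantics per the PyTorch 1.12 docs):

```python
import torch

# "highest" - always use true fp32 math for fp32 matmuls (the new default)
# "high"    - allow TF32 (or an equivalent fast format) where available
# "medium"  - allow even lower internal precision (e.g. bf16) for speed
torch.set_float32_matmul_precision("high")
```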
Ampere GPUs added a new mode called TF32. Pytorch created a new flag to enable it, `torch.backends.matmul.allow_tf32`, which has been `True` by default in pytorch since it was added.

Having this mode on means that matrix multiplications with FP32 inputs are actually done in TF32, which makes the math significantly faster, albeit less precise (TF32 has the dynamic range of BF16 and the precision of FP16).
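To see the effect concretely, a minimal sketch (requires an Ampere or newer GPU; the fully qualified flag path `torch.backends.cuda.matmul.allow_tf32` follows PyTorch's documentation):

```python
import torch

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

torch.backends.cuda.matmul.allow_tf32 = True
tf32_result = a @ b  # runs on TF32 tensor cores

torch.backends.cuda.matmul.allow_tf32 = False
fp32_result = a @ b  # full fp32 math

# Nonzero on Ampere+: TF32 truncates the fp32 mantissa to 10 bits.
print((tf32_result - fp32_result).abs().max())
```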
The NVIDIA engineers have run many experiments and found that Deep Learning training accuracy is not impacted for the worse by using TF32 instead of FP32 (and is often better), while it provides a significant speedup. It's easy to see why from the A100 spec: FP32 peaks at 19.5 TFLOPS, while TF32 peaks at 156 TFLOPS (numbers with no sparsity).
The accuracy tables can be found in Accelerating AI Training with NVIDIA TF32 Tensor Cores.
However, the lost precision is a problem for some non-DL applications. Therefore, starting from pytorch 1.12 (and in the nightlies shortly), the default for `torch.backends.matmul.allow_tf32` will be `False`, which won't make the training accuracy worse, but it'll make fp32 training significantly slower. So if you believe we should remain consistent/backward compatible, most likely we should turn it back on for pt>1.11, at a single point which always gets executed for pytorch users (see the sketch below).
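A minimal sketch of such a single-point override (the placement at `import transformers` time is an assumption):

```python
# hypothetical: executed once when transformers is first imported
import torch
from packaging import version

if version.parse(torch.__version__) > version.parse("1.11"):
    # restore the pre-1.12 behavior: allow TF32 for fp32 matmuls on Ampere
    torch.backends.cuda.matmul.allow_tf32 = True
```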
The question is whether this should be done.
Additionally, other use modes should be brought in sync.
Currently, tf32 and how to flip it on/off are documented here: https://huggingface.co/docs/transformers/performance#tf32
A detailed discussion with multiple links to other related resources is here: https://dev-discuss.pytorch.org/t/pytorch-and-tensorfloat32/504
@LysandreJik, @sgugger, @patrickvonplaten, @patil-suraj