
Add swap_tensors path to nn.Module._apply #117167

Conversation

@mikaylagawarecki (Contributor) commented on Jan 10, 2024

Added `torch.__future__.{get/set}_swap_module_params_on_conversion`, which defaults to `False` for now; we probably want to override this and default to `True` in `nn.Module._apply` if the input is a tensor subclass.
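To illustrate the intended usage (a minimal sketch; the `nn.Linear` module and the `.half()` conversion are just examples):

```python
import torch
import torch.nn as nn

# Opt in to the swap_tensors path for module conversions (defaults to False).
torch.__future__.set_swap_module_params_on_conversion(True)

m = nn.Linear(4, 4)
weight_before = m.weight  # hold a reference to the Parameter object

# Conversions that go through nn.Module._apply (e.g. .to(), .half()) now use
# torch.utils.swap_tensors instead of setting param.data.
m.half()

print(weight_before is m.weight)  # True: the Parameter object itself is preserved
print(m.weight.dtype)             # torch.float16
```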

From offline discussion, for now we are **not** allowing `swap_tensors` after the first module forward has been run*** if the autograd graph is still alive. The reason is that `torch.utils.swap_tensors(t1, t2)` requires the `use_count` of both `TensorImpl`s associated with `t1` and `t2` to be 1. The first forward pass installs an `AccumulateGrad` node on each param, which bumps the refcount of the associated `TensorImpl`. Future work might be to swap the refs that the `AccumulateGrad` nodes hold, if necessary.
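A rough sketch of what this restriction looks like in practice (the error type raised by `_apply` is assumed to be `RuntimeError` here):

```python
import torch
import torch.nn as nn

torch.__future__.set_swap_module_params_on_conversion(True)

m = nn.Linear(2, 2)
m.to(torch.float64)  # fine: nothing else references the parameters' TensorImpls yet

out = m(torch.randn(1, 2, dtype=torch.float64))
# The forward pass installed an AccumulateGrad node per parameter; those nodes
# hold an extra reference to each TensorImpl while out's graph is alive.

try:
    m.half()  # swap_tensors needs use_count == 1, so the conversion is refused
except RuntimeError as e:
    print("could not swap:", e)

del out    # drop the autograd graph (and with it the AccumulateGrad references)...
m.half()   # ...and the conversion succeeds again
```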

***From this, it might seem like we don't need to handle gradients. However, I still handle the grads for the edge cases where the grads are set via `p.grad = grad`, or the autograd graph is no longer alive because the output has been garbage collected.

If `swap_tensors` fails for any of the parameters in the `nn.Module`, we raise an error.

`RNNBase` overrides `nn.Module._apply()` and installs weakrefs on some parameters. As a result, modules that inherit from `RNNBase` (`RNN`, `GRU`, and `LSTM`) cannot use the `swap_tensors` path for now.
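For reference, the weakref limitation can be reproduced directly with `torch.utils.swap_tensors` (a sketch; `ref` below just mimics the kind of weakref that `RNNBase._apply` keeps to its flat weights, and the exact error type is an assumption):

```python
import weakref
import torch

t1, t2 = torch.randn(3), torch.randn(3)
torch.utils.swap_tensors(t1, t2)  # fine: no outstanding weak references

t3 = torch.randn(3)
ref = weakref.ref(t3)             # mimics RNNBase keeping weakrefs to its flat weights
try:
    torch.utils.swap_tensors(t3, torch.randn(3))
except RuntimeError as e:         # swapping is refused while the weakref is alive
    print("could not swap:", e)
```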

Stack from ghstack (oldest at bottom):


pytorch-bot bot commented Jan 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117167

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b16a4af with merge base d444a3b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mikaylagawarecki added the `release notes: nn` (release notes category) and `topic: new features` (topic category) labels on Jan 10, 2024
@mikaylagawarecki changed the title from "Add swap_tensors path to nn.Module._apply" to "[WIP] Add swap_tensors path to nn.Module._apply" on Jan 10, 2024
mikaylagawarecki added a commit that referenced this pull request Jan 11, 2024
ghstack-source-id: 7c5bfa9cc07c847e00765ac843347f4ba392e7fa
Pull Request resolved: #117167
@albanD (Collaborator) left a comment

I don't think that falling back to .data= is really an option here. They are doing semantically different things and so we should only switch between the two when the user explicitly asks for it.
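For concreteness, a rough standalone sketch of that semantic difference (expected behavior only; the printed values are indicative):

```python
import torch
import torch.nn as nn

# `.data =` keeps the Python object (its class and autograd flags) and only
# re-points the underlying data:
q = nn.Parameter(torch.ones(2))
q.data = torch.zeros(2, dtype=torch.float64)
print(type(q).__name__, q.dtype, q.requires_grad)  # Parameter torch.float64 True

# `torch.utils.swap_tensors` exchanges everything between the two objects,
# including their classes and autograd metadata:
p = nn.Parameter(torch.ones(2))
t = torch.zeros(2, dtype=torch.float64)
torch.utils.swap_tensors(p, t)
print(type(p).__name__, p.dtype, p.requires_grad)  # Tensor torch.float64 False
print(type(t).__name__, t.dtype, t.requires_grad)  # Parameter torch.float32 True
```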

Towards fixing #115792

Added `torch.__future__.{get/set}_swap_module_params_on_conversion` that defaults to `False` for now, but we probably want to modify `nn.Module._apply.compute_should_use_swap_tensors` to override this if the input is on XLA or is a tensor subclass.

From offline discussion, for now we are **not** allowing `swap_tensor` after the first module forward has been run***. The reason being that `torch.utils.swap_tensors(t1, t2)` requires the `use_count` of both `TensorImpl`s associated with `t1` and `t2` to be 1.  The first forward pass will install `AccumulateGrad` nodes on each param, which [bump the refcount of the associated TensorImpl](https://github.com/pytorch/pytorch/blob/6cf1fc66e340132d7e2ed9d42efea42fa7ea0183/torch/csrc/autograd/variable.cpp?fbclid=IwAR2dWDVPoXfWF0QDXhhwJ3U7CIAUcNBCAxptlTX9yDI-0pi_h0FBNsw0ig0#L307). Future work might be to swap `AccumulateGrad` nodes if it is necessary.

***From this, it might seem like we don't need to handle gradients. However, I still handle the grads for the edge case that the grads are set via `p.grad = grad`.

### Question
If the future is set to allow swapping, there are still cases where swapping might not occur: `use_count > 1`, or `weak_use_count > 1` (where `weak_use_count` is `(use_count >= 1) + num_weak_refs`).

I am wondering what we should do in such cases:
1) Error loudly:
    - Pro: for use cases where `swap_tensors` is necessary for correctness (`XLATensor`, `DTensor`), this makes it very apparent that things have gone wrong
    - Con: for other use cases where `.data` setting is not semantically correct, this might not preserve BC, especially if we flip the default (previously weakrefs to parameters were OK; now they are not)
2) Warn and fall back to `.data` setting: I was thinking of warning with a list of the `param_names` that were not swapped (since only some might have weakrefs, `use_count > 1`, etc.); a rough sketch is shown after this list
    - Pro: doesn't break BC for the common case, and provides a signal for how to fix the cases where `._apply` is currently broken
    - Con: the warning might be spammy, especially if there are a lot of parameters
3) Silently fall back to `.data` setting
    - Pro: not spammy
    - Con: very hard to debug correctness issues

Right now I haven't chosen which to implement, so the fallback is just silent. Which of (1) or (2) (or something else) would be the better solution?
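A purely hypothetical sketch of what option (2) could look like (`swap_params_or_fallback` is a made-up helper for illustration, not the actual `_apply` code):

```python
import warnings
import torch
import torch.nn as nn

def swap_params_or_fallback(module: nn.Module, transform) -> None:
    """Try to swap every parameter; fall back to .data for those that cannot be
    swapped and emit a single summary warning listing their names."""
    not_swapped = []
    for name, param in module.named_parameters(recurse=False):
        with torch.no_grad():
            new_value = transform(param)
        new_param = nn.Parameter(new_value, requires_grad=param.requires_grad)
        try:
            torch.utils.swap_tensors(param, new_param)
        except RuntimeError:
            param.data = new_value  # option (3) would do this silently
            not_swapped.append(name)
    if not_swapped:
        warnings.warn(
            f"swap_tensors failed for parameters {not_swapped}; fell back to .data setting."
        )

# e.g. swap_params_or_fallback(nn.Linear(2, 2), lambda p: p.half())
```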





[ghstack-poisoned]
@mikaylagawarecki added the `ciflow/trunk` (trigger trunk jobs on your pull request) and `keep-going` (don't stop on first failure, keep running tests until the end) labels on Jan 19, 2024
mikaylagawarecki added a commit that referenced this pull request Jan 19, 2024
ghstack-source-id: b6f4ea5b09cd5133afa20d1ad2840fbb6658c83c
Pull Request resolved: #117167
@mikaylagawarecki (Author) commented:
@pytorchbot rebase -s

@mikaylagawarecki marked this pull request as ready for review on January 23, 2024, 22:23
@mikaylagawarecki changed the title from "[WIP] Add swap_tensors path to nn.Module._apply" to "Add swap_tensors path to nn.Module._apply" on Jan 24, 2024
@ysiraichi (Collaborator) commented:
@mikaylagawarecki Hi there, just wanted to give a heads-up that I opened #118783, a complementary PR which also fixes #115792. While it fixes the same issue, it does so because of another latent problem with FunctionalTensorWrapper. Check this comment to read more about it. That said, I believe this PR is still relevant, since it also addresses other issues.

Resolved review threads:
- .lintrunner.toml (outdated)
- docs/source/conf.py (outdated)
- docs/source/nn.rst (outdated)
- test/test_modules.py
- test/test_modules.py (outdated)
- torch/nn/modules/module.py (outdated)
- torch/nn/modules/module.py (outdated)
- torch/nn/modules/module.py
else:
assert param.grad.is_leaf
out_param.grad = grad_applied.requires_grad_(param.grad.requires_grad)
assert param_grad.is_leaf
Collaborator:
That is weird?

mikaylagawarecki (Author):
This is copied from existing code, I wasn't sure on the rationale for this either 🤔

Resolved review thread: torch/nn/utils/__init__.py (outdated)
@albanD (Collaborator) left a comment

Thanks!

pytorchmergebot pushed a commit that referenced this pull request Feb 7, 2024
…_on_conversion (#118023)

For above PR to parametrize existing `load_state_dict` tests

Pull Request resolved: #118023
Approved by: https://github.com/albanD
ghstack dependencies: #118028, #117167
pytorch-bot bot pushed a commit that referenced this pull request Feb 8, 2024

Pull Request resolved: #117167
Approved by: https://github.com/albanD
ghstack dependencies: #118028
pytorch-bot bot pushed a commit that referenced this pull request Feb 8, 2024
…_on_conversion (#118023)

For above PR to parametrize existing `load_state_dict` tests

Pull Request resolved: #118023
Approved by: https://github.com/albanD
ghstack dependencies: #118028, #117167
clee2000 pushed a commit that referenced this pull request Feb 14, 2024

Pull Request resolved: #117167
Approved by: https://github.com/albanD
ghstack dependencies: #118028
clee2000 pushed a commit that referenced this pull request Feb 14, 2024
…_on_conversion (#118023)

For above PR to parametrize existing `load_state_dict` tests

Pull Request resolved: #118023
Approved by: https://github.com/albanD
ghstack dependencies: #118028, #117167
Labels
- ciflow/trunk (trigger trunk jobs on your pull request)
- keep-going (don't stop on first failure, keep running tests until the end)
- Merged
- release notes: nn (release notes category)
- topic: new features (topic category)