
AttributeError: 'BoringModel' object has no attribute 'require_backward_grad_sync' when using manual optimization with TPU #6503

Closed
evanatyourservice opened this issue Mar 12, 2021 · 4 comments · Fixed by #8458
Labels: accelerator: tpu, bug, help wanted, priority: 0

Comments


🐛 Bug

Hello!

When using manual optimization on TPU, I get AttributeError: 'BoringModel' object has no attribute 'require_backward_grad_sync'. The error is raised at the self.manual_backward(loss) call in training_step. If I replace self.manual_backward(loss) with loss.backward(), things seem to work, but I am not sure whether that is a safe or sustainable workaround. Any help would be much appreciated.
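For context, a minimal manual-optimization training step in the BoringModel style that exercises this path (a sketch assuming the 1.3-style API where automatic_optimization is set on the module, not the exact notebook code; the commented-out line is the workaround mentioned above):

import torch
from pytorch_lightning import LightningModule

class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
        self.automatic_optimization = False  # opt in to manual optimization

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()
        loss = self.layer(batch).sum()
        self.manual_backward(loss)  # raises the AttributeError on TPU
        # loss.backward()           # workaround that appears to run
        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)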

Please reproduce using the BoringModel

Here is the notebook reproducing the error:

https://colab.research.google.com/drive/1LPYgtUAiHd1OXuTK6I1WkRaCUQScxEPg?usp=sharing

Environment

WARNING:root:TPU has started up successfully with version pytorch-1.8

  • CUDA:
    • GPU:
    • available: False
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.8.0+cu101
    • pytorch-lightning: 1.3.0dev
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10
• version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Installed torch-xla to match the Colab defaults:
!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.8-cp37-cp37m-linux_x86_64.whl

@evanatyourservice evanatyourservice added bug Something isn't working help wanted Open to be worked on labels Mar 12, 2021
@awaelchli awaelchli added the accelerator: tpu Tensor Processing Unit label Mar 14, 2021
tchaton (Contributor) commented Mar 21, 2021

Dear @evanatyourservice,

Manual Optimization doesn't work with TPU yet. Sorry for the inconvenience.
loss.backward() might work, but I would have to investigate further.

Best,
T.C
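(Background, as an assumption about the failure mode rather than anything confirmed in this thread: require_backward_grad_sync is an attribute that torch.nn.parallel.DistributedDataParallel sets on itself, so code that toggles it needs to guard against models that are not DDP-wrapped, which is the case on TPU. A hypothetical guard, with set_grad_sync being an illustrative name:)

import torch.nn as nn

def set_grad_sync(model: nn.Module, enabled: bool) -> None:
    # Only DistributedDataParallel exposes require_backward_grad_sync;
    # a plain module, e.g. the model as it runs on TPU/XLA, does not,
    # so setting it unconditionally raises the AttributeError above.
    if isinstance(model, nn.parallel.DistributedDataParallel):
        model.require_backward_grad_sync = enabled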

@tchaton tchaton added the priority: 1 Medium priority task label Mar 21, 2021
evanatyourservice (Author) commented:

Ok, no worries. I don't have the time to work on this myself, so I will set up my own TPU notebook.

SeanNaren (Contributor) commented:

Hey @evanatyourservice, #7970 may have fixed this issue; could you retry using master?

evanatyourservice (Author) commented:

The notebook I linked above still doesn't run; it now fails with:

Exception in device=TPU:4: 'LightningDistributedModule' object has no attribute 'require_backward_grad_sync'

but it's been a while since I made it, so I'm not sure whether the fault is in the notebook code or in Lightning.

I ended up making my own TPU notebook and stopped using Lightning, but I was trying to implement sharpness-aware minimization, in case anyone wants to try it with Lightning; a rough idea of what that could look like is sketched below.
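(For anyone picking this up: sharpness-aware minimization takes two forward/backward passes per optimizer step, which is why manual optimization matters here. A rough sketch, assuming a SAM-style optimizer wrapper with first_step/second_step methods as in the widely used davda54/sam implementation; compute_loss is a hypothetical helper:)

def training_step(self, batch, batch_idx):
    opt = self.optimizers()             # assumed to be a SAM-style wrapper

    # First pass: compute the loss and ascend to the nearby "sharp" point.
    loss = self.compute_loss(batch)     # hypothetical helper
    self.manual_backward(loss)
    opt.first_step(zero_grad=True)

    # Second pass: recompute the loss at the perturbed weights and descend.
    self.manual_backward(self.compute_loss(batch))
    opt.second_step(zero_grad=True)
    return loss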

@edenlightning edenlightning added priority: 0 High priority task and removed priority: 1 Medium priority task labels Jul 1, 2021
@edenlightning edenlightning added this to the v1.3.x milestone Jul 1, 2021
@Borda Borda modified the milestones: v1.3.x, v1.4 Jul 6, 2021
@edenlightning edenlightning modified the milestones: v1.4, v1.3.x Jul 6, 2021