
AttributeError: 'BoringModel' object has no attribute 'require_backward_grad_sync' when using manual optimization with TPU #6503

Closed
evanatyourservice opened this issue Mar 12, 2021 · 4 comments · Fixed by #8458
Labels: accelerator: tpu, bug, help wanted, priority: 0

Comments


🐛 Bug

Hello!

When using manual optimization on TPU, I get AttributeError: 'BoringModel' object has no attribute 'require_backward_grad_sync'. The error is raised at the self.manual_backward(loss) call in training_step. If I replace self.manual_backward(loss) with loss.backward(), things seem to work, but I am not sure whether that is a safe or sustainable workaround. Any help would be much appreciated.
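For context, a minimal manual-optimization training step in the BoringModel style that exercises this path (a sketch assuming the 1.3-style API where automatic_optimization is set on the module, not the exact notebook code; the commented-out line is the workaround mentioned above):

import torch
from pytorch_lightning import LightningModule

class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
        self.automatic_optimization = False  # opt in to manual optimization

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()
        loss = self.layer(batch).sum()
        self.manual_backward(loss)  # raises the AttributeError on TPU
        # loss.backward()           # workaround that appears to run
        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)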

Please reproduce using the BoringModel

Here is the notebook reproducing the error:

https://colab.research.google.com/drive/1LPYgtUAiHd1OXuTK6I1WkRaCUQScxEPg?usp=sharing

Environment

WARNING:root:TPU has started up successfully with version pytorch-1.8

  • CUDA:
    • GPU:
    • available: False
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.8.0+cu101
    • pytorch-lightning: 1.3.0dev
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10
• version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Installed torch-xla to match the Colab defaults:
!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.8-cp37-cp37m-linux_x86_64.whl

@evanatyourservice evanatyourservice added bug Something isn't working help wanted Open to be worked on labels Mar 12, 2021
@awaelchli awaelchli added the accelerator: tpu Tensor Processing Unit label Mar 14, 2021
tchaton (Contributor) commented Mar 21, 2021

Dear @evanatyourservice,

Manual Optimization doesn't work with TPU yet. Sorry for the inconvenience.
loss.backward() might work, but I would have to investigate further.

Best,
T.C
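(Background, as an assumption about the failure mode rather than anything confirmed in this thread: require_backward_grad_sync is an attribute that torch.nn.parallel.DistributedDataParallel sets on itself, so code that toggles it needs to guard against models that are not DDP-wrapped, which is the case on TPU. A hypothetical guard, with set_grad_sync being an illustrative name:)

import torch.nn as nn

def set_grad_sync(model: nn.Module, enabled: bool) -> None:
    # Only DistributedDataParallel exposes require_backward_grad_sync;
    # a plain module, e.g. the model as it runs on TPU/XLA, does not,
    # so setting it unconditionally raises the AttributeError above.
    if isinstance(model, nn.parallel.DistributedDataParallel):
        model.require_backward_grad_sync = enabled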

@tchaton tchaton added the priority: 1 Medium priority task label Mar 21, 2021
evanatyourservice (Author) commented:

Ok, no worries. I don't have the time to work on this myself, so I will set up my own TPU notebook.

SeanNaren (Contributor) commented:

Hey @evanatyourservice, #7970 may have fixed this issue; could you retry using master?

evanatyourservice (Author) commented:

The notebook I linked above still doesn't run; it now fails with:

Exception in device=TPU:4: 'LightningDistributedModule' object has no attribute 'require_backward_grad_sync'

but it's been a while since I made it, so I'm not sure whether the fault is in the notebook code or in Lightning.

I ended up making my own TPU notebook and stopped using Lightning, but I was trying to implement sharpness-aware minimization, in case anyone wants to try it with Lightning; a rough idea of what that could look like is sketched below.
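(For anyone picking this up: sharpness-aware minimization takes two forward/backward passes per optimizer step, which is why manual optimization matters here. A rough sketch, assuming a SAM-style optimizer wrapper with first_step/second_step methods as in the widely used davda54/sam implementation; compute_loss is a hypothetical helper:)

def training_step(self, batch, batch_idx):
    opt = self.optimizers()             # assumed to be a SAM-style wrapper

    # First pass: compute the loss and ascend to the nearby "sharp" point.
    loss = self.compute_loss(batch)     # hypothetical helper
    self.manual_backward(loss)
    opt.first_step(zero_grad=True)

    # Second pass: recompute the loss at the perturbed weights and descend.
    self.manual_backward(self.compute_loss(batch))
    opt.second_step(zero_grad=True)
    return loss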

@edenlightning edenlightning added priority: 0 High priority task and removed priority: 1 Medium priority task labels Jul 1, 2021
@edenlightning edenlightning added this to the v1.3.x milestone Jul 1, 2021
@Borda Borda modified the milestones: v1.3.x, v1.4 Jul 6, 2021
@edenlightning edenlightning modified the milestones: v1.4, v1.3.x Jul 6, 2021