all_gather raises NotImplementedError when no Accelerator defined in Trainer #5181

Closed
8greg8 opened this issue Dec 18, 2020 · 7 comments
Labels
bug (Something isn't working) · distributed (Generic distributed-related topic) · help wanted (Open to be worked on)

Comments

@8greg8
Contributor

8greg8 commented Dec 18, 2020

🐛 Bug

When no Accelerator is defined in the Trainer, the all_gather function in LightningModule raises NotImplementedError.

Please reproduce using the BoringModel and post here

https://colab.research.google.com/drive/1VPEIaQ-aN5KVA70VtvGk24AVkTYPvMoY?usp=sharing

To Reproduce
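Since the Colab notebook is not reproduced inline, here is a minimal BoringModel-style sketch of the setup (the model, dataset, and Trainer arguments below are illustrative assumptions, not the exact notebook contents):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch[0]).sum()
        # With no accelerator backend configured (plain CPU Trainer),
        # this call raised NotImplementedError instead of returning a tensor.
        self.all_gather(loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    train = DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)
    trainer = pl.Trainer(max_epochs=1, limit_train_batches=2)  # no accelerator/gpus set
    trainer.fit(BoringModel(), train)
```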

Expected behavior

  • all_gather should return a value instead of raising NotImplementedError.

Environment

  • CUDA:
    • GPU:
      • Tesla P100-PCIE-16GB
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.19.4
    • pyTorch_debug: True
    • pyTorch_version: 1.7.0+cu101
    • pytorch-lightning: 1.1.2rc1
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Additional context

@8greg8 8greg8 added the bug (Something isn't working) and help wanted (Open to be worked on) labels Dec 18, 2020
@awaelchli
Contributor

FYI, your Colab notebook is not public; it can't be accessed :)

@awaelchli awaelchli added the distributed Generic distributed-related topic label Dec 20, 2020
@awaelchli awaelchli added this to the 1.1.x milestone Dec 20, 2020
@8greg8
Contributor Author

8greg8 commented Dec 21, 2020

@awaelchli sorry, my bad. I corrected the notebook and changed the link in the bug description. You should be able to access it now.

@tchaton tchaton self-assigned this Dec 21, 2020
@tchaton tchaton mentioned this issue Dec 21, 2020
@awaelchli
Contributor

awaelchli commented Jan 15, 2021

@tchaton we should provide implementations of all_gather on CPU and single GPU to make code device-agnostic. Does that make sense?
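For context, the suggested single-process fallback would look roughly like the sketch below (a hypothetical illustration, not the actual Lightning implementation): with a world size of 1 there is nothing to collect from other processes, so "gathering" just adds a leading world-size dimension.

```python
import torch


def all_gather_single_process(tensor: torch.Tensor) -> torch.Tensor:
    # Hypothetical CPU / single-GPU fallback: return the tensor with a
    # leading dimension of size 1, matching the shape a DDP all_gather
    # would produce with world_size == 1.
    return tensor.unsqueeze(0)
```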

@Borda Borda modified the milestones: 1.1.x, 1.2 Feb 8, 2021
@edenlightning edenlightning modified the milestones: 1.2, 1.2.x Feb 8, 2021
@awaelchli awaelchli self-assigned this Feb 25, 2021
@tchaton
Contributor

tchaton commented Mar 2, 2021

Hey @awaelchli,

Yes, we should.

Best,
T.C

@tchaton
Contributor

tchaton commented Mar 2, 2021

Dear @8greg8,

I checked the notebook and couldn't reproduce the bug.

@awaelchli I have checked the code, and it seems we already support all_gather on CPU and single GPU.
However, we don't support gradients for TPU. I opened an issue for it.

Best,
T.C
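For reference, LightningModule.all_gather exposes a sync_grads flag for the gradient-preserving gather mentioned above. A minimal sketch, assuming the BoringModel-style module from the reproduction above (TPU support for this was tracked in the separate issue):

```python
# Inside a LightningModule (e.g. the BoringModel sketch above):
def validation_step(self, batch, batch_idx):
    preds = self.layer(batch[0])
    # sync_grads=True keeps the gather differentiable on supported backends;
    # per the comment above, this did not yet work on TPU at the time.
    gathered = self.all_gather(preds, sync_grads=True)
    return gathered.mean()
```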

@awaelchli
Contributor

Yes, because that came automatically with the accelerator refactor, and the discussion here predates its introduction.
Today, the Trainer always has an accelerator defined. This issue can be closed if you agree.
