
Error in returning Dict from training_step with multiple GPUs #6193

Closed
kchuang625 opened this issue Feb 25, 2021 · 1 comment · Fixed by #6324
Labels: bug, distributed, good first issue, help wanted, priority: 0

Comments


kchuang625 commented Feb 25, 2021

🐛 Bug

When using multiple GPUs with 'dp', the error RuntimeError: grad can be implicitly created only for scalar outputs occurs if the training_step function returns a dict like this:

def training_step(self, batch, batch_idx):
    ...
    return {'loss': loss}

Please reproduce using the BoringModel

https://colab.research.google.com/drive/1hmHqYHPOqDlZUAF7-9zcCvobrvSPt7W5?usp=sharing
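
For reference, a minimal self-contained reproduction in the spirit of the linked notebook could look like the sketch below (a BoringModel-style module with made-up shapes; the exact notebook contents may differ):

import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.data = torch.randn(length, size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        # Returning a dict instead of the bare tensor triggers the error under dp
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


model = BoringModel()
trainer = pl.Trainer(gpus=2, accelerator='dp', max_epochs=1, limit_train_batches=2)
trainer.fit(model, DataLoader(RandomDataset(32, 64), batch_size=4))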

Expected behavior

Returning a dict with a 'loss' key from training_step should work without error.

A quick solution

Return the loss tensor directly from the training_step function:

def training_step(self, batch, batch_idx):
    ...
    return loss
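
Another possible workaround (an untested sketch, assuming the standard dp reduction hook) is to keep returning a dict and reduce it manually in training_step_end, which Lightning calls with the outputs gathered from all GPUs when using dp:

def training_step(self, batch, batch_idx):
    ...
    return {'loss': loss}

def training_step_end(self, outputs):
    # Under dp, outputs['loss'] holds one loss value per GPU;
    # reduce it to a scalar so backward() can be applied.
    return {'loss': outputs['loss'].mean()}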

Environment

  • PyTorch Lightning Version: 1.2.0
  • PyTorch Version: 1.7.0
  • OS: Linux
  • Python version: 3.8
  • CUDA/cuDNN version: 10.2

cc @carmocca

kchuang625 added the bug and help wanted labels on Feb 25, 2021
SeanNaren added the distributed label on Feb 25, 2021
carmocca added the priority: 0 label on Feb 27, 2021
edenlightning (Contributor) commented:

We need to support reducing the returned dict, not just bare tensors.
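
For illustration only (not necessarily how the eventual fix in #6324 works), reducing a dict output could amount to something like this hypothetical helper:

import torch

def reduce_dict_output(output):
    # Hypothetical helper: reduce every non-scalar tensor in the per-GPU
    # output dict to its mean, mirroring the reduction already applied when
    # training_step returns a bare tensor under dp.
    return {
        key: value.mean() if isinstance(value, torch.Tensor) and value.dim() > 0 else value
        for key, value in output.items()
    }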
