
AttributeError: 'float' object has no attribute 'clone' - when logging float values with sync_dist=True and ddp #5003

Closed
stachu86 opened this issue Dec 7, 2020 · 5 comments · Fixed by #5080
Assignees
Labels
bug Something isn't working help wanted Open to be worked on logging Related to the `LoggerConnector` and `log()`
Milestone

Comments


stachu86 commented Dec 7, 2020

🐛 Bug

self.log('d_acc_dummy' , 100.0, on_step=True, on_epoch=True, sync_dist=True) 

results in

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/step_result.py in log(self, name, value, prog_bar, logger, on_step, on_epoch, reduce_fx, tbptt_reduce_fx, tbptt_pad_token, enable_graph, sync_dist, sync_dist_op, sync_dist_group, sync_fn)
    138             is_dist_initialized = torch.distributed.is_available() and torch.distributed.is_initialized()
    139             # TODO: Find a way to make the reduction only once, so we don't need to clone.
--> 140             value = value.clone() if is_dist_initialized else value
    141             value = sync_fn(value, group=sync_dist_group, reduce_op=sync_dist_op)
    142 

AttributeError: 'float' object has no attribute 'clone'

float is an allowed type (from step_result.py):

if sync_dist and isinstance(value, (torch.Tensor, numbers.Number)):
    is_dist_initialized = torch.distributed.is_available() and torch.distributed.is_initialized()
    # TODO: Find a way to make the reduction only once, so we don't need to clone.
    value = value.clone() if is_dist_initialized else value
    value = sync_fn(value, group=sync_dist_group, reduce_op=sync_dist_op)
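The mismatch can be shown in isolation: the `isinstance` guard admits plain Python numbers, but the sync path then calls `.clone()`, which only tensors provide. A minimal sketch:

```python
import numbers
import torch

value = 100.0

# The guard in step_result.py admits plain Python numbers...
assert isinstance(value, (torch.Tensor, numbers.Number))

# ...but the next line calls .clone(), which floats do not have.
assert not hasattr(value, "clone")

# Tensors, by contrast, do.
assert hasattr(torch.tensor(value), "clone")
```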

Please reproduce using the BoringModel and post here

Simple gan example

To Reproduce

Expected behavior

Logging of float values

Environment

  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: True
    • pyTorch_version: 1.7.0+cu101
    • pytorch-lightning: 1.0.8
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Additional context

@stachu86 stachu86 added bug Something isn't working help wanted Open to be worked on labels Dec 7, 2020

github-actions bot commented Dec 7, 2020

Hi! Thanks for your contribution, great first issue!

@Borda Borda added the Metrics label Dec 7, 2020

Borda (Member) commented Dec 7, 2020

I think you should be using a Tensor here. @SkafteNicki?

@Borda Borda added this to the 1.1.x milestone Dec 7, 2020
SkafteNicki (Member) commented:

@Borda this is not related to the metrics package; it is logging related.
I agree that manually casting the number to a tensor should get it working for now, but we should probably do this conversion internally.

@SkafteNicki SkafteNicki added logging Related to the `LoggerConnector` and `log()` and removed Metrics labels Dec 7, 2020
awaelchli (Contributor) commented:

value = torch.tensor(value, device=..., dtype=...)
value = value.clone()

should be sufficient, but perhaps the conversion can be done earlier, and cloning should not be necessary when value was a float. @stachu86, are you interested in working on this?
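A minimal sketch of this idea, with a hypothetical `sync_value` helper standing in for the internal code path (names and signature are assumptions, not the actual fix in #5080): convert numbers to tensors first, and clone only values that were already tensors, since a freshly created tensor has no outside references that an in-place reduction could corrupt.

```python
import numbers
import torch

def sync_value(value, sync_fn, group=None, reduce_op=None):
    """Hypothetical helper: tensor-ify numbers before syncing."""
    if isinstance(value, numbers.Number) and not isinstance(value, torch.Tensor):
        # Fresh tensor: nobody else holds a reference, so no clone needed.
        value = torch.tensor(float(value))
    else:
        # Caller's tensor: clone so an in-place all_reduce can't mutate it.
        value = value.clone()
    return sync_fn(value, group=group, reduce_op=reduce_op)

# Single-process stand-in for a distributed all_reduce wrapper:
identity = lambda v, group=None, reduce_op=None: v
out = sync_value(100.0, identity)
assert isinstance(out, torch.Tensor) and out.item() == 100.0
```

In a real DDP run, `sync_fn` would wrap `torch.distributed.all_reduce`; the identity function above only exercises the conversion logic.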


stachu86 (Author) commented Dec 9, 2020

@awaelchli yes, it looks to me like a good first contribution. I was a little discouraged by the # TODO comment, which suggests somebody will rewrite this soon, but from my understanding clone() is necessary here because torch.distributed.all_reduce operates in-place. However, wouldn't value.detach().clone() be more appropriate in the logging context?
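The difference between the two copies can be checked directly: `clone()` stays attached to the autograd graph of its source, while `detach().clone()` produces a graph-free copy, which is arguably what a logged value should be (it should not keep the computation graph alive). A small sketch:

```python
import torch

t = torch.tensor(1.0, requires_grad=True)

# clone() keeps the copy connected to t's autograd graph.
assert t.clone().requires_grad

# detach().clone() yields an independent, graph-free copy.
assert not t.detach().clone().requires_grad
```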
