
Don't copy the batch when training on a single gpu #1576

Merged: 2 commits into Lightning-AI:master from karlinjf:bugfix/1566_nocopy_batch on Apr 23, 2020

Conversation

karlinjf
Contributor

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #1566 by removing the copy.copy(batch) when moving a batch onto the GPU. This way, if there are multiple optimizers, the batch only needs to be moved to the GPU once; see the sketch below.
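To make the before/after concrete, here is a minimal runnable sketch. The body of transfer_batch_to_gpu below is a hypothetical simplification, not Lightning's actual implementation; only the call pattern mirrors the PR.

```python
import torch

def transfer_batch_to_gpu(batch, gpu_id):
    """Hypothetical stand-in for the trainer helper: recursively move a
    (possibly nested) batch onto the given GPU."""
    device = torch.device("cuda", gpu_id)
    if isinstance(batch, torch.Tensor):
        return batch.to(device)  # no-op if the tensor is already on `device`
    if isinstance(batch, (list, tuple)):
        return type(batch)(transfer_batch_to_gpu(b, gpu_id) for b in batch)
    if isinstance(batch, dict):
        return {k: transfer_batch_to_gpu(v, gpu_id) for k, v in batch.items()}
    return batch

batch = (torch.randn(32, 3), torch.randint(0, 10, (32,)))

for opt_idx in range(2):  # e.g. two optimizers sharing the same batch
    # Old behaviour: copy.copy(batch) left `batch` itself on the CPU, so
    # every optimizer paid for a fresh host-to-device transfer:
    #   gpu_batch = transfer_batch_to_gpu(copy.copy(batch), gpu_id=0)
    # New behaviour: rebinding `batch` means the second optimizer finds
    # the tensors already on the GPU and the transfer is effectively free.
    batch = transfer_batch_to_gpu(batch, gpu_id=0)
```

The key point is that tensor.to(device) returns the tensor itself when it is already on the target device, so after the first rebinding the repeated calls cost nothing.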

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team April 23, 2020 15:31
```python
# Don't copy the batch: there is only a single GPU the batch can be
# referenced from, and with multiple optimizers a copy would be moved
# to the same device repeatedly.
batch = self.transfer_batch_to_gpu(batch, gpu_id)
```
Contributor

what happens with multiple GPUs?

Contributor Author

This is in the "self.single_gpu" condition, so I don't think it's possible to have multiple GPUs here? But I don't know this code very well at all.
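For context, a rough sketch of the dispatch the author is describing. This is a hypothetical simplification of the trainer from that era; only single_gpu and transfer_batch_to_gpu appear in the diff, and the other flag names are assumptions.

```python
class TrainerSketch:
    """Heavily simplified, hypothetical stand-in for the trainer's
    device dispatch; not the actual Lightning code."""

    def __init__(self, single_gpu=True, use_dp=False, use_ddp=False):
        self.single_gpu = single_gpu
        self.use_dp = use_dp
        self.use_ddp = use_ddp

    def training_forward(self, batch):
        if self.use_ddp or self.use_dp:
            # Multi-GPU paths: the DP/DDP wrapper scatters the batch across
            # devices itself, so the in-place move below never runs with
            # more than one GPU.
            return "handled by the DP/DDP wrapper"
        if self.single_gpu:
            # Exactly one device, so the batch can be moved (and rebound)
            # in place without a defensive copy.
            batch = transfer_batch_to_gpu(batch, gpu_id=0)  # sketch above
            return "single-GPU training step"
        return "CPU training step"
```

Because the branches are mutually exclusive, removing the copy in the single_gpu branch cannot affect multi-GPU runs.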

@mergify mergify bot requested a review from a team April 23, 2020 15:47
@williamFalcon williamFalcon merged commit 41b6cbb into Lightning-AI:master Apr 23, 2020
@karlinjf karlinjf deleted the bugfix/1566_nocopy_batch branch April 23, 2020 20:05
@Borda Borda added the bug Something isn't working label Apr 23, 2020
@Borda Borda added this to the 0.7.4 milestone Apr 23, 2020
@Borda Borda modified the milestones: 0.7.4, v0.7.x Apr 18, 2021
Development

Successfully merging this pull request may close these issues.

Batch being moved to gpu repeatedly with multiple optimizers and single gpu training
4 participants