
CUDA memory leak after batch size finder #6570

Closed
maxjeblick opened this issue Mar 17, 2021 · 3 comments
Labels: bug (Something isn't working), help wanted (Open to be worked on)

Comments

@maxjeblick (Contributor)

🐛 Bug

Using transformers + the AdamW optimizer + the batch size finder results in ~2-3 GB of GPU memory not being freed after
trainer.tune (for xlm-roberta-base). This causes OOM errors on a subsequent call of trainer.fit.
I suspect that the state kept by the AdamW optimizer causes this issue.

Please reproduce using the BoringModel

https://colab.research.google.com/drive/1cugaUmLzNvk-38OyV8zyT9M9xQY4LkfH#scrollTo=j4w0wizx5XxJ
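
A condensed sketch of the setup (not the exact Colab notebook: the dataset and model below are small stand-ins, whereas the report uses xlm-roberta-base from transformers; the point is the AdamW + trainer.tune + trainer.fit sequence):

    import torch
    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, Dataset


    class RandomDataset(Dataset):
        # stand-in data; the original report feeds tokenized text to xlm-roberta-base
        def __len__(self):
            return 64

        def __getitem__(self, idx):
            return torch.randn(32), torch.randn(2)


    class BoringAdamWModel(pl.LightningModule):
        def __init__(self, batch_size=2):
            super().__init__()
            self.batch_size = batch_size  # attribute scaled by the batch size finder
            self.layer = torch.nn.Linear(32, 2)

        def forward(self, x):
            return self.layer(x)

        def training_step(self, batch, batch_idx):
            x, y = batch
            return torch.nn.functional.mse_loss(self(x), y)

        def train_dataloader(self):
            return DataLoader(RandomDataset(), batch_size=self.batch_size)

        def configure_optimizers(self):
            # AdamW keeps exp_avg / exp_avg_sq buffers on the GPU for every parameter
            return torch.optim.AdamW(self.parameters(), lr=1e-3)


    model = BoringAdamWModel()
    trainer = pl.Trainer(gpus=1, max_epochs=1, auto_scale_batch_size="power")
    trainer.tune(model)  # runs the batch size finder
    # with a large model such as xlm-roberta-base, several GB stay allocated here
    print(f"{torch.cuda.memory_allocated() / 1e9:.2f} GB still allocated after tune")
    trainer.fit(model)   # can OOM because the tuner's optimizer state was never released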

Expected behavior

GPU memory should be freed after the batch size finder finishes (apart from the model itself, which may stay on the GPU).

Environment

  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.8.0+cu101
    • pytorch-lightning: 1.2.4
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020
@maxjeblick added the bug and help wanted labels on Mar 17, 2021
@maxjeblick (Contributor, Author)

Using

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.model.parameters(), lr=0.1)
        return optimizer

results in no GPU memory leak.
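
For contrast, an AdamW counterpart in the same shape does leave memory behind (a sketch; the report pairs transformers with AdamW, and the exact optimizer class and hyperparameters used in the Colab may differ):

    def configure_optimizers(self):
        # AdamW allocates exp_avg / exp_avg_sq buffers for every parameter;
        # after trainer.tune these buffers remain referenced and stay on the GPU
        optimizer = torch.optim.AdamW(self.model.parameters(), lr=2e-5)
        return optimizer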

@maxjeblick (Contributor, Author) commented Mar 17, 2021

trainer._lightning_optimizers still contains the optimizer that was used for finding the batch size (including its exp_avg stats on CUDA).

I also noticed that, in some cases, calling trainer.fit() resulted in incorrect fitting behavior when used together with trainer.tune() and tuning (batch size only) was done with a random target.
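
A possible workaround sketch until a fix lands, based on the _lightning_optimizers observation above and reusing the trainer and model from the earlier sketch (trainer._lightning_optimizers is a private Lightning 1.2.x internal, not public API, so this may not carry over to other versions):

    import gc

    import torch

    trainer.tune(model)

    # Drop the reference to the tuner's optimizer so its CUDA state
    # (the exp_avg / exp_avg_sq buffers) can be garbage-collected.
    # Other internal references may also need clearing depending on the version.
    trainer._lightning_optimizers = None

    gc.collect()
    torch.cuda.empty_cache()

    trainer.fit(model)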

@maxjeblick
Copy link
Contributor Author

This seems to be fixed already by PR #6372 :D
(thanks @awaelchli)
