Simple fix for memory leak on GPU0 #1094
Changes from 5 commits: 5c554a1, 2f17b2f, 6d89505, 54e9a5e, 83c291d, 6cba621, 2c7b802
@@ -640,4 +640,11 @@ def determine_root_gpu_device(gpus):
    # set root gpu
    root_gpu = gpus[0]

    # set cuda device to root gpu
    # related to https://github.com/PyTorchLightning/pytorch-lightning/issues/958
    # Refer solution: https://github.com/pytorch/pytorch/issues/9871#issuecomment-408304190
    # root_device = torch.device("cuda", root_gpu)
    root_device = (torch.device("cuda", root_gpu) if root_gpu >= 0 else torch.device("cpu"))
    torch.cuda.set_device(root_device)

    return root_gpu

Review comments on this hunk:
- What would root_device be set to if the user wants CPU? None? -1? Maybe we should check for that explicitly.
- If the user wants CPU, the function …
- In this case it is getting called with gpus=None, and returns None (see the first lines of determine_root_gpu_device). So your …
- But I think the device should be set outside this function anyway.
- @awaelchli where do you suggest?
- I would search the code base for occurrences of …
- Better to ask the core team on this :)
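The review thread above circles around the CPU path: determine_root_gpu_device returns None when gpus is None, and in Python 3 `None >= 0` raises a TypeError, so the `root_gpu >= 0` guard alone does not cover that case. A minimal, torch-free sketch of the selection expression under discussion (`pick_root_device` is a hypothetical helper name, and device strings stand in for torch.device objects so the snippet runs without a GPU):

```python
def pick_root_device(root_gpu):
    """Sketch of the device-selection expression added in this PR.

    Hypothetical standalone helper: device strings stand in for
    torch.device objects. It explicitly handles root_gpu=None (what
    determine_root_gpu_device returns for gpus=None); without the
    None check, `None >= 0` would raise TypeError in Python 3.
    """
    if root_gpu is not None and root_gpu >= 0:
        return "cuda:{}".format(root_gpu)
    return "cpu"


print(pick_root_device(0))     # cuda:0
print(pick_root_device(None))  # cpu
print(pick_root_device(-1))    # cpu
```

This also illustrates the reviewers' point that an explicit None/CPU check is safer than relying on the integer comparison alone.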
@@ -421,9 +421,20 @@ def evaluation_forward(self, model, batch, batch_idx, dataloader_idx, test_mode:
    # single GPU data transfer
    if self.single_gpu:
        # for single GPU put inputs on gpu manually
        root_gpu = 0

        if isinstance(self.data_parallel_device_ids, list):
            root_gpu = self.data_parallel_device_ids[0]

            # set cuda device to root gpu
            # related to https://github.com/PyTorchLightning/pytorch-lightning/issues/958
            # Refer: https://github.com/pytorch/pytorch/issues/9871#issuecomment-408304190
            root_device = (torch.device("cuda", root_gpu)
                           if root_gpu >= 0 else torch.device("cpu"))
            torch.cuda.set_device(root_device)
        else:
            raise RuntimeError(
                'Expected `data_parallel_device_ids` as a list, cannot determine root gpu.'
            )
        batch = self.transfer_batch_to_gpu(batch, root_gpu)
        args[0] = batch

Review comment on this hunk:
- also need to add tpu device...
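The single-GPU branch above first resolves the root gpu from data_parallel_device_ids before moving the batch. A minimal sketch of that resolution step, using a hypothetical standalone helper name and leaving out the torch.cuda.set_device side effect so it runs anywhere:

```python
def resolve_single_gpu_root(data_parallel_device_ids):
    """Sketch of the root-gpu resolution in evaluation_forward's
    single-GPU branch (hypothetical standalone helper).

    Mirrors the patch: the first configured device id becomes the
    root gpu (e.g. [2] yields 2), and anything that is not a list
    is rejected, matching the RuntimeError added in this hunk.
    """
    if not isinstance(data_parallel_device_ids, list):
        raise RuntimeError(
            'Expected `data_parallel_device_ids` as a list, '
            'cannot determine root gpu.'
        )
    return data_parallel_device_ids[0]


print(resolve_single_gpu_root([2]))     # 2
print(resolve_single_gpu_root([0, 1]))  # 0
```

In the real patch this id is then turned into a torch.device and passed to torch.cuda.set_device, which is what keeps the initial CUDA context off GPU0 and avoids the leak this PR targets.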
Review comment:
- We can remove this now