
[RFC] Mechanism to dereference automatically in the Loops #9385

Closed
carmocca opened this issue Sep 9, 2021 · 16 comments
Labels
design Includes a design discussion feature Is an improvement or enhancement let's do it! approved to implement loops Related to the Loop API
Milestone

Comments

@carmocca
Contributor

carmocca commented Sep 9, 2021

Proposed refactoring or deprecation

The Loops run design

https://github.com/PyTorchLightning/pytorch-lightning/blob/41ba639859cf6c6bf319eb33e5b3394504315962/pytorch_lightning/loops/base.py#L94-L120

doesn't include a mechanism to share data between hooks during the same iteration or across iterations. This forces loop implementations to write everything to self, which is flexible but makes it easy to forget to clear state, since those attributes will never be garbage collected on their own.

Motivation

#9386 is a perfect example: it shows that self.dataloader_iter was added but the reference was never freed. It gets defined in on_run_start but is only used in a later hook.

This pattern is also seen in other places:

pytorch_lightning/loops/optimizer/optimizer_loop.py:89:        outputs, self.outputs = self.outputs, []  # free memory
pytorch_lightning/loops/epoch/evaluation_epoch_loop.py:136:        # free memory
pytorch_lightning/loops/epoch/prediction_epoch_loop.py:104:        # free memory
pytorch_lightning/loops/epoch/training_epoch_loop.py:222:        # free memory
pytorch_lightning/loops/batch/manual.py:82:        output, self._output = self._output, None  # free memory
pytorch_lightning/loops/batch/training_batch_loop.py:94:        self.batch_outputs = None  # free memory
pytorch_lightning/loops/dataloader/evaluation_loop.py:120:        # free memory

Pitch

Automatically dereference data at the end of run.

Option 1:

shm: Optional[Any] = None  # shared memory that should get dereferenced after `run`
shm = self.on_run_start(*args, shm=shm, **kwargs)

while not self.done:
    try:
        shm = self.on_advance_start(*args, shm=shm, **kwargs)
        shm = self.advance(*args, shm=shm, **kwargs)
        shm = self.on_advance_end(shm=shm)
        self.restarting = False
    except StopIteration:
        break

output = self.on_run_end(shm=shm)
return output
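For illustration, here is a minimal, self-contained sketch of how a loop could thread `shm` through its hooks under Option 1. The class name and hook bodies are hypothetical, not the actual Lightning loops:

```python
from typing import Any, Optional


class EvaluationLoopSketch:
    """Hypothetical loop that keeps per-run state in `shm`, not on `self`."""

    def on_run_start(self, dataloader, shm: Optional[Any] = None) -> Any:
        # Store the iterator in the shared object instead of self.dataloader_iter
        return {"dataloader_iter": iter(dataloader), "outputs": []}

    def advance(self, shm: Any = None) -> Any:
        batch = next(shm["dataloader_iter"])  # raises StopIteration when done
        shm["outputs"].append(batch * 2)      # stand-in for the real step
        return shm

    def on_run_end(self, shm: Any = None):
        # `shm` goes out of scope after `run` returns, freeing the iterator
        return shm["outputs"]


loop = EvaluationLoopSketch()
shm = loop.on_run_start([1, 2, 3])
try:
    while True:
        shm = loop.advance(shm=shm)
except StopIteration:
    pass
outputs = loop.on_run_end(shm=shm)
print(outputs)  # [2, 4, 6]
```

Because the iterator only ever lives in the `shm` dict that `run` threads through the hooks, nothing has to remember to clear it afterwards.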

Option 2:

class Loop(ABC):
    def __init__(self):
        # a bare object() does not accept attribute assignment, so use a
        # SimpleNamespace as the shared-state container
        self.shm = SimpleNamespace()

    def run(self, *args, **kwargs):
        ...
        self.on_run_start(*args, **kwargs)

        while not self.done:
            try:
                self.on_advance_start(*args, **kwargs)
                self.advance(*args, **kwargs)
                self.on_advance_end()
                self.restarting = False
            except StopIteration:
                break

        output = self.on_run_end()
        self.shm = SimpleNamespace()  # free memory
        return output

where loop writers save temporary state with self.shm.dataloader_iter = ...
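As a rough, hypothetical sketch of Option 2 in action (a SimpleNamespace stands in for the shared-state container, and the single-method loop is a toy, not the real API):

```python
from types import SimpleNamespace


class LoopSketch:
    """Hypothetical loop keeping temporary state on self.shm."""

    def __init__(self):
        self.shm = SimpleNamespace()

    def run(self, data):
        self.shm.dataloader_iter = iter(data)   # temporary state, as in the pitch
        output = [batch * 2 for batch in self.shm.dataloader_iter]
        self.shm = SimpleNamespace()            # free memory automatically
        return output


loop = LoopSketch()
output = loop.run([1, 2, 3])
print(output)          # [2, 4, 6]
print(vars(loop.shm))  # {} -- the iterator reference is gone after run()
```

The loop writer never has to remember a `self.dataloader_iter = None` cleanup line; resetting `self.shm` at the end of `run` drops every temporary reference at once.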



cc @Borda @tchaton @justusschock @awaelchli @carmocca @ananthsub @ninginthecloud @rohitgr7 @akihironitta

@carmocca carmocca added feature Is an improvement or enhancement refactor design Includes a design discussion labels Sep 9, 2021
@tchaton
Contributor

tchaton commented Sep 9, 2021

Great idea!

I really like option 2. It seems to cover all possible hooks, and the cleanup is handled automatically.

Should we have 2 shared objects?

shm_run? Created and cleared for the entire run?
shm_advance? Created and cleared for the advance hooks?

@carmocca
Contributor Author

carmocca commented Sep 9, 2021

Should we have 2 shared objects?

We can start implementing and see if having two would be particularly helpful.

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

@tchaton I don't think this approach alone would be sufficient to address #9390

Specifically, since each DataLoader uses a different set of worker processes, there is no guarantee that the old set of processes will release memory before the new set allocates more. This is a big deal because in my case each set of worker processes reserves 20GB of memory. For other users it might be even larger.

I am looking for an explicit call to https://github.com/pytorch/pytorch/blob/e5ab0d1013072c26586b369536bccac648843958/torch/utils/data/dataloader.py#L1328 to resolve this.

@carmocca
Contributor Author

carmocca commented Sep 9, 2021

@cowwoc I wouldn't want to call a private DataLoader function like that internally. However, it is our responsibility to properly dereference everything so it can be garbage collected. In your case, you could override a hook and call gc.collect() manually to be sure it gets cleaned before you continue your program.
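A hedged sketch of that suggestion: overriding a hook to force a collection once the loop has dropped its references. The base class here is a stand-in, not the real Lightning loop, and the hook name is illustrative:

```python
import gc


class BaseLoopStub:
    """Stand-in for a Lightning loop (the real one would be e.g. an epoch loop)."""

    def on_run_end(self):
        self._iterator = None  # the loop drops its reference here
        return "done"


class MyLoop(BaseLoopStub):
    def on_run_end(self):
        output = super().on_run_end()
        # Force an immediate collection so that objects whose references were
        # just cleared (including any caught in reference cycles) are finalized
        # now, rather than whenever the collector next decides to run.
        gc.collect()
        return output


print(MyLoop().on_run_end())  # done
```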

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

@carmocca Sorry, I'm new to Python so maybe I misunderstood. I thought that the presence of __del__ meant that you could invoke del dataloader_iterator and it would shut down the workers. Is that not the case? When is __del__ invoked?

Should I ask the pytorch guys to add a public method to close/dispose a DataLoader?

@carmocca
Contributor Author

carmocca commented Sep 9, 2021

__del__ gets invoked after manually calling del, or when the garbage collector collects the object (which can be forced with gc.collect()).
But it's generally considered bad style to call del manually, since Python can figure out when to clean things up on its own; the same goes for gc.collect().

Should I ask the pytorch guys to add a public method to close/dispose a DataLoader?

Sure, why not :)

@luiscape
Contributor

luiscape commented Sep 9, 2021

__del__ gets invoked after manually calling del

If I'm not wrong, del only removes a reference to the object. __del__ is only really called when the gc kicks in (reference).

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

But it's generally ugly to call del manually on objects as Python can figure out when to clean things

I usually prefer the approach you mentioned but in this specific case I think it makes sense to explicitly del the iterator.

It's possible for the main process to have little garbage (so the GC will never get triggered) in spite of idle sub-processes hogging memory. As far as I understand it, garbage held by sub-processes does not apply any pressure on the GC of the main process.

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

__del__ gets invoked after manually calling del

If I'm not wrong, del only removes a reference to the object. __del__ is only really called when the gc kicks in (reference).

That link reads:

Note del x doesn’t directly call x.del() — the former decrements the reference count for x by one, and the latter is only called when x’s reference count reaches zero.

But I just tested this locally, and what I saw was that the reference count had to drop to zero, the variable had to be assigned None, and the GC had to run. Only then did __del__ get invoked. Omitting any of those three steps resulted in __del__ not getting invoked.
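For reference, CPython's documented behavior can be checked with a small script. Dropping the last reference runs __del__ immediately, while an object caught in a reference cycle is only finalized once gc.collect() runs; no assignment to None is needed:

```python
import gc

calls = []


class Tracked:
    def __del__(self):
        calls.append("deleted")


# Case 1: dropping the last reference finalizes immediately in CPython
t = Tracked()
del t
print(calls)  # ['deleted']

# Case 2: a reference cycle keeps the object alive until the collector runs
a = Tracked()
a.self_ref = a  # the instance references itself
del a
print(len(calls))  # still 1: not finalized yet
gc.collect()
print(calls)  # ['deleted', 'deleted']
```

A DataLoader iterator that ends up in a cycle would therefore keep its workers alive until a collection happens, which may explain needing gc.collect() in practice.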

I'll ask the DataLoader guys to add an explicit cleanup method.

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

Here is my request for an explicit DataLoader cleanup method: pytorch/pytorch#64766

You should probably start with clearing out references for now and if they add a cleanup method we could revisit this issue.

@cowwoc
Contributor

cowwoc commented Sep 26, 2021

It looks like the PyTorch-side changes might take a while to arrive. Are you able to release the Lightning changes without it? I am still running out of memory even though I am invoking gc.collect() explicitly. One way or another I will need the Lightning fix.

@cowwoc
Contributor

cowwoc commented Sep 26, 2021

I figured out a workaround that seems to work:

  1. Take a snapshot of the sub-processes before each optuna study or trial:

         current_process = psutil.Process()
         protected_processes = [process.pid for process in current_process.children(recursive=True)]

  2. At the end of each trial, run:

         current_process = psutil.Process()
         children = current_process.children(recursive=True)
         for child in children:
             if child.pid not in protected_processes:
                 child.kill()

     within a try-finally block to ensure that TrialPruned exceptions are handled as well.
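The two steps above can be wired together roughly like this (a sketch assuming psutil is installed; `run_trial` and `objective` are illustrative names, where `objective` is whatever executes the trial):

```python
import psutil


def run_trial(objective):
    # Step 1: snapshot worker PIDs that existed before the trial
    protected = {p.pid for p in psutil.Process().children(recursive=True)}
    try:
        return objective()
    finally:
        # Step 2: kill any worker spawned during the trial, even when the
        # trial raises (e.g. optuna.TrialPruned)
        for child in psutil.Process().children(recursive=True):
            if child.pid not in protected:
                child.kill()


result = run_trial(lambda: 42)
print(result)  # 42
```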

I hope this helps others.

@cowwoc
Contributor

cowwoc commented Sep 30, 2021

Per pytorch/pytorch#64766 (comment) and pytorch/pytorch#64766 (comment), they do not plan to add an explicit shutdown API in the near future. I'll keep an eye on this ticket and try gc.collect() once your changes are merged.

@tchaton tchaton removed their assignment Nov 18, 2021
@stale

stale bot commented Dec 20, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Dec 20, 2021
@carmocca carmocca added loops Related to the Loop API and removed won't fix This will not be worked on refactor labels Dec 21, 2021
@stale

stale bot commented Jan 21, 2022

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Jan 21, 2022
@awaelchli awaelchli added this to the 1.6 milestone Jan 21, 2022
@stale stale bot removed the won't fix This will not be worked on label Jan 21, 2022
@carmocca
Contributor Author

carmocca commented Feb 1, 2022

Closing, as this is a cool idea but not a natural solution for most people. It also limits flexibility in memory management.

@carmocca carmocca closed this as completed Feb 1, 2022