
[RFC] Mechanism to dereference automatically in the Loops #9385

Closed
carmocca opened this issue Sep 9, 2021 · 16 comments
Labels
design Includes a design discussion feature Is an improvement or enhancement let's do it! approved to implement loops Related to the Loop API
Milestone

Comments

@carmocca
Contributor

carmocca commented Sep 9, 2021

Proposed refactoring or deprecation

The Loops run design

https://github.com/PyTorchLightning/pytorch-lightning/blob/41ba639859cf6c6bf319eb33e5b3394504315962/pytorch_lightning/loops/base.py#L94-L120

doesn't include a mechanism to share data between hooks during the same iteration or across iterations. This forces loop implementations to write everything to self, which is flexible but makes it easy to forget to clear state, since those attributes will never be garbage collected on their own.

Motivation

#9386 is a perfect example: it shows that self.dataloader_iter was added but the reference was never freed. It gets defined in on_run_start but is only used in a later hook.

This pattern is also seen in other places:

pytorch_lightning/loops/optimizer/optimizer_loop.py:89:        outputs, self.outputs = self.outputs, []  # free memory
pytorch_lightning/loops/epoch/evaluation_epoch_loop.py:136:        # free memory
pytorch_lightning/loops/epoch/prediction_epoch_loop.py:104:        # free memory
pytorch_lightning/loops/epoch/training_epoch_loop.py:222:        # free memory
pytorch_lightning/loops/batch/manual.py:82:        output, self._output = self._output, None  # free memory
pytorch_lightning/loops/batch/training_batch_loop.py:94:        self.batch_outputs = None  # free memory
pytorch_lightning/loops/dataloader/evaluation_loop.py:120:        # free memory

Pitch

Automatically dereference data at the end of run.

Option 1:

shm: Optional[Any] = None  # shared memory that should get dereferenced after `run`
shm = self.on_run_start(*args, shm=shm, **kwargs)

while not self.done:
    try:
        shm = self.on_advance_start(*args, shm=shm, **kwargs)
        shm = self.advance(*args, shm=shm, **kwargs)
        shm = self.on_advance_end(shm=shm)
        self.restarting = False
    except StopIteration:
        break

output = self.on_run_end(shm=shm)
return output
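For illustration, here is a minimal, self-contained sketch of how a loop could thread `shm` through its hooks under Option 1. The class name and hook bodies are hypothetical, not the actual Lightning loops:

```python
from typing import Any, Optional


class EvaluationLoopSketch:
    """Hypothetical loop that keeps per-run state in `shm`, not on `self`."""

    def on_run_start(self, dataloader, shm: Optional[Any] = None) -> Any:
        # Store the iterator in the shared object instead of self.dataloader_iter
        return {"dataloader_iter": iter(dataloader), "outputs": []}

    def advance(self, shm: Any = None) -> Any:
        batch = next(shm["dataloader_iter"])  # raises StopIteration when done
        shm["outputs"].append(batch * 2)      # stand-in for the real step
        return shm

    def on_run_end(self, shm: Any = None):
        # `shm` goes out of scope after `run` returns, freeing the iterator
        return shm["outputs"]


loop = EvaluationLoopSketch()
shm = loop.on_run_start([1, 2, 3])
try:
    while True:
        shm = loop.advance(shm=shm)
except StopIteration:
    pass
outputs = loop.on_run_end(shm=shm)
print(outputs)  # [2, 4, 6]
```

Because the iterator only ever lives in the `shm` dict that `run` threads through the hooks, nothing has to remember to clear it afterwards.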

Option 2:

class Loop(ABC):
    def __init__(self):
        # a bare object() does not accept attribute assignment, so use a
        # SimpleNamespace as the shared-state container
        self.shm = SimpleNamespace()

    def run(self, *args, **kwargs):
        ...
        self.on_run_start(*args, **kwargs)

        while not self.done:
            try:
                self.on_advance_start(*args, **kwargs)
                self.advance(*args, **kwargs)
                self.on_advance_end()
                self.restarting = False
            except StopIteration:
                break

        output = self.on_run_end()
        self.shm = SimpleNamespace()  # free memory
        return output

where loop writers save temporary state with self.shm.dataloader_iter = ...
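As a rough, hypothetical sketch of Option 2 in action (a SimpleNamespace stands in for the shared-state container, and the single-method loop is a toy, not the real API):

```python
from types import SimpleNamespace


class LoopSketch:
    """Hypothetical loop keeping temporary state on self.shm."""

    def __init__(self):
        self.shm = SimpleNamespace()

    def run(self, data):
        self.shm.dataloader_iter = iter(data)   # temporary state, as in the pitch
        output = [batch * 2 for batch in self.shm.dataloader_iter]
        self.shm = SimpleNamespace()            # free memory automatically
        return output


loop = LoopSketch()
output = loop.run([1, 2, 3])
print(output)          # [2, 4, 6]
print(vars(loop.shm))  # {} -- the iterator reference is gone after run()
```

The loop writer never has to remember a `self.dataloader_iter = None` cleanup line; resetting `self.shm` at the end of `run` drops every temporary reference at once.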



cc @Borda @tchaton @justusschock @awaelchli @carmocca @ananthsub @ninginthecloud @rohitgr7 @akihironitta

@carmocca carmocca added feature Is an improvement or enhancement refactor design Includes a design discussion labels Sep 9, 2021
@tchaton
Contributor

tchaton commented Sep 9, 2021

Great idea!

I really like option 2. It seems to cover all possible hooks, and the cleanup is handled automatically.

Should we have 2 shared objects?

shm_run? Created and cleared for the entire run?
shm_advance? Created and cleared for the advance hooks?

@carmocca
Contributor Author

carmocca commented Sep 9, 2021

Should we have 2 shared objects?

We can start implementing and see if having two would be particularly helpful.

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

@tchaton I don't think this approach alone would be sufficient to address #9390

Specifically, since each DataLoader uses a different set of worker processes, there is no guarantee that the old set of processes will release memory before the new set allocates more. This is a big deal because in my case each set of worker processes reserves 20GB of memory. For other users it might be even larger.

I am looking for an explicit call to https://github.com/pytorch/pytorch/blob/e5ab0d1013072c26586b369536bccac648843958/torch/utils/data/dataloader.py#L1328 to resolve this.

@carmocca
Contributor Author

carmocca commented Sep 9, 2021

@cowwoc I wouldn't want to call a private DataLoader function like that internally. However, it is our responsibility to properly dereference everything so it can be garbage collected. In your case, you could override a hook and call gc.collect() manually to be sure it gets cleaned before you continue your program.
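A hedged sketch of that suggestion: overriding a hook to force a collection once the loop has dropped its references. The base class here is a stand-in, not the real Lightning loop, and the hook name is illustrative:

```python
import gc


class BaseLoopStub:
    """Stand-in for a Lightning loop (the real one would be e.g. an epoch loop)."""

    def on_run_end(self):
        self._iterator = None  # the loop drops its reference here
        return "done"


class MyLoop(BaseLoopStub):
    def on_run_end(self):
        output = super().on_run_end()
        # Force an immediate collection so that objects whose references were
        # just cleared (including any caught in reference cycles) are finalized
        # now, rather than whenever the collector next decides to run.
        gc.collect()
        return output


print(MyLoop().on_run_end())  # done
```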

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

@carmocca Sorry, I'm new to Python so maybe I misunderstood. I thought that the presence of __del__ meant that you could invoke del dataloader_iterator and it would shut down the workers. Is that not the case? When is __del__ invoked?

Should I ask the pytorch guys to add a public method to close/dispose a DataLoader?

@carmocca
Contributor Author

carmocca commented Sep 9, 2021

__del__ gets invoked after manually calling del, or when the garbage collector collects the object (which can be forced with gc.collect()).
But it's generally considered bad style to call del manually, since Python can figure out when to clean things up on its own; the same goes for gc.collect().

Should I ask the pytorch guys to add a public method to close/dispose a DataLoader?

Sure, why not :)

@luiscape
Contributor

luiscape commented Sep 9, 2021

__del__ gets invoked after manually calling del

If I'm not wrong, del only removes a reference to the object. __del__ is only really called when the gc kicks in (reference).

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

But it's generally ugly to call del manually on objects as Python can figure out when to clean things

I usually prefer the approach you mentioned but in this specific case I think it makes sense to explicitly del the iterator.

It's possible for the main process to have little garbage (so the GC will never get triggered) in spite of idle sub-processes hogging memory. As far as I understand it, garbage held by sub-processes does not apply any pressure on the GC of the main process.

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

__del__ gets invoked after manually calling del

If I'm not wrong, del only removes a reference to the object. __del__ is only really called when the gc kicks in (reference).

That link reads:

Note del x doesn’t directly call x.del() — the former decrements the reference count for x by one, and the latter is only called when x’s reference count reaches zero.

But I just tested this locally, and what I saw was that the reference count had to drop to zero, the variable had to be assigned None, and the GC had to run. Only then did __del__ get invoked. Omitting any of those three steps resulted in __del__ not getting invoked.
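For reference, CPython's documented behavior can be checked with a small script. Dropping the last reference runs __del__ immediately, while an object caught in a reference cycle is only finalized once gc.collect() runs; no assignment to None is needed:

```python
import gc

calls = []


class Tracked:
    def __del__(self):
        calls.append("deleted")


# Case 1: dropping the last reference finalizes immediately in CPython
t = Tracked()
del t
print(calls)  # ['deleted']

# Case 2: a reference cycle keeps the object alive until the collector runs
a = Tracked()
a.self_ref = a  # the instance references itself
del a
print(len(calls))  # still 1: not finalized yet
gc.collect()
print(calls)  # ['deleted', 'deleted']
```

A DataLoader iterator that ends up in a cycle would therefore keep its workers alive until a collection happens, which may explain needing gc.collect() in practice.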

I'll ask the DataLoader guys to add an explicit cleanup method.

@cowwoc
Contributor

cowwoc commented Sep 9, 2021

Here is my request for an explicit DataLoader cleanup method: pytorch/pytorch#64766

You should probably start with clearing out references for now and if they add a cleanup method we could revisit this issue.

@cowwoc
Contributor

cowwoc commented Sep 26, 2021

It looks like the PyTorch-side changes might take a while to arrive. Are you able to release the Lightning changes without it? I am still running out of memory even though I am invoking gc.collect() explicitly. One way or another I will need the Lightning fix.

@cowwoc
Contributor

cowwoc commented Sep 26, 2021

I figured out a workaround that seems to work:

  1. Take a snapshot of the sub-processes before each optuna study or trial:

         current_process = psutil.Process()
         protected_processes = [process.pid for process in current_process.children(recursive=True)]

  2. At the end of each trial, run:

         current_process = psutil.Process()
         children = current_process.children(recursive=True)
         for child in children:
             if child.pid not in protected_processes:
                 child.kill()

     within a try-finally block to ensure that TrialPruned exceptions are handled as well.
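The two steps above can be wired together roughly like this (a sketch assuming psutil is installed; `run_trial` and `objective` are illustrative names, where `objective` is whatever executes the trial):

```python
import psutil


def run_trial(objective):
    # Step 1: snapshot worker PIDs that existed before the trial
    protected = {p.pid for p in psutil.Process().children(recursive=True)}
    try:
        return objective()
    finally:
        # Step 2: kill any worker spawned during the trial, even when the
        # trial raises (e.g. optuna.TrialPruned)
        for child in psutil.Process().children(recursive=True):
            if child.pid not in protected:
                child.kill()


result = run_trial(lambda: 42)
print(result)  # 42
```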

I hope this helps others.

@cowwoc
Contributor

cowwoc commented Sep 30, 2021

Per pytorch/pytorch#64766 (comment) and pytorch/pytorch#64766 (comment), they do not plan to add an explicit shutdown API in the near future. I'll keep an eye on this ticket and try gc.collect() once your changes are merged.

@tchaton tchaton removed their assignment Nov 18, 2021
@stale

stale bot commented Dec 20, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Dec 20, 2021
@carmocca carmocca added loops Related to the Loop API and removed won't fix This will not be worked on refactor labels Dec 21, 2021
@stale

stale bot commented Jan 21, 2022

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Jan 21, 2022
@awaelchli awaelchli added this to the 1.6 milestone Jan 21, 2022
@stale stale bot removed the won't fix This will not be worked on label Jan 21, 2022
@carmocca
Contributor Author

carmocca commented Feb 1, 2022

Closing, as this is a cool idea but not a natural solution for most people. It also limits flexibility in memory management.

@carmocca carmocca closed this as completed Feb 1, 2022