
Do not prefetch when possible #12101

Merged · 14 commits from feat/no-prefetch-default into master · Feb 28, 2022
Conversation

@carmocca (Contributor) commented Feb 24, 2022

What does this PR do?

Only prefetch when we require it. That is when:

  • Iterable datasets are used, where we need to know when the dataset will be exhausted in order to run validation at the right time.
  • Inter-batch parallelism is used, where running without prefetching would defeat its purpose.

This is also good for FFCV, which already does prefetching internally (a minimal sketch of the resulting decision logic follows).
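The sketch below illustrates the decision this PR enables; it is not Lightning's actual implementation, and the helper name and the `inter_batch_parallelism` flag are illustrative assumptions:

```python
from torch.utils.data import DataLoader, IterableDataset


def num_prefetch_batches(dataloader: DataLoader, inter_batch_parallelism: bool) -> int:
    # Illustrative helper, not Lightning's API: prefetch only when required.
    if inter_batch_parallelism:
        # Inter-batch parallelism needs at least one batch in flight.
        return 1
    if isinstance(dataloader.dataset, IterableDataset):
        # Iterable datasets have no known length, so one batch of lookahead
        # is needed to detect exhaustion before the next step begins.
        return 1
    # Map-style datasets expose len(), so no prefetching is necessary;
    # loaders such as FFCV that prefetch internally are left undisturbed.
    return 0
```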

We no longer raise an error for 0-length map-style datasets, as doing so was inconsistent with the behavior for iterable datasets.

Part of #11538
Follow-up to #11606
Reverts #1280

Does your PR introduce any breaking changes? If yes, please list them.

None that I am aware of.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

cc @Borda @justusschock @awaelchli @ninginthecloud @akihironitta

@carmocca carmocca added the feature Is an improvement or enhancement label Feb 24, 2022
@carmocca carmocca added this to the 1.6 milestone Feb 24, 2022
@carmocca carmocca self-assigned this Feb 24, 2022
@carmocca carmocca added data handling Generic data-related topic performance labels Feb 24, 2022
Review threads (resolved):
  • pytorch_lightning/utilities/fetching.py
  • tests/loops/test_loops.py (2 threads)
  • tests/utilities/test_fetching.py (2 threads)
@tchaton (Contributor) left a comment:

LGTM! Some comments.

Review threads (resolved):
  • pytorch_lightning/loops/dataloader/evaluation_loop.py (2 threads, outdated)
  • pytorch_lightning/loops/fit_loop.py (outdated)
  • pytorch_lightning/utilities/fetching.py (2 threads)
@carmocca carmocca enabled auto-merge (squash) February 28, 2022 12:49
@mergify mergify bot added the ready PRs ready to be merged label Feb 28, 2022
@rohitgr7 (Contributor) left a comment:

Small comments.

@@ -393,6 +393,9 @@ option when using sequential data.
 to ``limit_{mode}_batches``, if it is set to 1.0 it will run for the whole dataset, otherwise it will throw an exception.
 Here ``mode`` can be train/val/test/predict.

+When iterable datasets are used, Lightning will pre-fetch 1 batch (in addition to the current batch) so it can detect
+when the training will stop and run validation if necessary.
Contributor suggested change:

- when the training will stop and run validation if necessary.
+ when the training epoch will end and run validation if necessary.
Contributor:
While the comment here is not wrong, the real reason we prefetch is to avoid starting a new epoch when the dataloader has no batches left; a plain StopIteration check would detect exhaustion only after the new epoch had already begun, so relying on it is not possible.
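A minimal sketch of the lookahead this comment describes; `prefetch_iterate` is a hypothetical helper, not Lightning's actual fetcher:

```python
def prefetch_iterate(iterable):
    """Yield (batch, is_last) pairs by keeping one batch of lookahead.

    Fetching the next batch before yielding the current one is the only way,
    with an iterable dataset, to know that the current batch is the last one,
    so the epoch can end (and validation can run) without first starting a
    new epoch on an already-exhausted dataloader.
    """
    it = iter(iterable)
    try:
        batch = next(it)
    except StopIteration:
        # The dataloader was empty to begin with.
        return
    while True:
        try:
            next_batch = next(it)
        except StopIteration:
            # No batch follows: flag the current one as the last.
            yield batch, True
            return
        yield batch, False
        batch = next_batch
```

For example, `list(prefetch_iterate(range(3)))` yields `(0, False)`, `(1, False)`, `(2, True)`, so the consumer learns that batch 2 is the last one while still processing it.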

Review threads (resolved):
  • pytorch_lightning/utilities/data.py (4 threads, 1 outdated)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
@carmocca carmocca merged commit 6309a59 into master Feb 28, 2022
@carmocca carmocca deleted the feat/no-prefetch-default branch February 28, 2022 18:31
Labels
data handling (Generic data-related topic) · feature (Is an improvement or enhancement) · performance · ready (PRs ready to be merged)
Projects
None yet

6 participants