
Make ValDataLoaderIter able to be reset only when it is acquired by the syntax of normal iterator #60

Merged · 5 commits · Aug 20, 2020

Conversation

@NaleRaphael (Contributor) commented Aug 4, 2020

ValDataLoaderIter is not reset after each iteration, which results in the problem reported in issue #59.
See also this comment: #59 (comment) for further details.

This PR also adds tests for TrainDataLoaderIter and ValDataLoaderIter.


UPDATE:

The purpose of the changes in this PR is to make ValDataLoaderIter work like torch.utils.data.DataLoader. When we want to iterate over a torch.utils.data.DataLoader, we can use the following approaches:

  1. Iterate it through a for-loop
    (DataLoader.__iter__() is called implicitly by the for-loop)

    data_loader = DataLoader(dataset)
    for batch in data_loader:
        ...

    # This is OK since a new iterator will be created each time
    # `data_loader` is used in a for-loop.
    for batch in data_loader:
        ...
  2. Iterate it through an iterator created by iter()

    loader_iter = iter(data_loader)
    for i in range(len(data_loader)):
        batch = next(loader_iter)
        ...

    # `loader_iter` has run out of values now, so the following
    # call will raise the exception `StopIteration`.
    next(loader_iter)

The first approach is how val_loader was used before commit 52c189a: an iterator of val_loader is created whenever it is iterated by the for-loop in self._validate().

    if val_loader:
        loss = self._validate(
            val_loader, non_blocking_transfer=non_blocking_transfer
        )

In commit 52c189a, ValDataLoaderIter was introduced to solve the problem of unpacking multiple values returned by a data loader. However, since a single instance of ValDataLoaderIter is created, the iterator we acquire through either a for-loop or iter() is always the same object (i.e. the ValDataLoaderIter instance itself). That means it can't be reset by the approaches mentioned above, which is why self._validate() failed to work as it did before.

With the changes made in this PR, ValDataLoaderIter should work the same way we use torch.utils.data.DataLoader.

Besides, the reason we didn't just apply these changes to DataLoaderIter is that TrainDataLoaderIter relies on its behavior. Since TrainDataLoaderIter should be capable of resetting itself when auto_reset is True, it has to return itself when we acquire an iterator through TrainDataLoaderIter.__iter__().
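
To make the distinction concrete, here is a minimal, self-contained sketch (plain Python only; the class names are illustrative and this is not the actual library code) of the two `__iter__()` policies: a training-side iterator returns itself and silently restarts inside `__next__()`, while a validation-side iterator is reset only when `__iter__()` is called, just like requesting a fresh iterator from a DataLoader.

    class TrainIterSketch:
        """Illustrative stand-in for a train iterator with auto-reset."""

        def __init__(self, data):
            self.data = data
            self._it = iter(data)

        def __iter__(self):
            # Must return itself; resetting here would break the
            # auto-reset behavior that the training loop relies on.
            return self

        def __next__(self):
            try:
                return next(self._it)
            except StopIteration:
                # Auto-reset: start over instead of stopping.
                # Note: a for-loop over this object would never terminate.
                self._it = iter(self.data)
                return next(self._it)


    class ValIterSketch:
        """Illustrative stand-in for a validation iterator after this PR."""

        def __init__(self, data):
            self.data = data
            self._it = iter(data)

        def __iter__(self):
            # Reset only here, so each for-loop over the object starts
            # from the beginning, like a plain DataLoader.
            self._it = iter(self.data)
            return self

        def __next__(self):
            # No auto-reset: exhaustion raises StopIteration, which is
            # what terminates the for-loop in _validate().
            return next(self._it)


    val_iter = ValIterSketch([1, 2, 3])
    print(list(val_iter))  # [1, 2, 3]
    print(list(val_iter))  # [1, 2, 3] again, because __iter__() resets it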

Make ValDataLoaderIter able to be reset only when it is acquired by the syntax of normal `iterator`

In `LRFinder.range_test()`, `val_iter` won't be reset after it runs
out of values, which makes `LRFinder._validate()` fail to work
correctly after the first iteration of `range_test()`.

To fix it, we add a counter that tracks how many times a `ValDataLoaderIter`
has run (i.e. how many times `__next__()` has been called), and reset it only
when its `__iter__()` is called, so that it won't be reset
automatically the way `TrainDataLoaderIter` works.

See also davidtvs#59 and the docstring of `ValDataLoaderIter` for further
details.
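
As a rough illustration of the counter idea described in the commit message (names like `run_limit` and `run_counter` are assumptions here, not necessarily what the library uses; see the actual `ValDataLoaderIter` docstring for the real implementation), the reset can be gated on whether the previous pass was fully consumed:

    class CountingValIterSketch:
        """Hypothetical counter-based reset, mirroring the commit message."""

        def __init__(self, data_loader):
            self.data_loader = data_loader
            self.run_limit = len(data_loader)  # batches yielded by one full pass
            self.run_counter = 0               # number of __next__() calls so far
            self._it = iter(data_loader)

        def __iter__(self):
            # Reset only when an iterator is explicitly requested AND the
            # previous pass has been consumed; never reset from __next__().
            if self.run_counter >= self.run_limit:
                self._it = iter(self.data_loader)
                self.run_counter = 0
            return self

        def __next__(self):
            self.run_counter += 1
            return next(self._it)


    loader = [10, 20, 30]  # stands in for a DataLoader
    val_iter = CountingValIterSketch(loader)
    assert list(val_iter) == [10, 20, 30]
    assert list(val_iter) == [10, 20, 30]  # reset happens on the second __iter__()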
@NaleRaphael (Contributor, Author)

It seems that "ci-build / ci-build (3.7, 0.4.1)" is stuck at the step "Upload coverage to Codecov" 😲

@NaleRaphael (Contributor, Author)

Compared with previously successful jobs, only three build results were uploaded to Codecov.

Expected result (4 coverage files, committed on Aug 8):
https://codecov.io/gh/davidtvs/pytorch-lr-finder/commit/fd6f0126086caf0f9dfe766b1efee65ff5cbc000/build

Got result (3 coverage files, committed today):
https://codecov.io/gh/davidtvs/pytorch-lr-finder/commit/af57ea7ae04220dd01b13fb429e484a19df00091/build

It seems that we currently cannot re-run this job through the GitHub Actions UI (see also this post). But I found this post stating that we can push an empty commit to trigger it. If it's necessary to get the coverage reports from all 4 builds, feel free to let me know and I'll do that.

@davidtvs (Owner)

I re-ran the jobs and now they completed normally.

I'm merging this. Thanks!

@davidtvs davidtvs merged commit 664824b into davidtvs:master Aug 20, 2020
@NaleRaphael (Contributor, Author)

Thanks 👍
