Fix an issue that caused Lightning to extract the batch size even though it was set by the user in `LightningModule.log` #10408

rohitgr7 · 2021-11-08T13:21:11Z

What does this PR do?

Seems like after we added the warning to check for batch_size for an ambiguous batch, a few users realized that the batch_size used internally isn't correct. We always make a call to extract batch size even when it's set by the user explicitly inside self.log(batch_size=...) which IMO is not required because we have a recursive call to extract batch size which causes unnecessary extraction for it even when it's not required plus it will raise a warning if someone specifies it and we still look for it. So now the call will be made inside .log only when batch_size is not specified.

In short: this fixes an issue that caused Lightning to extract the batch size even though it was set by the user in LightningModule.log.

So an overview of the solution is:
track the batch and use it to extract batch_size inside self.log call when it's not specified explicitly and metric is being logged with on_epoch=True and will be averaged at the end.

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @carmocca @edward-io @ananthsub

pytorch_lightning/core/lightning.py

pytorch_lightning/loops/epoch/training_epoch_loop.py

pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

rohitgr7 · 2021-11-19T09:18:17Z

Hi, @rohitgr7 Thanks a lot for tackling this issue! I left a few comments in code, feel free to let me know if you have any questions. A generic idea from my point of view, I think extract_data_batch() is this kind function we could convert as hook and delegate to user to implement, because they know their batch, it will be easily for them to define the batch_size or how to get batch_size. While here we had a lot assumptions about what the batch is going to look like, but we still can only extract valid batch_size under very limited situations. Anyway, I don't want to interrupt this PR discussion thread, I will initialize an issue to start the discussion. cc: @tchaton @ananthsub

if we do this, then we will have two different ways by which user can define their batch size. One will be this hook and another one will be self.log(..., batch_size=). They still can define methods within their lightning module and can call it to retrieve the batch_size and use it within self.log. Plus we already have too many hooks and this seems more like a utility method to me. Just a personal opinion :)

tchaton · 2021-11-19T09:27:32Z

To add to @rohitgr7 point,

self.log(..., batch_size=) was more a rescue solution and I personally don't find it elegant to provide the batch size, especially if you are logging outside of the *_step methods of the LightningModule where the current batch is unavailable. But this might affect a minority of users.

Plus we already have too many hooks
This is entirely true and we want to enforce only 1 way to do anything within Lightning. I believe we would have to weigh the hook vs the self.log(..., batch_size=) in terms of usability/simplicity.

@carmocca @awaelchli @SeanNaren Any thoughts ?

rohitgr7 · 2021-11-19T09:39:31Z

@tchaton let's discuss more here: #10635

…g` (#10408) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

…g` (Lightning-AI#10408) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

…g` (#10408) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

# Why? In #11 and #12 we see that updates to the occwl library introduced breaking changes which were not spotted in the example notebooks. Thank you to @[cteqeu](https://github.com/cteqeu) for reporting the issues and @[akashaero](https://github.com/akashaero) for sharing the workaround. This PR addresses the problems # What? - A new version of occwl is released [see here](https://github.com/AutodeskAILab/occwl/releases/tag/v1.0.0). We now fix the version of occwl to this version to avoid further unexpected breakage. - In [jupyter_segmentation_viewer.py](visualization/jupyter_segmentation_viewer.py) we remove the calls to the removed functions. - All the notebooks are rerun and retested. - Some warnings were firing in [pytorch-lightning](Lightning-AI/pytorch-lightning#10408). To ensure that all metrics are computed correctly we pass the number of faces to the logger as the batch size. Notice that the IoU and accuracy metrics reported in [the paper](https://arxiv.org/abs/2104.00706) are per-face rather than per solid. - In [scale_utils.py](utils/scale_utils.py) we make use of the scaling function in occwl rather than re-implementing it here.

rohitgr7 added 8 commits November 9, 2021 16:47

extract batch size only when it's required

5e10b93

revamp log

8941e32

fix

51a7d36

fix

c2a3626

update tests

be5882c

chlog

bba8055

fix

308f9c0

add test

04314b8

rohitgr7 force-pushed the fix/batch_size branch from 151647b to 04314b8 Compare November 9, 2021 13:00

rohitgr7 added 5 commits November 9, 2021 18:31

Merge branch 'master' into fix/batch_size

b1de03e

mypy

46a8861

mypy

9b28e44

deref current batch

0772967

mypy

c3567ff

rohitgr7 changed the title ~~Update to extract batch size only when its required~~ Fix the call to extract_batch_size only when its required Nov 9, 2021

rohitgr7 added bug Something isn't working data handling Generic data-related topic labels Nov 9, 2021

rohitgr7 added this to the 1.5.x milestone Nov 9, 2021

rohitgr7 commented Nov 9, 2021

View reviewed changes

pytorch_lightning/core/lightning.py Outdated Show resolved Hide resolved

rohitgr7 commented Nov 9, 2021

View reviewed changes

pytorch_lightning/loops/epoch/training_epoch_loop.py Show resolved Hide resolved

pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py Outdated Show resolved Hide resolved

rohitgr7 marked this pull request as ready for review November 9, 2021 15:00

rohitgr7 requested review from awaelchli, Borda, carmocca, justusschock, kaushikb11, SeanNaren, tchaton and williamFalcon as code owners November 9, 2021 15:00

mergify bot added the has conflicts label Nov 9, 2021

rohitgr7 mentioned this pull request Nov 19, 2021

make extract_batch_size a LightningModule.hook #10635

Closed

minor updates

bb6f3c4

rohitgr7 requested review from ninginthecloud and carmocca November 19, 2021 09:52

mergify bot added the has conflicts label Nov 19, 2021

carmocca added 3 commits November 19, 2021 15:15

Simplify check

250defa

Remove silly comment

1ee9b17

Merge branch 'master' into fix/batch_size

b806ab0

mergify bot removed the has conflicts label Nov 19, 2021

mypy fix

560da62

carmocca approved these changes Nov 19, 2021

View reviewed changes

mergify bot added ready PRs ready to be merged has conflicts labels Nov 19, 2021

Merge branch 'master' into fix/batch_size

4c859d3

mergify bot removed the has conflicts label Nov 19, 2021

carmocca enabled auto-merge (squash) November 19, 2021 16:16

carmocca disabled auto-merge November 19, 2021 16:16

carmocca enabled auto-merge (squash) November 19, 2021 16:16

carmocca merged commit ec27313 into master Nov 19, 2021

carmocca deleted the fix/batch_size branch November 19, 2021 16:48

rohitgr7 added a commit that referenced this pull request Nov 20, 2021

Fix batch size extraction when set by the user in `LightningModule.lo…

affd617

…g` (#10408) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

Raalsky pushed a commit to neptune-ai/pytorch-lightning that referenced this pull request Nov 23, 2021

Fix batch size extraction when set by the user in `LightningModule.lo…

2ba159b

…g` (Lightning-AI#10408) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

awaelchli pushed a commit that referenced this pull request Nov 24, 2021

Fix batch size extraction when set by the user in `LightningModule.lo…

6ac6992

…g` (#10408) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

lexierule pushed a commit that referenced this pull request Nov 24, 2021

Fix batch size extraction when set by the user in `LightningModule.lo…

8eee9b3

…g` (#10408) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

rohitgr7 mentioned this pull request Feb 7, 2022

Weekly Patch Release v1.5.10 #11789

Closed

12 tasks

JoeLambourne mentioned this pull request May 25, 2022

Update code for latest occwl version AutodeskAILab/BRepNet#13

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix an issue that caused Lightning to extract the batch size even though it was set by the user in `LightningModule.log` #10408

Fix an issue that caused Lightning to extract the batch size even though it was set by the user in `LightningModule.log` #10408

rohitgr7 commented Nov 8, 2021 •

edited by github-actions bot

Loading

rohitgr7 commented Nov 19, 2021

tchaton commented Nov 19, 2021

rohitgr7 commented Nov 19, 2021

Fix an issue that caused Lightning to extract the batch size even though it was set by the user in LightningModule.log #10408

Fix an issue that caused Lightning to extract the batch size even though it was set by the user in LightningModule.log #10408

Conversation

rohitgr7 commented Nov 8, 2021 • edited by github-actions bot Loading

What does this PR do?

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

PR review

Did you have fun?

rohitgr7 commented Nov 19, 2021

tchaton commented Nov 19, 2021

rohitgr7 commented Nov 19, 2021

Fix an issue that caused Lightning to extract the batch size even though it was set by the user in `LightningModule.log` #10408

Fix an issue that caused Lightning to extract the batch size even though it was set by the user in `LightningModule.log` #10408

rohitgr7 commented Nov 8, 2021 •

edited by github-actions bot

Loading