
make extract_batch_size a LightningModule.hook #10635

Closed
rohitgr7 opened this issue Nov 19, 2021 · 11 comments
Labels: data handling, feature, won't fix

Comments

@rohitgr7
Contributor

rohitgr7 commented Nov 19, 2021

🚀 Feature

Motivation

Copy pasting context from here: #10408 (review)

A generic idea from @ninginthecloud's point of view: extract_batch_size() is the kind of function we could convert into a hook and delegate to the user to implement. Since users know their own batch, it would be easy for them to define the batch_size or how to obtain it. Here, on the other hand, we make a lot of assumptions about what the batch will look like, and we can still only extract a valid batch_size in very limited situations.

Some comments:
#10408 (comment)
#10408 (comment)
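
For illustration, such a hook could look roughly like this (the hook name, signature, and batch structure below are assumptions for the sake of the sketch, not an existing Lightning API):

    import pytorch_lightning as pl


    class MyModel(pl.LightningModule):
        # Hypothetical hook: the user knows their batch structure, so they
        # can return the batch size directly instead of Lightning guessing it.
        def extract_batch_size(self, batch) -> int:
            return batch["images"].size(0)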

Pitch

Alternatives

Additional context



cc @Borda @justusschock @awaelchli @ninginthecloud @tchaton @ananthsub @carmocca @SeanNaren Any thoughts?

@rohitgr7 rohitgr7 added the feature Is an improvement or enhancement label Nov 19, 2021
@rohitgr7 rohitgr7 changed the title make extract_batch_size a hook make extract_batch_size a LightningModule.hook Nov 19, 2021
@rohitgr7 rohitgr7 added the data handling Generic data-related topic label Nov 19, 2021
@awaelchli
Contributor

Can you comment on the differences/advantages this would bring vs. self.log(..., batch_size=...), which to me seems to offer the same today?

@rohitgr7
Contributor Author

I am fine with the current implementation: #10408 (comment)
I just wanted to discuss this in an issue rather than on that PR.

@awaelchli
Contributor

"if we do this, then we will have two different ways by which the user can define their batch size: one will be this hook, and the other will be self.log(..., batch_size=...)"

This was exactly my thought!
A major complaint about Lightning was that there are multiple ways of doing the same thing. This was raised several times recently by FB (#7740).

self.log(..., batch_size=...) seems to be the most flexible at the moment.
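
For reference, passing the batch size explicitly today looks like this (the batch structure and loss computation below are placeholders, not from this issue):

    def training_step(self, batch, batch_idx):
        loss = self.loss_fn(self(batch["images"]), batch["labels"])
        # Tell Lightning the batch size explicitly instead of letting
        # it try to infer one from the batch structure.
        self.log("train_loss", loss, batch_size=batch["images"].size(0))
        return loss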

@carmocca
Contributor

carmocca commented Nov 19, 2021

Pros:

  • The place that defines the data should be able to define this (DataHooks)

Cons:

  • Since the batch size is only used for logging, it makes sense to define it at logging time (self.log) rather than in a data hook.
  • The user may have applied further transformations to their batch and need to customize the value at *_step time.

Leaning towards not doing it.

@stale

stale bot commented Dec 27, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Dec 27, 2021
@dnnspark

dnnspark commented Mar 2, 2022

Hello, I'm facing a similar issue while using trainer.track_grad_norm=2. It's calling extract_batch_size(), which does not work with my dataloader.

@rohitgr7
Contributor Author

rohitgr7 commented Mar 7, 2022

It calls extract_batch_size on the current batch. By default, it can handle plain tensors, but if you have a custom batch structure, you can specify your batch_size using self.log(..., batch_size=...). You can try it with Trainer(fast_dev_run=True) to check whether it throws any warning. If it does, I recommend specifying the batch_size as mentioned above; if it doesn't, then you're good to go :)
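
A minimal sketch of that check (assuming a model and datamodule are already defined):

    import pytorch_lightning as pl

    # Runs a single train/val batch; watch the output for the
    # "could not infer batch_size" warning mentioned above.
    trainer = pl.Trainer(fast_dev_run=True, track_grad_norm=2)
    trainer.fit(model, datamodule=datamodule)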

@dnnspark

Thanks for replying and for the detailed explanation, @rohitgr7! One last question: in the case of distributed training, does the batch_size have to be the "effective" batch size (i.e. len(batch) * trainer.gpus * trainer.num_nodes) or the batch size per GPU (i.e. len(batch))?

@carmocca
Contributor

Per GPU

@rubencart

I am getting a ...MisconfigurationException: We could not infer the batch_size from the batch... error when running with a custom batch structure and track_grad_norm=2, since tracking the grad norm makes a call to log_dict without batch_size set.

.../site-packages/pytorch_lightning/core/module.py", line 565, in log_grad_norm
    self.log_dict(grad_norm_dict, on_step=True, on_epoch=True, prog_bar=False, logger=True)

@carmocca
Contributor

What's the structure, specifically? Could you make changes to it so that it's inferable?

The relevant code is in: https://github.com/Lightning-AI/lightning/blob/38acba08fc009caac32b6b0ab8984d8935569db9/src/pytorch_lightning/utilities/data.py#L50-L96
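
Roughly, that utility walks the batch recursively and returns the first size it can recover; a simplified sketch (not the exact implementation, see the link above for the real code):

    from collections.abc import Iterable, Mapping

    import torch


    def infer_batch_size(batch):
        # Plain tensors: first dimension (scalars count as size 1).
        if isinstance(batch, torch.Tensor):
            return 1 if batch.ndim == 0 else batch.size(0)
        if isinstance(batch, (str, bytes)):
            return len(batch)
        # Recurse into mappings and other iterables.
        if isinstance(batch, Mapping):
            batch = batch.values()
        if isinstance(batch, Iterable):
            for element in batch:
                size = infer_batch_size(element)
                if size is not None:
                    return size
        # Opaque custom objects: nothing to infer, so the caller
        # falls back to a warning/error.
        return None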
