Code cleaning in preparation for #7258 [3/n] #7262
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##           master   #7262    +/-   ##
========================================
- Coverage      91%     91%     -0%
========================================
  Files         199     199
  Lines       12799   12793      -6
========================================
- Hits        11701   11675     -26
- Misses       1098    1118     +20
```
```diff
 def scale_batch_size(
-    trainer,
-    model: LightningModule,
+    trainer: 'pl.Trainer',
+    model: 'pl.LightningModule',
     mode: str = 'power',
     steps_per_trial: int = 3,
     init_val: int = 2,
     max_trials: int = 25,
     batch_arg_name: str = 'batch_size',
     **fit_kwargs
-):
+) -> Optional[int]:
```
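For context, a minimal, self-contained sketch of how `scale_batch_size` is typically reached through the `Trainer` flags available around this release (`auto_scale_batch_size` plus `Trainer.tune`). `ToyModel` and its dataloader are made up for illustration, and later Lightning versions reorganize this entry point, so treat this as illustrative rather than canonical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    """Hypothetical module with a `batch_size` attribute for the tuner to adjust."""

    def __init__(self, batch_size: int = 2):
        super().__init__()
        self.batch_size = batch_size
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        # The dataloader reads `self.batch_size`, so each trial rebuilds it
        # with the candidate batch size.
        dataset = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
        return DataLoader(dataset, batch_size=self.batch_size)


# 'power' mode keeps doubling the batch size until an out-of-memory error is hit.
trainer = pl.Trainer(auto_scale_batch_size='power', max_epochs=1)
trainer.tune(ToyModel())  # runs the batch size finder and updates `model.batch_size`
```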
This should have some caveats that the tuner doesn't work with things like DeepSpeed or sharded DDP, which have different behavior on multiple GPUs, right?
Agree with this. In general, `scale_batch_size` is not really that well tested in multi-GPU settings. Even in the simplest case where you are using multiple GPUs of different types (say one with 8 GB of VRAM and one with 16 GB of VRAM), it will not assign a higher batch size to the second device.
@SkafteNicki since you are the most familiar with the tuner limitations, can you open a PR showing warnings or raising an error for these cases?
@carmocca will do. I basically think that anything other than single CPU/GPU batch scaling is not supported.
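For reference, a rough sketch of what such a guard could look like. The helper name, the `num_gpus` check, and the error message are assumptions, not the actual follow-up implementation; the real check might inspect the training type plugin instead.

```python
import pytorch_lightning as pl
from pytorch_lightning.utilities.exceptions import MisconfigurationException


def _check_scale_batch_size_configuration(trainer: 'pl.Trainer') -> None:
    """Hypothetical guard: reject setups the batch size finder was not designed for."""
    # Assumption: the number of requested GPUs is used as a stand-in for
    # detecting a distributed setup.
    if trainer.num_gpus > 1:
        raise MisconfigurationException(
            'The batch size finder currently supports only a single CPU or GPU. '
            'It cannot account for per-device memory differences or plugins such as '
            'DeepSpeed / sharded DDP.'
        )
```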
LGTM !
LGTM
What does this PR do?
Some changes related to #7258, but not critical, which I have split into this PR:
- Move `_validate_data_hooks` into `ConfigValidator`
- Fix `scale_batch_size`, which would fail if the number of trials was 0 (see the toy sketch after this list)
- Move `scale_batch_size` tests from `tests/trainer/test_trainer_tricks.py` to `tests/tuner/test_scale_batch_size.py`
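To make the zero-trials point concrete, below is a toy, self-contained version of a power-scaling loop; `_power_scale` and `fits_in_memory` are invented for illustration and do not mirror the actual Lightning internals. The point is simply that with `max_trials=0` the loop body never runs and the initial value should come back untouched rather than erroring.

```python
from typing import Callable


def _power_scale(init_val: int, max_trials: int, fits_in_memory: Callable[[int], bool]) -> int:
    """Illustrative doubling search; names and structure are hypothetical."""
    new_size = init_val
    for _ in range(max_trials):  # with max_trials == 0 the loop never runs
        if not fits_in_memory(new_size * 2):
            break
        new_size *= 2
    return new_size


# Pretend anything up to 64 samples fits in memory.
assert _power_scale(init_val=2, max_trials=0, fits_in_memory=lambda b: b <= 64) == 2
assert _power_scale(init_val=2, max_trials=25, fits_in_memory=lambda b: b <= 64) == 64
```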
Before submitting
PR review