Updating batch norm after EMA or checkpoint averaging #403
-
Thank you for sharing this repo. Updating the BN statistics seems necessary, since the statistics of the final model, whether obtained by checkpoint averaging or EMA, differ from those of any individual model that went into the average. Should a forward pass over the training data be made to update these statistics, and if not, why not? Thanks in advance.
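For context, the re-estimation the question describes can be sketched in plain Python. The helper below is hypothetical (not from this repo); it mimics the two update rules BatchNorm can use for its running mean: a fixed-momentum EMA, or a cumulative equal-weight average when momentum is `None`, which is the behavior `torch.optim.swa_utils.update_bn` relies on when recomputing stats after weight averaging.

```python
def recompute_bn_stats(batch_means, momentum=None):
    """Re-estimate a BN running mean from scratch over the training data.

    batch_means: per-batch means observed during the extra forward passes.
    momentum=None -> cumulative moving average (equal weight per batch),
    matching how PyTorch's BatchNorm behaves with momentum=None; a float
    gives the usual EMA update instead.
    """
    running, n = 0.0, 0
    for mean in batch_means:
        n += 1
        m = 1.0 / n if momentum is None else momentum
        running = (1.0 - m) * running + m * mean
    return running


# Cumulative averaging recovers the overall mean of the batch means.
print(recompute_bn_stats([1.0, 2.0, 3.0]))  # -> 2.0
```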
-
@shairoz-deci There have been multiple questions about this in past issues and discussions, but in my experience, no, it is not necessary, and I feel it works better not to. Many of the training recipes for popular models in TensorFlow, such as EfficientNet, average the BN stats as well. It really just gives you a longer time constant for the stats, since they are already EMA'd with the momentum param. I don't see why it would cause them to deviate enough to be a problem.
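The "longer time constant" point can be illustrated with a small sketch (the names and decay values below are illustrative assumptions, not from the repo): BatchNorm's running mean is already an EMA of per-batch means, so applying weight-EMA to that buffer just stacks a second, slower EMA on top of it.

```python
def ema_update(running, value, momentum):
    """One EMA step, the same form BatchNorm uses for its running stats."""
    return (1.0 - momentum) * running + momentum * value


batch_means = [1.0, 2.0, 3.0, 4.0, 5.0]

bn_running = 0.0   # BN buffer maintained during training (momentum=0.1,
                   # the PyTorch default)
ema_of_bn = 0.0    # weight-EMA applied to that buffer too (decay 0.999,
                   # a common ModelEma-style choice, assumed here)
for m in batch_means:
    bn_running = ema_update(bn_running, m, momentum=0.1)
    ema_of_bn = ema_update(ema_of_bn, bn_running, momentum=0.001)

# Both track the same underlying statistic; the stacked EMA just lags
# with a longer effective time constant rather than producing an
# inconsistent value.
print(bn_running, ema_of_bn)
```

In other words, averaging the BN buffers along with the weights keeps them as a valid, if smoother, estimate of the activation statistics, which is why recomputing them is usually unnecessary.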