[MXNET-246] operators for Synchronized BatchNorm #10303
Conversation
A single CrossDeviceBatchNorm would be much more efficient
@@ -164,7 +164,12 @@ inline void SetShapeType(const Context& ctx,
    NDArrayStorageType storage_type = static_cast<NDArrayStorageType>(out_storage_types[i]);
    if (outputs[i]->is_none()) {
      if (storage_type == kDefaultStorage) {
        *outputs[i] = NDArray(out_shapes[i], ctx, true, out_types[i]);
        if (outputs.size() == inputs.size()) {
???
This is for the AllReduce operator: each output should be on the same device as the corresponding input.
This is too hacky
I agree. I am not very familiar with the backend code. Could you give suggestions?
Only the all-reduce operation needs cross-device communication. The other operations should be computed independently on each device. That's why I separated those operations into different "operators".
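To illustrate the staged design described above (only the all-reduce step crosses devices; everything else runs per device), here is a hedged numpy sketch. Lists of arrays stand in for per-GPU tensors, and `sync_batch_norm` is a hypothetical name, not the API in this PR:

```python
import numpy as np

def sync_batch_norm(device_batches, eps=1e-5):
    # Stage 1 (per device, no communication): partial statistics.
    sums = [x.sum(axis=0) for x in device_batches]
    sq_sums = [(x * x).sum(axis=0) for x in device_batches]
    counts = [x.shape[0] for x in device_batches]

    # Stage 2 (the only cross-device step): all-reduce the partials.
    total = np.sum(sums, axis=0)
    total_sq = np.sum(sq_sums, axis=0)
    n = sum(counts)

    # Stage 3 (per device): normalize with the shared global statistics.
    mean = total / n
    var = total_sq / n - mean * mean
    return [(x - mean) / np.sqrt(var + eps) for x in device_batches]
```

Splitting the computation this way keeps the communication volume small: only one vector of sums and one of squared sums per device, regardless of batch size.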
Why not combine them into the same operator? It would save memory and be much more efficient.
Someone at TuSimple shared an idea about this. It may help you.
Thanks @chinakook. Actually, we have already implemented synchronized batch normalization internally; we are just thinking about releasing a proper API for users.
MXNet Gluon cross-GPU batch norm is implemented here: https://github.com/zhanghang1989/MXNet-Gluon-SyncBN . Feel free to try it and leave comments.
Closing this in favor of the new PR #11502 |
Description
Backend operators for Synchronized Batch Norm. Design idea as in http://hangzh.com/SynchronizeBN/
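A quick hedged illustration of the statistic-synchronization idea behind this design (not code from this PR): it is enough for devices to all-reduce sum(x) and sum(x^2), because the global variance can be recovered as E[x^2] - E[x]^2. Here numpy shards stand in for per-GPU batches:

```python
import numpy as np

x = np.random.RandomState(0).randn(64, 3)
shards = np.split(x, 4)  # pretend each shard lives on one GPU

# What an all-reduce over the per-device partials would produce:
s = sum(sh.sum(axis=0) for sh in shards)            # global sum(x)
ss = sum((sh * sh).sum(axis=0) for sh in shards)    # global sum(x^2)

mean = s / x.shape[0]
var = ss / x.shape[0] - mean * mean  # E[x^2] - E[x]^2
```

These match the statistics computed over the full batch on a single device, which is exactly the property synchronized batch norm needs.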
Special thanks to Haibin @eric-haibin-lin
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments