@bonlime good to know they are working for you, you should try them with the latest PyTorch 1.12 nvfuser codegen btw, it can improve throughput quite a bit (they are also good on PyTorch XLA w/ TPU). You've probably picked it up, but in case others are looking at this and not familiar, the difference between the 'a' variants I threw in there and the paper ones is that 'a' always normalizes the input by the stats, even if the activation is not enabled. I tried this variant on the regnetz because there are a lot of unactivated instances at the end of each block and the paper version wasn't working as well there. The resnetv2 are also pre-act, but for some reason they worked fine with the paper version.

I have not tried it with WS, I don't recall seeing any standard practice to that effect. There are very few uses of EvoNorm out there. GN + WS sure, but that's still just one research group who likes to do that combo....
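For anyone comparing the two variants, here is a minimal sketch of that difference. This is a paraphrase, not the actual timm code: `EvoNormS0Sketch`, `always_norm`, and `group_std` are illustrative names.

```python
import torch
import torch.nn as nn

def group_std(x, groups=32, eps=1e-5):
    # std-dev over each (channel group, H, W), broadcast back to x's shape
    B, C, H, W = x.shape
    x_ = x.reshape(B, groups, C // groups, H, W)
    std = x_.var(dim=(2, 3, 4), keepdim=True).add(eps).sqrt()
    return std.expand(x_.shape).reshape(B, C, H, W)

class EvoNormS0Sketch(nn.Module):
    # always_norm=True mimics the 'a' behaviour; apply_act=False models the
    # unactivated instances at the end of a block
    def __init__(self, num_features, groups=32, apply_act=True, always_norm=False):
        super().__init__()
        self.groups, self.always_norm = groups, always_norm
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.v = nn.Parameter(torch.ones(num_features)) if apply_act else None

    def forward(self, x):
        if self.v is not None:
            # activated path (same in both variants): SiLU-like gate / group std
            x = x * (x * self.v.view(1, -1, 1, 1)).sigmoid() / group_std(x, self.groups)
        elif self.always_norm:
            # 'a' behaviour: still normalize by the stats without the activation
            x = x / group_std(x, self.groups)
        # paper behaviour with apply_act=False: affine transform only
        return x * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```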
---
By the way, just a random thought to share with you. I've noticed you've experimented with normalisations quite a lot, but I haven't seen you use the idea from "Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models" (from the author of the original BN; TLDR: additionally use the running stats in the forward pass to make normalisation more stable). In my experiments it makes training significantly better with medium BS (8-12). After combining it with calculating only RMS for …
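For context, a minimal sketch of that Batch Renorm forward pass could look like the following. The clipping limits `r_max`/`d_max` are shown at fixed final values rather than the paper's ramp-up schedule, and the class name and defaults here are ours:

```python
import torch
import torch.nn as nn

class BatchRenorm2dSketch(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1, r_max=3.0, d_max=5.0):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.r_max, self.d_max = r_max, d_max
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_std', torch.ones(num_features))

    def forward(self, x):
        if self.training:
            mean = x.mean(dim=(0, 2, 3))
            std = x.var(dim=(0, 2, 3), unbiased=False).add(self.eps).sqrt()
            # r and d pull the batch stats toward the running stats, which is
            # what reduces the dependence on the current minibatch
            r = (std / self.running_std).clamp(1 / self.r_max, self.r_max).detach()
            d = ((mean - self.running_mean) / self.running_std)
            d = d.clamp(-self.d_max, self.d_max).detach()
            x = (x - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)
            x = x * r.view(1, -1, 1, 1) + d.view(1, -1, 1, 1)
            with torch.no_grad():
                self.running_mean += self.momentum * (mean - self.running_mean)
                self.running_std += self.momentum * (std - self.running_std)
        else:
            # inference uses the running stats, same as plain BN
            x = (x - self.running_mean.view(1, -1, 1, 1)) / self.running_std.view(1, -1, 1, 1)
        return x * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```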
---
Hi,
First, thanks for the great pre-trained models without BN (`regnetz_c16_evos`, `regnetz_d8_evos`). I've used them for downstream tasks and the ability to train with any BS is impressive. I've looked closely into their implementation and have several questions about the design choices used.

Why `EvoNorm2dS0a` instead of `GN + HardSwish/SiLU`? I do understand that they have slightly different formulas, but `GN + HardSwish/SiLU` is much faster in my experiments (up to a 15% speed-up) and requires less memory. Also, in your ResNet50 experiments (`resnetv2_50d_gn`) GN works almost the same.
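For what it's worth, a rough micro-benchmark of that comparison could be set up like this. The `gn_silu` and `bench` helpers and the tensor sizes are our own choices, and the `EvoNorm2dS0a` import path is an assumption since it has moved between timm versions:

```python
import time
import torch
import torch.nn as nn

try:
    from timm.layers import EvoNorm2dS0a          # newer timm (assumed path)
except ImportError:
    from timm.models.layers import EvoNorm2dS0a   # older timm (assumed path)

def gn_silu(c, groups=32):
    # the proposed alternative: plain GroupNorm followed by SiLU
    return nn.Sequential(nn.GroupNorm(groups, c), nn.SiLU())

@torch.no_grad()
def bench(mod, x, iters=100):
    for _ in range(10):  # warm-up
        mod(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        mod(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(16, 256, 32, 32, device=device)
print('GN + SiLU    :', bench(gn_silu(256).to(device).eval(), x))
print('EvoNorm2dS0a :', bench(EvoNorm2dS0a(256).to(device).eval(), x))
```

Note this times the forward pass only, so it won't capture the training-time memory difference mentioned above; treat it as a rough sanity check rather than a definitive comparison.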