Train network with no bias in convolution layer #225

Open
ndcuong91 opened this issue Nov 8, 2018 · 10 comments
@ndcuong91

Hi @zhreshold
I have already trained my MobileNet-SSD network in Caffe with no bias, but convergence is too slow (mAP ~35% after 3 days).
I just tried MXNet and found that its training performance is significantly better than Caffe's, but I don't know how to remove the 'beta' term in the batch norm layer in MXNet like I did in Caffe. As an alternative, I removed the batchnorm layer entirely, but then the network couldn't converge.
Can you give me some hints?

@zhreshold
Owner

zhreshold commented Nov 8, 2018

You can set the lr_mult of the batchnorm beta term to 0 to fix beta, which is initialized to 0.
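
A minimal sketch of what this looks like with the MXNet symbol API (the layer and variable names here are illustrative, not from the repo):

```python
import mxnet as mx

data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), no_bias=True, name='conv0')

# Declare beta explicitly with lr_mult=0 so the optimizer never updates it.
# The default initializer sets beta to 0, so the layer ends up with no shift.
beta = mx.sym.Variable('bn0_beta', lr_mult=0.0)
bn = mx.sym.BatchNorm(data=conv, beta=beta, name='bn0')
```

In Gluon, the equivalent would be something like `net.collect_params('.*beta').setattr('lr_mult', 0)` after the network is constructed.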

@ndcuong91
Author

Thanks @zhreshold, it worked!

ndcuong91 reopened this Nov 9, 2018
@ndcuong91
Author

ndcuong91 commented Nov 9, 2018

@zhreshold does this also fix the 'gamma' term?
Should I remove 'fix_gamma=True' in the BN layer?

@zhreshold
Owner

@titikid You can leave gamma fixed or unfixed, depending on your results, but I would prefer to leave it free.
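
For reference, the symbol-API BatchNorm exposes this choice directly; continuing the sketch above (check the defaults for your MXNet version):

```python
# fix_gamma=True freezes gamma at 1, so there is no learned scale;
# fix_gamma=False leaves gamma free to train, as suggested above.
bn = mx.sym.BatchNorm(data=conv, fix_gamma=False, name='bn0')
```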

@ndcuong91
Author

ndcuong91 commented Nov 10, 2018

I have already trained 2 models from scratch, with all parameters set to their defaults (lr=0.004, batch=48, single GPU):

  • model with fixed beta (only in the base MobileNet network) and fixed gamma: ~41.5% mAP after 220 epochs. Train log here
  • model with fixed beta (only in the base MobileNet network): ~42% mAP after 220 epochs. Train log here

@zhreshold Can you take a look and give me some tips for better mAP? Should I train on a bigger dataset first and then fine-tune on VOC2007/2012?

@zhreshold
Owner

You have to use ImageNet pre-trained weights; otherwise you need a DSSD variant.

@ndcuong91
Author

Hi @zhreshold
I found that if I remove only the 'beta' term, the convolution still has a small shift factor because of the 'running_mean' term. I set the 'lr_mult' of 'running_mean' to 0, but I still see it updated during training. So how can I completely remove it?
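
(Note: running_mean and running_var are auxiliary states that BatchNorm updates as moving averages inside the forward pass, not parameters updated by the optimizer, which is why lr_mult has no effect on them. A hedged sketch of one way to freeze them, assuming MXNet's use_global_stats flag applies here; this is not confirmed later in the thread:)

```python
# use_global_stats=True makes BatchNorm normalize with the stored
# moving_mean / moving_var and stop updating them during training.
bn = mx.sym.BatchNorm(data=conv, use_global_stats=True, name='bn0')
```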

@ndcuong91
Author

@zhreshold can you give me some suggestions?

@zhreshold
Owner

@titikid For maximum flexibility, I suggest you use a broadcast multiply instead of batchnorm itself. You get full control over the behavior without hacking batchnorm.
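
A minimal sketch of one way to read this suggestion (an interpretation, not code from this repo): replace batchnorm's affine part with an explicit per-channel scale applied via broadcast_mul, e.g. as a Gluon block:

```python
import mxnet as mx
from mxnet import gluon

class ChannelScale(gluon.HybridBlock):
    """Learned per-channel scale applied with a broadcast multiply.

    Stands in for batchnorm's affine transform: no shift term and no
    running statistics, so every part of the behavior is explicit.
    """
    def __init__(self, channels, **kwargs):
        super(ChannelScale, self).__init__(**kwargs)
        with self.name_scope():
            self.scale = self.params.get('scale',
                                         shape=(1, channels, 1, 1),
                                         init=mx.init.One())

    def hybrid_forward(self, F, x, scale):
        # (1, C, 1, 1) broadcasts over (N, C, H, W) feature maps.
        return F.broadcast_mul(x, scale)

# Usage: scale a 64-channel feature map.
net = ChannelScale(64)
net.initialize()
y = net(mx.nd.ones((2, 64, 8, 8)))
```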

@ndcuong91
Author

@zhreshold I'm not completely clear on what you mean for now, but I will look into it. Thanks!
