Train network with no bias in convolution layer #225

Open
ndcuong91 opened this issue Nov 8, 2018 · 10 comments
@ndcuong91

Hi @zhreshold
I have already trained my MobileNet-SSD network in Caffe with no bias, but convergence is too slow (mAP ~35% after 3 days).
I just tried MXNet and found that its training performance is significantly better than Caffe's, but I don't know how to remove the 'beta' term in the batch norm layer in MXNet like I did in Caffe. As an alternative, I removed the batchnorm layer entirely, but then the network couldn't converge.
Can you give me some hints?

@zhreshold
Owner

zhreshold commented Nov 8, 2018

You can set the lr_mult of the batchnorm beta term to 0 to fix beta, which is initialized to 0.
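
A minimal sketch of what this looks like with the MXNet symbol API (the layer and variable names here are illustrative, not from the repo):

```python
import mxnet as mx

data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), no_bias=True, name='conv0')

# Declare beta explicitly with lr_mult=0 so the optimizer never updates it.
# The default initializer sets beta to 0, so the layer ends up with no shift.
beta = mx.sym.Variable('bn0_beta', lr_mult=0.0)
bn = mx.sym.BatchNorm(data=conv, beta=beta, name='bn0')
```

In Gluon, the equivalent would be something like `net.collect_params('.*beta').setattr('lr_mult', 0)` after the network is constructed.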

@ndcuong91
Author

Thanks @zhreshold, it worked!

ndcuong91 reopened this Nov 9, 2018
@ndcuong91
Author

ndcuong91 commented Nov 9, 2018

@zhreshold does this also fix the 'gamma' term?
Should I remove 'fix_gamma=True' in the BN layer?

@zhreshold
Owner

@titikid You can leave gamma fixed or unfixed, depending on your results, but I would prefer to leave it free.
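
For reference, the symbol-API BatchNorm exposes this choice directly; continuing the sketch above (check the defaults for your MXNet version):

```python
# fix_gamma=True freezes gamma at 1, so there is no learned scale;
# fix_gamma=False leaves gamma free to train, as suggested above.
bn = mx.sym.BatchNorm(data=conv, fix_gamma=False, name='bn0')
```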

@ndcuong91
Author

ndcuong91 commented Nov 10, 2018

I have already trained 2 models from scratch, with all parameters set to their defaults (lr=0.004, batch=48, single GPU):

  • model with fixed beta (only in the base MobileNet network) and fixed gamma: ~41.5% mAP after 220 epochs. Train log here
  • model with fixed beta (only in the base MobileNet network): ~42% mAP after 220 epochs. Train log here

@zhreshold Can you take a look and give me some tips for better mAP? Should I train on a bigger dataset first and then fine-tune on VOC2007/2012?

@zhreshold
Owner

You have to use ImageNet pre-trained weights; otherwise you need a DSSD variant.

@ndcuong91
Author

Hi @zhreshold
I found that if I remove only the 'beta' term, the convolution still has a small shift factor because of the 'running_mean' term. I set the 'lr_mult' of 'running_mean' to 0, but I still see it updated during training. So how can I completely remove it?
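
(Note: running_mean and running_var are auxiliary states that BatchNorm updates as moving averages inside the forward pass, not parameters updated by the optimizer, which is why lr_mult has no effect on them. A hedged sketch of one way to freeze them, assuming MXNet's use_global_stats flag applies here; this is not confirmed later in the thread:)

```python
# use_global_stats=True makes BatchNorm normalize with the stored
# moving_mean / moving_var and stop updating them during training.
bn = mx.sym.BatchNorm(data=conv, use_global_stats=True, name='bn0')
```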

@ndcuong91
Author

@zhreshold can you give me some suggestions?

@zhreshold
Owner

@titikid For maximum flexibility, I suggest you use a broadcast multiply instead of batchnorm itself. You get full control over the behavior without hacking batchnorm.
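
A minimal sketch of one way to read this suggestion (an interpretation, not code from this repo): replace batchnorm's affine part with an explicit per-channel scale applied via broadcast_mul, e.g. as a Gluon block:

```python
import mxnet as mx
from mxnet import gluon

class ChannelScale(gluon.HybridBlock):
    """Learned per-channel scale applied with a broadcast multiply.

    Stands in for batchnorm's affine transform: no shift term and no
    running statistics, so every part of the behavior is explicit.
    """
    def __init__(self, channels, **kwargs):
        super(ChannelScale, self).__init__(**kwargs)
        with self.name_scope():
            self.scale = self.params.get('scale',
                                         shape=(1, channels, 1, 1),
                                         init=mx.init.One())

    def hybrid_forward(self, F, x, scale):
        # (1, C, 1, 1) broadcasts over (N, C, H, W) feature maps.
        return F.broadcast_mul(x, scale)

# Usage: scale a 64-channel feature map.
net = ChannelScale(64)
net.initialize()
y = net(mx.nd.ones((2, 64, 8, 8)))
```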

@ndcuong91
Author

@zhreshold I'm not completely clear on what you mean for now, but I will look into it. Thanks!
