
mobilenet structure omits last pooling layer #40

Closed
zhenglaizhang opened this issue Feb 8, 2018 · 5 comments

Comments

@zhenglaizhang

Take a look at the mobilenet structure: most of the weights are in the final FC layer:

pre_fc1_weight parameter size=25690112, shape=(512L, 50176L)

Looking at the code, the last pooling layer is omitted (mobilenet):

conv_13_dw = Conv(conv_12, num_group=512, num_filter=512, kernel=(3, 3), pad=(1, 1), stride=(2, 2), name="conv_13_dw") # 14/7
conv_13 = Conv(conv_13_dw, num_filter=1024, kernel=(1, 1), pad=(0, 0), stride=(1, 1), name="conv_13") # 7/7
conv_14_dw = Conv(conv_13, num_group=1024, num_filter=1024, kernel=(3, 3), pad=(1, 1), stride=(1, 1), name="conv_14_dw") # 7/7
conv_14 = Conv(conv_14_dw, num_filter=1024, kernel=(1, 1), pad=(0, 0), stride=(1, 1), name="conv_14") # 7/7
body = conv_14
fc1 = symbol_utils.get_fc1(body, num_classes, fc_type)

Is this by design? Omitting the last pooling layer leads to a much larger model size :-(
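
A quick back-of-the-envelope calculation (a sketch; the 512-d embedding size and the 7x7x1024 feature map are taken from the numbers above) shows why skipping global pooling inflates the FC layer:

    # Parameter count of pre_fc1 with and without a global pooling layer,
    # given a 7x7x1024 final feature map and a 512-d embedding.
    emb_size = 512
    c, h, w = 1024, 7, 7
    no_pool = emb_size * (c * h * w)  # 512 * 50176 = 25,690,112 weights
    pooled = emb_size * c             # 512 * 1024  =    524,288 weights
    print(no_pool // pooled)          # 49x more parameters without pooling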

@nttstar
Collaborator

nttstar commented Feb 8, 2018

Please refer to the network structure section in our paper.

@zhenglaizhang
Author

@nttstar Thanks for the info, I will read the paper carefully.

@zhenglaizhang
Author

zhenglaizhang commented Feb 9, 2018

@nttstar Hi, you are definitely right; I was using version 'E', which removes the GP (global pooling) layer.

Then I tried mobilenetv1 with 'D' as the version output, whose model size is around 15 MB. I started training 15 hours ago with a batch size of 128, but the test accuracy is still around 0.5.

Have you tried training with such settings? Maybe I need to tune other hyperparameters...
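
For context, here is a minimal sketch of what the 'E' head in symbol_utils.get_fc1 looks like, assuming it is the BN-Dropout-FC-BN block applied directly to the flattened conv map (the exact eps and dropout values here are assumptions, not verbatim from the repo):

    import mxnet as mx

    def fc1_type_e(last_conv, num_classes):
        # BN -> Dropout -> FC -> BN, with no global pooling, so the FC
        # sees the full 7x7x1024 = 50176-d flattened feature map.
        bn1 = mx.sym.BatchNorm(data=last_conv, fix_gamma=False, eps=2e-5, name='bn1')
        drop = mx.sym.Dropout(data=bn1, p=0.4)  # dropout rate assumed
        fc1 = mx.sym.FullyConnected(data=drop, num_hidden=num_classes, name='pre_fc1')
        return mx.sym.BatchNorm(data=fc1, fix_gamma=True, eps=2e-5, name='fc1')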

@nttstar
Collaborator

nttstar commented Feb 11, 2018

@zhenglaizhang We all use 'E' in our recent experiments.

@mmxuan18

mmxuan18 commented Aug 2, 2019

@zhenglaizhang I also have this question. In the code, the last conv connects directly to mx.sym.FullyConnected; does this function flatten internally? (See the sketch below.)

Which part of the paper explains this modification?
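
For what it's worth: yes, MXNet's FullyConnected flattens its input by default (its flatten argument defaults to True), which shape inference confirms:

    import mxnet as mx

    data = mx.sym.Variable('data')
    fc = mx.sym.FullyConnected(data=data, num_hidden=512, name='pre_fc1')
    # flatten=True by default: a (N, 1024, 7, 7) input becomes (N, 50176)
    # internally, so no explicit mx.sym.Flatten is needed before the FC.
    _, out_shapes, _ = fc.infer_shape(data=(1, 1024, 7, 7))
    print(out_shapes)  # [(1, 512)]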
