
Not getting a good accuracy #4

Open
leochli opened this issue Jul 24, 2017 · 21 comments

@leochli

leochli commented Jul 24, 2017

Hi @farmingyard ,

I ran your deploy prototxt on ImageNet this weekend and still got poor accuracy (with exactly the same prototxt).

I'd appreciate it if you could share your solver file with me so I can check against it.

Many thanks!

@farmingyard
Owner

farmingyard commented Jul 24, 2017

@LeoLee96

Here is an example; the batch size is 64. You can try it!

net: "train_val.prototxt"
#test_initialization: false
#test_iter: 100
#test_interval: 5000
display: 40
average_loss: 40
base_lr: 0.01
lr_policy: "poly"
power: 1.0
max_iter: 1000000
momentum: 0.9
weight_decay: 0.0001
snapshot: 5000
snapshot_prefix: "shufflenet"
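
For reference, a rough sanity check of what this solver implies, assuming the standard ~1.28M-image ImageNet training set (an assumption on my part, not stated above):

# batch_size 64 x max_iter 1,000,000 = 64,000,000 images seen
# 64,000,000 / ~1,280,000 images per epoch ≈ 50 epochs
# lr_policy "poly" with power 1.0 decays the learning rate
# linearly from base_lr 0.01 down to 0 over max_iter iterations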

@leochli
Author

leochli commented Jul 24, 2017

@farmingyard
Thanks, man! By the way, what accuracy did you get with this? I only got 54% top-1 and 79% top-5, while according to the paper the error rate should be only around 34.1%.

I tested on two GPUs, which might cause problems if the ShuffleChannel layer doesn't support multi-GPU training. I'm not sure, though. I'll try your solver and see.

thanks a lot!

@farmingyard
Owner

@LeoLee96

I got 62.8% top-1 and 84.7% top-5 accuracy. The result is not as good as the paper's; it still needs tuning...

@KeyKy

KeyKy commented Jul 24, 2017

mark

@zimenglan-sysu-512

Hi @farmingyard, I'm just wondering how you wrote the prototxt. Did you write code to generate it? If so, can you share it? Thanks.

@farmingyard
Owner

@leochli
Author

leochli commented Aug 11, 2017

Hi @farmingyard ,

Do you finally reach the 65.9% top 1 acc in the paper?

I trained with:
batch size 256,
100 epochs in total,
base lr 0.1,
decaying the learning rate by 0.1 every 30 epochs.

Yet I only got around 64% acc at the end.

I'd appreciate it if you could share some tricks from your training process.

Thanks a lot!
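
For anyone trying to reproduce this schedule in Caffe, here is a minimal solver sketch of the settings described above (batch size 256, base lr 0.1, decay by 0.1 every 30 epochs, ~100 epochs). The iteration counts assume ~1.28M ImageNet training images, i.e. ~5,000 iterations per epoch; treat the exact numbers (and the weight decay) as assumptions, not the poster's actual file:

net: "train_val.prototxt"   # data layer assumed to use batch_size: 256
base_lr: 0.1
lr_policy: "step"
gamma: 0.1
stepsize: 150000            # ~30 epochs at ~5,000 iterations/epoch
max_iter: 500000            # ~100 epochs
momentum: 0.9
weight_decay: 0.0001        # assumed; 4e-5 is also mentioned later in this thread
snapshot: 5000
snapshot_prefix: "shufflenet"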

@farmingyard
Owner

@LeoLee96
Your model is better than mine. I didn't keep training any longer, so my result is still the same as above.

@7oud

7oud commented Aug 24, 2017

Hi @farmingyard, @LeoLee96,
I trained ShuffleNet on our own data but got worse results than AlexNet.
I'd appreciate it if you could share your training loss curve.
Thanks!

@zhangleiedu

Hi @LeoLee96, can you share your pre-trained model?
Thanks.

@xiaomr

xiaomr commented Sep 6, 2017

Hi @LeoLee96, when you trained ShuffleNet on two GPUs, you said it might cause problems because the ShuffleChannel layer doesn't support multi-GPU. How did you solve that? I got a "Multi-GPU execution not available - rebuild with USE_NCCL" error; could you give me some advice?

@leochli
Author

leochli commented Sep 6, 2017

@xiaomr
Hi, I'm not sure, though. Since the depthwise conv layer is not designed for every parallel-GPU setup, you may need to modify it to fit your system if you have your own multi-GPU setup. I didn't get this USE_NCCL error even before the modification. Anyway, try running ShuffleNet on a single GPU first.

@xiaomr

xiaomr commented Sep 7, 2017

Thank you for your advice! I have fixed the problem. It seems the depthwise layer can support multi-GPU after all; the problem was that I had chosen the wrong branch of Caffe.

@adapt-image-models

Hi @LeoLee96, did you finally reach the 65.9% val acc?
I trained for 90 epochs with a batch_size of 256 on 4 GPUs, base_lr=0.1 divided by 10 every 30 epochs, and wd=4e-5, but I only got 63.3% val acc. Can you give me some advice?

@anlongstory

@LeoLee96 Hi, I am new to deep learning. I want to use Caffe to train ShuffleNet on my own data, but with just one .prototxt file I have no idea how to proceed. Could you give me some direction or advice?

@ppwwyyxx

ppwwyyxx commented Oct 10, 2017

I can reproduce the paper's accuracy of a 40Mflops shufflenet with tensorflow (https://github.com/tensorpack/tensorpack/tree/master/examples/ImageNetModels#shufflenet). You can use the configuration there as a reference.

@andeyeluguo

andeyeluguo commented Oct 16, 2017

I only get 43% val acc after 400,000 iterations. I used your solver.prototxt and changed the deploy.prototxt into a train_val.prototxt. Is the training not sufficient, or is the data preprocessing wrong?
Mine is:
transform_param {
  mirror: true
  crop_size: 224
  scale: 0.017
  mean_value: [103.94,116.78,123.68]
}
Should I change the preprocessing to:
transform_param {
  mirror: false
  crop_size: 224
  mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}
Or is it something else?
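
For what it's worth, a note on what those two transform_param variants do, based on my reading of Caffe's DataTransformer (offered as an assumption, not a definitive answer to the question):

# variant 1: subtract the per-channel mean_value, then multiply by scale,
#   i.e. (pixel - mean) * 0.017 ≈ (pixel - mean) / 58.8,
#   which roughly standardizes the inputs
# variant 2: subtract a per-pixel mean image (imagenet_mean.binaryproto)
#   with no scaling; mirror: false also disables random horizontal flips
# either way, the preprocessing must match what the network expects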

@wang5566

@VectorYYYY
Do you mean a batch size of 256 on every GPU, or a total batch size of 256 across the 4 GPUs?

@ppwwyyxx

According to the paper, the batch size is 256 on each GPU, making a total batch size of 1024. Other settings such as the learning rate schedule are also clearly stated, so I don't know why people would invent their own settings if the goal is to reproduce the result.

@wang5566

A 1080 Ti can only fit a batch size of 64, and I set up 4 GPUs for training. But I found the loss gets stuck around 2.1 and cannot decrease, and the model's top-1 accuracy is around 53%.

@ppwwyyxx

According to https://arxiv.org/abs/1706.02677 you can use 1/4 of the learning rate together with 1/4 of the batch size and train 4x more steps to get roughly the same results.

Besides that, my implementation can actually train a ShuffleNet 1x with a batch size of 128 on a 1080 Ti, and a ShuffleNet 0.5x with a batch size of 256.
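
A worked sketch of that linear scaling rule, with the reference numbers kept symbolic (they are not taken from the paper):

# reference run (assumed): total batch 1024, base_lr L,   max_iter N
# scaled-down run:         total batch 256,  base_lr L/4, max_iter 4N
# both runs then process the same number of images (and epochs):
#   1024 * N == 256 * 4N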
