
Caffe only needs 8 min to train MNIST to 0.99 accuracy; why does MXNet need over 50 min? #1036

Closed
dushoufu opened this issue Dec 23, 2015 · 5 comments

Comments

@dushoufu

I ran Caffe and MXNet separately to train MNIST on the same computer in CPU mode. The result is that Caffe is about three times faster than MXNet.
As far as I can tell, the model and the environment are the same, except that Caffe's input is LMDB. Both cases were run directly from the provided examples. Caffe uses about 700% CPU, while MXNet uses about 500%.
With MXNet, no single core goes above 30% utilization. What can I check to speed MXNet up?
MXNet:
INFO:root:Epoch[19] Time cost=146.922
INFO:root:Epoch[19] Validation-accuracy=0.990100

real 52m37.068s
user 87m37.811s
sys 205m59.149s

Caffe:
I1223 14:27:11.260304 12600 solver.cpp:408] Test net output #0: accuracy = 0.9909
I1223 14:27:11.260419 12600 solver.cpp:408] Test net output #1: loss = 0.0274076 (* 1 = 0.0274076 loss)
I1223 14:27:11.260432 12600 solver.cpp:325] Optimization Done.
I1223 14:27:11.260442 12600 caffe.cpp:215] Optimization Done.

real 8m28.679s
user 23m31.146s
sys 43m21.118s

@dushoufu dushoufu changed the title caffe only need 8min for training mnist with accuracy of 0.99,why mxnet need over 30min for training? caffe only need 8min for training mnist with accuracy of 0.99,why mxnet need over 50min for training? Dec 23, 2015
@dushoufu
Author

I'm sure the MXNet team has done detailed benchmarking against Caffe, so I believe something is wrong or abnormal in my setup. Could someone give me some guidance?

@lukemetz
Contributor

Relevant: #1031

@piiswrong
Contributor

We mostly focus on GPU performance, so the CPU path is not optimized. But there is an easy fix that should improve performance a lot: check mshadow/tensor_cpu-inl.h:139. Currently that loop is not multithreaded. You can add multithreading with OpenMP on the outer loop. This should be easy, and it should cover most ops.
If you can make this work, you are welcome to contribute a PR.
Thanks.
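
For reference, here is a minimal sketch of the kind of change being suggested: a 2-D element-wise map loop parallelized over the outer dimension with an OpenMP pragma. The function name, signature, and types below are illustrative only and are not the actual mshadow code at tensor_cpu-inl.h:139; in mshadow the pragma would go on the existing outer loop of the map routine.

```cpp
// Sketch only: a generic 2-D element-wise map with the outer loop
// parallelized via OpenMP. Names/signatures are hypothetical, not mshadow's.
#include <cstddef>
#include <vector>

// dst[y][x] = op(src[y][x]); rows are contiguous in memory.
template <typename DType, typename Op>
void MapPlan2D(DType *dst, const DType *src,
               std::ptrdiff_t rows, std::ptrdiff_t cols, Op op) {
  // Parallelize the outer loop only: each thread writes whole rows,
  // so there are no write conflicts and the inner loop can still vectorize.
  #pragma omp parallel for schedule(static)
  for (std::ptrdiff_t y = 0; y < rows; ++y) {
    for (std::ptrdiff_t x = 0; x < cols; ++x) {
      dst[y * cols + x] = op(src[y * cols + x]);
    }
  }
}

int main() {
  const std::ptrdiff_t rows = 1024, cols = 1024;
  std::vector<float> src(rows * cols, 1.0f), dst(rows * cols);
  MapPlan2D(dst.data(), src.data(), rows, cols,
            [](float v) { return v * 2.0f + 1.0f; });
  return dst[0] == 3.0f ? 0 : 1;  // trivial sanity check
}
```

Compile with something like `g++ -std=c++11 -O2 -fopenmp sketch.cpp`; without `-fopenmp` the pragma is ignored and the loop runs serially, which matches the current single-threaded behaviour described above.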

@wangzhangup

@piiswrong here is a comparison between MXNet and Caffe. Both use exactly the same model and parameters and were run on the CPU of the same machine, but MXNet's prediction time is about 4 times slower than Caffe's. By the way, Caffe uses a single CPU thread, and both are built with ATLAS.

@dushoufu
Author

I've finished it.
