add ConvolutionDepthwise layer #5665
base: master
Conversation
weight_multiplier_shape.push_back(top[0]->height());
weight_multiplier_shape.push_back(top[0]->width());
weight_multiplier_.Reshape(weight_multiplier_shape);
caffe_set(weight_multiplier_.count(), Dtype(1),
caffe_set is only for cpu_data @sp2823
We only need to set mutable_cpu_data or mutable_gpu_data once.
There is a similar implementation of batch_sum_multiplier_ in BatchNormLayer.
If it is necessary, we should use caffe_set in Forward_cpu and caffe_gpu_set in Forward_gpu.
I mean that caffe_set only works on a cpu_data pointer; using it to write data through a gpu_data pointer would crash.
Excuse me, I'd like to ask two questions:
The CPU mode still needs optimizing.
@sp2823
Great to see this work - I hope it gets merged soon. The correct name for this should be "DepthwiseSeparable". Just "Depthwise" gives almost the opposite meaning.
I didn't optimize the CPU mode because the Convolution layer with group is slow in GPU mode. You can use this code for training and use the Convolution layer for prediction.
Could you share your .prototxt showing how to set the parameters, or some test examples?
I have attached the files required to train the popular MobileNet model: imagenet_mobilenet1.0_2017-07-04_10-44-00.zip. I added the following code in layer_factory.cpp, GetConvolutionLayer(), so that this layer is selected whenever it is appropriate. There is a speedup when the proposed ConvolutionDepthwise layer is used instead of the Convolution layer, but it is not as much as I expected. In fact, if I just comment out the group parameter in all convolution layers in both train.prototxt and test.prototxt, so that each 3x3 convolution becomes a traditional 3x3 convolution instead of a depthwise separable one, it becomes slightly faster! This was not what I was expecting. Is there something that I am missing? Please try the files that I shared.
You only need to edit the .prototxt file like this.
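A hedged illustration of what such a .prototxt edit might look like (layer and blob names here are made up; the key point is the type string, since the PR's layer needs no group parameter):

```
layer {
  name: "conv2_1/dw"          # hypothetical layer name
  type: "ConvolutionDepthwise"
  bottom: "conv1"
  top: "conv2_1/dw"
  convolution_param {
    num_output: 32            # equal to the number of input channels
    kernel_size: 3
    stride: 1
    pad: 1
    # no "group" parameter: the layer is depthwise by construction
  }
}
```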
@sp2823 How do I merge your implementation into my Caffe? Is just downloading the .hpp/.cpp files OK? Thanks :)
Download the .hpp/.cpp/.cu files and compile.
Hi @sp2823 ,
Hi, @sp2823
@SophieZhou Hello, may I ask: has training on the CPU become much faster? What level of performance does it reach?
up
You did not implement CuDNNConvolutionDepthWiseLayer. Isn't it necessary?
The new implementation runs faster than the "convolution layer with group" (with CuDNN and without CuDNN), but it still seems not fast enough. For example, AlexNet has more FLOPs than MobileNet, but runs faster.
The depthwise convolution implementation in TensorFlow: https://github.com/tensorflow/tensorflow/blob/2d23960e29c3efb1b30c5cb6f8f3368635b97f29/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
Did you compare the FPS with the current Caffe implementation? @
@sp2823 How can the CPU mode be optimized?
@mathmanu The reason the traditional 3x3 convolution is faster than the depthwise separable convolution is that Caffe executes a grouped convolution with a for-loop over the groups, which is very slow. But a depthwise separable convolution has far fewer parameters.
* convolution-depthwise:
  unknown error
  abc
  satisfy the code format of caffe
  satisfy the code format of caffe
  satisfy the code format of caffe
  satisfy the code format of caffe
  satisfy the code format of caffe
  add ConvolutionDepthwise layer
  add ConvolutionDepthwise layer
Hello, I added the ConvolutionDepthwise layer .cpp and .cu files to caffe_ssd on a TX2 and recompiled Caffe. However, when I test the MobileNet model, it shows the error cudnn_softmax_layer.cpp:15] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR just after conv3/dw -> conv3/dw. Should I make some changes to the ConvolutionDepthwise layer .cu file?
weight_buffer_shape.push_back(bottom[0]->num());
weight_buffer_shape.push_back(top[0]->height());
weight_buffer_shape.push_back(top[0]->width());
weight_buffer_.Reshape(weight_buffer_shape);
Do we seriously need a 6-dimensional buffer for weights? If I have a batch of 64 feature maps, say 256 channels at 32x32, and want to convolve with a 3x3 filter, this line would allocate 256*3*3*64*32*32 = 150,994,944 floats, so almost 600 MB - that sounds like significant overkill.
I have made some tests, both on a raw convolution-oriented benchmark and some actually useful network. This is indeed faster than the default Caffe convolution engine for grouped conv, but the RAM requirements are higher, and I'm not so sure if this is a good thing. So the question is: can the same speedup be achieved while allocating less memory? Like I said in the comment, do we really need to have such a large buffer for weights?
No offense, I'm new to GitHub. I just want to know whether ConvolutionDepthwise has been added to the master branch or not?
Excuse me, I downloaded the core files and compiled successfully, but something goes wrong when training the net.
https://arxiv.org/pdf/1704.04861v1.pdf
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
convolution depthwise layer
faster and uses less memory than the "convolution layer with group" (both with and without CuDNN)