add ConvolutionDepthwise layer #5665
base: master
Conversation
weight_multiplier_shape.push_back(top[0]->height());
weight_multiplier_shape.push_back(top[0]->width());
weight_multiplier_.Reshape(weight_multiplier_shape);
caffe_set(weight_multiplier_.count(), Dtype(1),
caffe_set is only for cpu_data @sp2823
We only need to set mutable_cpu_data or mutable_gpu_data once.
There is a similar implementation of batch_sum_multiplier_ in BatchNormLayer.
If it is necessary, we should use caffe_set in Forward_cpu and caffe_gpu_set in Forward_gpu.
I mean that caffe_set only works on a cpu_data pointer; using it to write data through a gpu_data pointer would crash.
Excuse me, I'd like to ask two questions:
The CPU mode still needs optimizing.
@sp2823
Great to see this work - I hope it gets merged soon. The correct name for this should be "DepthwiseSeparable". Just "Depthwise" gives almost the opposite meaning.
I didn't optimize the CPU mode because the Convolution layer with group is slow in GPU mode. You can use this code for training and use the Convolution layer for prediction.
Could you share your .prototxt showing how to set the parameters, or some test examples?
I have attached the files required to train the popular MobileNet model: imagenet_mobilenet1.0_2017-07-04_10-44-00.zip. I added the following code in layer_factory.cpp, GetConvolutionLayer(), so that this layer is selected whenever it is appropriate. There is a speedup when the proposed ConvolutionDepthwise layer is used instead of the Convolution layer, but it is not as much as I expected. In fact, if I just comment out the group parameter in all convolution layers in both train.prototxt and test.prototxt, so that each 3x3 convolution becomes a traditional 3x3 convolution instead of a depthwise separable one, it becomes slightly faster! This was not what I was expecting. Is there something that I am missing? Please try the files that I shared.
You only need to edit the .prototxt file like this.
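A hedged illustration of what such a .prototxt edit might look like (layer and blob names here are made up; the key point is the type string, since the PR's layer needs no group parameter):

```
layer {
  name: "conv2_1/dw"          # hypothetical layer name
  type: "ConvolutionDepthwise"
  bottom: "conv1"
  top: "conv2_1/dw"
  convolution_param {
    num_output: 32            # equal to the number of input channels
    kernel_size: 3
    stride: 1
    pad: 1
    # no "group" parameter: the layer is depthwise by construction
  }
}
```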
@sp2823 How do I merge your implementation into my Caffe? Is just downloading the .hpp/.cpp files OK? Thanks :)
Download the .hpp/.cpp/.cu files and compile.
Hi @sp2823 ,
Hi, @sp2823
@SophieZhou Hello, may I ask: has training on the CPU become much faster? What level of performance does it reach?
up
You did not implement CuDNNConvolutionDepthWiseLayer. Isn't it necessary?
The new implementation runs faster than the "convolution layer with group" (with CuDNN and without CuDNN), but it still seems not fast enough. For example, AlexNet has more FLOPs than MobileNet, but runs faster.
The depthwise convolution implementation in TensorFlow: https://github.com/tensorflow/tensorflow/blob/2d23960e29c3efb1b30c5cb6f8f3368635b97f29/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
Did you compare the FPS with the current Caffe implementation? @
@sp2823 How can the CPU mode be optimized?
@mathmanu The reason the traditional 3x3 convolution is faster than the depthwise separable convolution is that Caffe executes a grouped convolution with a for-loop over the groups, which is very slow. But a depthwise separable convolution has far fewer parameters.
* convolution-depthwise:
  unknown error
  abc
  satisfy the code format of caffe
  satisfy the code format of caffe
  satisfy the code format of caffe
  satisfy the code format of caffe
  satisfy the code format of caffe
  add ConvolutionDepthwise layer
  add ConvolutionDepthwise layer
Hello, I added the ConvolutionDepthwise layer .cpp and .cu files to caffe_ssd on a TX2 and recompiled Caffe. However, when I test the MobileNet model, it shows the error cudnn_softmax_layer.cpp:15] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR just after conv3/dw -> conv3/dw. Should I make some changes to the ConvolutionDepthwise layer .cu file?
weight_buffer_shape.push_back(bottom[0]->num());
weight_buffer_shape.push_back(top[0]->height());
weight_buffer_shape.push_back(top[0]->width());
weight_buffer_.Reshape(weight_buffer_shape);
Do we seriously need a 6-dimensional buffer for weights? If I have a batch of 64 feature maps, say 256 channels at 32x32, and want to convolve with a 3x3 filter, this line would allocate 256*3*3*64*32*32 = 150,994,944 floats, so almost 600 MB - that sounds like significant overkill.
I have made some tests, both on a raw convolution-oriented benchmark and some actually useful network. This is indeed faster than the default Caffe convolution engine for grouped conv, but the RAM requirements are higher, and I'm not so sure if this is a good thing. So the question is: can the same speedup be achieved while allocating less memory? Like I said in the comment, do we really need to have such a large buffer for weights?
No offense, I'm new to GitHub. I just want to know whether ConvolutionDepthwise has been added to the master branch or not?
Excuse me, I downloaded the core files and compiled successfully, but something goes wrong when training the net.
https://arxiv.org/pdf/1704.04861v1.pdf
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
convolution depthwise layer
faster and uses less memory than the "convolution layer with group" (both with and without CuDNN)