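Evaluating the efficiency of several types of convolutions.

The first table summarizes processing times in seconds for each framework and device. Each of the following tables covers one setting and lists the processing time, the time relative to conv3x3 (vs 3x3), and the theoretical complexity relative to conv3x3.
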
| processing time [sec] | conv1x1 | conv3x1 | conv1x3 | conv3x3sep | conv3x3 | conv5x5 | conv3x3dilated |
|---|---|---|---|---|---|---|---|
| Keras CPU | 6.736 | 14.133 | 14.043 | 7.184 | 43.700 | 118.898 | 49.442 |
| Keras GPU | 1.135 | 1.525 | 1.440 | 1.556 | 1.571 | 2.848 | 2.008 |
| PyTorch CPU | 6.956 | 17.209 | 16.916 | 16.480 | 50.636 | 133.781 | 111.480 |
| PyTorch GPU (cudnn.benchmark = True, cudnn.fastest = True) | 0.102 | 0.180 | 0.186 | 1.951 | 0.230 | 1.024 | 0.484 |
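
The vs 3x3 rows in the per-setting tables are each processing time divided by the conv3x3 time of the same setting. A minimal sketch that recomputes them from the summary table follows; the dictionary layout and the function name `relative_to_conv3x3` are illustrative only.

```python
# Processing times [sec] copied from the summary table.
COLUMNS = ["conv1x1", "conv3x1", "conv1x3", "conv3x3sep",
           "conv3x3", "conv5x5", "conv3x3dilated"]
TIMES = {
    "Keras CPU":   [6.736, 14.133, 14.043, 7.184, 43.700, 118.898, 49.442],
    "Keras GPU":   [1.135, 1.525, 1.440, 1.556, 1.571, 2.848, 2.008],
    "PyTorch CPU": [6.956, 17.209, 16.916, 16.480, 50.636, 133.781, 111.480],
    "PyTorch GPU": [0.102, 0.180, 0.186, 1.951, 0.230, 1.024, 0.484],
}


def relative_to_conv3x3(row):
    """Divide every time in a row by the conv3x3 time of that same row."""
    times = dict(zip(COLUMNS, row))
    base = times["conv3x3"]
    return {name: round(t / base, 3) for name, t in times.items()}


if __name__ == "__main__":
    for setting, row in TIMES.items():
        print(setting, relative_to_conv3x3(row))
```

Because the inputs are already rounded to three decimals, a recomputed ratio can differ in the last digit from the tables; Keras GPU conv5x5, for example, comes out as 1.813 rather than 1.812.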

Keras CPU

| | conv1x1 | conv3x1 | conv1x3 | conv3x3sep | conv3x3 | conv5x5 | conv3x3dilated |
|---|---|---|---|---|---|---|---|
| processing time [sec] | 6.736 | 14.133 | 14.043 | 7.184 | 43.700 | 118.898 | 49.442 |
| vs 3x3 | 0.154 | 0.323 | 0.321 | 0.164 | 1.000 | 2.721 | 1.131 |
| theoretical complexity | 0.111 | 0.333 | 0.333 | 0.016 | 1.000 | 2.778 | 1.000 |
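
The theoretical complexity row appears to be the multiply-accumulate (MAC) count of each convolution divided by that of a dense 3x3 convolution with the same channels and output size: kh·kw/9 for dense kernels, and unchanged by dilation. The 0.016 for conv3x3sep is consistent with counting only the depthwise 3x3 part with 64 channels (1/64 ≈ 0.016). The channel count, the feature-map size, and that interpretation are assumptions, since the README does not state them; the sketch also shows that adding the 1x1 pointwise step would give 1/64 + 1/9 ≈ 0.127.

```python
def dense_macs(h, w, c_in, c_out, kh, kw):
    """MACs of a dense convolution producing an h x w output.

    Dilation spreads the kernel taps apart but does not change how many
    there are, so a dilated 3x3 costs the same as a plain 3x3.
    """
    return h * w * c_in * c_out * kh * kw


def depthwise_macs(h, w, c, kh, kw):
    """MACs of a depthwise convolution (one kh x kw filter per channel)."""
    return h * w * c * kh * kw


# Hypothetical sizes (not stated in the README). C = 64 is chosen because
# it reproduces the 0.016 listed for conv3x3sep.
H, W, C = 32, 32, 64
base = dense_macs(H, W, C, C, 3, 3)

ratios = {
    "conv1x1": dense_macs(H, W, C, C, 1, 1) / base,
    "conv3x1": dense_macs(H, W, C, C, 3, 1) / base,
    "conv1x3": dense_macs(H, W, C, C, 1, 3) / base,
    "conv3x3sep (depthwise only)": depthwise_macs(H, W, C, 3, 3) / base,
    "conv3x3sep (depthwise + 1x1)": (depthwise_macs(H, W, C, 3, 3)
                                     + dense_macs(H, W, C, C, 1, 1)) / base,
    "conv3x3": 1.0,
    "conv5x5": dense_macs(H, W, C, C, 5, 5) / base,
    "conv3x3dilated": dense_macs(H, W, C, C, 3, 3) / base,
}

for name, r in ratios.items():
    print(f"{name:30s} {r:.3f}")
```

The spatial size H x W cancels out of every ratio, so only the kernel shape and, for the depthwise case, the channel count matter.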

Keras GPU

| | conv1x1 | conv3x1 | conv1x3 | conv3x3sep | conv3x3 | conv5x5 | conv3x3dilated |
|---|---|---|---|---|---|---|---|
| processing time [sec] | 1.135 | 1.525 | 1.440 | 1.556 | 1.571 | 2.848 | 2.008 |
| vs 3x3 | 0.722 | 0.971 | 0.916 | 0.990 | 1.000 | 1.812 | 1.278 |
| theoretical complexity | 0.111 | 0.333 | 0.333 | 0.016 | 1.000 | 2.778 | 1.000 |

PyTorch CPU

| | conv1x1 | conv3x1 | conv1x3 | conv3x3sep | conv3x3 | conv5x5 | conv3x3dilated |
|---|---|---|---|---|---|---|---|
| processing time [sec] | 6.956 | 17.209 | 16.916 | 16.480 | 50.636 | 133.781 | 111.480 |
| vs 3x3 | 0.137 | 0.340 | 0.334 | 0.325 | 1.000 | 2.642 | 2.202 |
| theoretical complexity | 0.111 | 0.333 | 0.333 | 0.016 | 1.000 | 2.778 | 1.000 |

PyTorch GPU

| | conv1x1 | conv3x1 | conv1x3 | conv3x3sep | conv3x3 | conv5x5 |
|---|---|---|---|---|---|---|
| processing time [sec] | 0.102 | 0.173 | 0.169 | 3.786 | 0.230 | 1.108 |
| vs 3x3 | 0.441 | 0.750 | 0.733 | 16.447 | 1.000 | 4.816 |
| theoretical complexity | 0.111 | 0.333 | 0.333 | 0.016 | 1.000 | 2.778 |
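
One way to read the vs 3x3 and theoretical complexity rows together is to divide the first by the second. A value of 1.0 means a convolution is exactly as much cheaper or costlier than conv3x3 as its MAC count predicts; larger values mean it runs slower than predicted. The sketch below applies this to the PyTorch GPU table; the name `cost_gap` is hypothetical, and because the conv3x3sep theory value is itself rounded to 0.016, its gap is only approximate.

```python
# "vs 3x3" and "theoretical complexity" rows of the PyTorch GPU table.
MEASURED = {"conv1x1": 0.441, "conv3x1": 0.750, "conv1x3": 0.733,
            "conv3x3sep": 16.447, "conv3x3": 1.000, "conv5x5": 4.816}
THEORETICAL = {"conv1x1": 0.111, "conv3x1": 0.333, "conv1x3": 0.333,
               "conv3x3sep": 0.016, "conv3x3": 1.000, "conv5x5": 2.778}


def cost_gap(measured, theoretical):
    """Measured relative time divided by theoretical relative cost.

    1.0: as fast as the MAC count predicts (relative to conv3x3).
    > 1.0: slower than predicted.
    """
    return {name: measured[name] / theoretical[name] for name in measured}


for name, gap in cost_gap(MEASURED, THEORETICAL).items():
    print(f"{name:12s} {gap:8.2f}")
```

On these numbers conv3x3sep is roughly a thousand times slower than its MAC count alone would suggest, while conv1x1 is about 4 times slower.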

PyTorch GPU, cudnn.benchmark = True

| | conv1x1 | conv3x1 | conv1x3 | conv3x3sep | conv3x3 | conv5x5 |
|---|---|---|---|---|---|---|
| processing time [sec] | 0.096 | 0.173 | 0.169 | 1.716 | 0.229 | 0.984 |
| vs 3x3 | 0.418 | 0.753 | 0.735 | 7.485 | 1.000 | 4.291 |
| theoretical complexity | 0.111 | 0.333 | 0.333 | 0.016 | 1.000 | 2.778 |

PyTorch GPU, cudnn.benchmark = True, cudnn.fastest = True

| | conv1x1 | conv3x1 | conv1x3 | conv3x3sep | conv3x3 | conv5x5 | conv3x3dilated |
|---|---|---|---|---|---|---|---|
| processing time [sec] | 0.102 | 0.180 | 0.186 | 1.951 | 0.230 | 1.024 | 0.484 |
| vs 3x3 | 0.444 | 0.780 | 0.809 | 8.464 | 1.000 | 4.446 | 2.101 |
| theoretical complexity | 0.111 | 0.333 | 0.333 | 0.016 | 1.000 | 2.778 | 1.000 |