convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.

Machine: 6-core Intel Core i7-5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64

##Imagenet Winners Benchmarking

I pick some popular ImageNet models and clock the time for a full forward + backward pass. I average the times over 10 runs and ignore dropout and softmax layers.
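The timing procedure described above can be sketched roughly as follows. This is a minimal illustration, not the harness used in this repository: the names `time_pass` and `benchmark`, the `sync` hook, and the single warm-up call are all assumptions made for the example. On a GPU, `sync` would need to be whatever call your framework provides to wait for queued kernels to finish; otherwise the clock stops before the work is done.

```python
import time
from statistics import mean


def time_pass(fn, sync=lambda: None, runs=10, warmup=1):
    """Return the mean wall-clock time of fn() in milliseconds over `runs` calls.

    `sync` is a hypothetical hook: pass your framework's device-synchronize
    call so that asynchronous GPU work is included in the measurement.
    """
    for _ in range(warmup):  # assumption: one untimed warm-up call
        fn()
    sync()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        sync()  # wait for queued work before stopping the clock
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def benchmark(forward, backward, sync=lambda: None, runs=10):
    """Time forward and backward separately and report their sum."""
    fwd = time_pass(forward, sync, runs)
    bwd = time_pass(backward, sync, runs)
    return {"forward": fwd, "backward": bwd, "total": fwd + bwd}


# Stand-in CPU workloads; replace with real forward/backward calls.
result = benchmark(lambda: sum(range(10_000)), lambda: sum(range(20_000)))
print("total %.3f ms (forward %.3f, backward %.3f)"
      % (result["total"], result["forward"], result["backward"]))
```

The three returned values correspond to the Time, forward, and backward columns of the tables that follow.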

AlexNet (One Weird Trick paper) - Input 128x3x224x224

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---|:---|---:|---:|---:|
| NervanaSys-16 | ConvLayer | 97 | 30 | 67 |
| NervanaSys-32 | ConvLayer | 109 | 31 | 78 |
| fbfft | SpatialConvolutionCuFFT | 136 | 45 | 91 |
| cudaconvnet2* | ConvLayer | 177 | 42 | 135 |
| CuDNN (R2) * | cudnn.SpatialConvolution | 231 | 70 | 161 |
| Caffe (native) | ConvolutionLayer | 324 | 121 | 203 |
| Torch-7 (native) | SpatialConvolutionMM | 342 | 132 | 210 |

Overfeat [fast] - Input 128x3x231x231

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---|:---|---:|---:|---:|
| NervanaSys-16 | ConvLayer | 364 | 119 | 245 |
| NervanaSys-32 | ConvLayer | 410 | 126 | 284 |
| fbfft | SpatialConvolutionCuFFT | 407 | 139 | 268 |
| cudaconvnet2* | ConvLayer | 723 | 176 | 547 |
| CuDNN (R2) * | cudnn.SpatialConvolution | 810 | 234 | 576 |
| Caffe | ConvolutionLayer | 823 | 355 | 468 |
| Torch-7 (native) | SpatialConvolutionMM | 878 | 379 | 499 |

OxfordNet [Model-A] - Input 64x3x224x224

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---|:---|---:|---:|---:|
| NervanaSys-16 | ConvLayer | 530 | 166 | 364 |
| NervanaSys-32 | ConvLayer | 629 | 173 | 456 |
| fbfft | SpatialConvolutionCuFFT | 1092 | 355 | 737 |
| cudaconvnet2* | ConvLayer | 1229 | 408 | 821 |
| CuDNN (R2) * | cudnn.SpatialConvolution | 1099 | 342 | 757 |
| Caffe | ConvolutionLayer | 1068 | 323 | 745 |
| Torch-7 (native) | SpatialConvolutionMM | 1105 | 350 | 755 |

##Layer-wise Benchmarking

###Spatial Convolution layer (3D input 3D output, densely connected)

forward + backprop (wrt input and weights)

| Original Library | Class/Function Benchmarked | Time (ms) | forward (ms) | backward (ms) |
|:---|:---|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 256 | 101 | 155 |
| cuda-convnet2 * | ConvLayer | 977 | 201 | 776 |
| cuda-convnet** | pylearn2.cuda_convnet | 1077 | 312 | 765 |
| CuDNN R2 * | cudnn.SpatialConvolution | 1019 | 269 | 750 |
| Theano | CorrMM | 1225 | 407 | 818 |
| Caffe | ConvolutionLayer | 1231 | 396 | 835 |
| Torch-7 | SpatialConvolutionMM | 1265 | 418 | 877 |
| DeepCL | ConvolutionLayer | 6280 | 2648 | 3632 |
| cherry-picking**** | best per layer | 235 | 79 | 155 |

The table below is NOT UPDATED for Titan X. Its numbers were measured on a Titan Black and are kept here only for informational and legacy purposes.

| Original Library | Class/Function Benchmarked | Time (ms) | forward (ms) | backward (ms) |
|:---|:---|---:|---:|---:|
| Theano (experimental)*** | conv2d_fft | 1178 | 304 | 874 |
| Torch-7 | nn.SpatialConvolutionBHWD | 1892 | 581 | 1311 |
| ccv | ccv_convnet_layer | 809+bw | 809 | |
| Theano (legacy) | conv2d | 70774 | 3833 | 66941 |

- \* indicates that the library was tested with Torch bindings of the specific kernels.
- \*\* indicates that the library was tested with Pylearn2 bindings.
- \*\*\* This is an experimental module which uses FFT to calculate convolutions. It uses a lot of memory according to @benanne.
- \*\*\*\* The last row shows the results obtainable when choosing the best-performing library for each layer.
- L1 - Input: 128x128, Batch-size: 128, Feature maps: 3->96, Kernel Size: 11x11, Stride: 1x1
- L2 - Input: 64x64, Batch-size: 128, Feature maps: 64->128, Kernel Size: 9x9, Stride: 1x1
- L3 - Input: 32x32, Batch-size: 128, Feature maps: 128->128, Kernel Size: 9x9, Stride: 1x1
- L4 - Input: 16x16, Batch-size: 128, Feature maps: 128->128, Kernel Size: 7x7, Stride: 1x1
- L5 - Input: 13x13, Batch-size: 128, Feature maps: 384->384, Kernel Size: 3x3, Stride: 1x1

The table is ranked by the total time of the forward + backward calls, summed over all five layers (L1 + L2 + L3 + L4 + L5).
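To get a feel for how much work each of the layers L1 to L5 represents, the configurations listed above can be turned into output sizes and forward multiply-accumulate counts. The sketch below is illustrative only: the `ConvLayer` dataclass is a hypothetical helper, and it assumes zero padding, which the layer descriptions do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    name: str
    size: int        # square input height/width
    batch: int
    c_in: int        # input feature maps
    c_out: int       # output feature maps
    kernel: int      # square kernel size
    stride: int = 1
    pad: int = 0     # assumption: padding is not given in the layer list

    def out_size(self):
        # Standard convolution output-size formula.
        return (self.size + 2 * self.pad - self.kernel) // self.stride + 1

    def forward_macs(self):
        # One multiply-accumulate per (output pixel, c_in, c_out, kernel tap).
        o = self.out_size()
        return self.batch * o * o * self.c_in * self.c_out * self.kernel ** 2


LAYERS = [
    ConvLayer("L1", 128, 128, 3, 96, 11),
    ConvLayer("L2", 64, 128, 64, 128, 9),
    ConvLayer("L3", 32, 128, 128, 128, 9),
    ConvLayer("L4", 16, 128, 128, 128, 7),
    ConvLayer("L5", 13, 128, 384, 384, 3),
]

for layer in LAYERS:
    print("%s: output %dx%d, %.2f GMACs forward"
          % (layer.name, layer.out_size(), layer.out_size(),
             layer.forward_macs() / 1e9))
```

Under the zero-padding assumption the output sizes come out as 118, 56, 24, 10 and 11 pixels per side for L1 through L5.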

#####Breakdown

forward

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---|:---|---:|---:|---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 57 | 27 | 6 | 2 | 9 | 101 |
| cuda-convnet2 * | ConvLayer | 36 | 113 | 40 | 4 | 8 | 201 |
| cuda-convnet** | pylearn2.cuda_convnet | 38 | 183 | 68 | 7 | 16 | 312 |
| CuDNN R2 | cudnn.SpatialConvolution | 56 | 143 | 53 | 6 | 11 | 269 |
| Theano | CorrMM | 91 | 143 | 121 | 24 | 28 | 407 |
| Caffe | ConvolutionLayer\<Dtype\> | 93 | 136 | 116 | 24 | 27 | 396 |
| Torch-7 | nn.SpatialConvolutionMM | 94 | 149 | 123 | 24 | 28 | 418 |
| DeepCL | ConvolutionLayer | 738 | 1241 | 518 | 47 | 104 | 2648 |
| cherry-picking**** | best per layer | 36 | 27 | 6 | 2 | 8 | 79 |
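The cherry-picking row can be reproduced from the forward timings above by taking, for each layer, the fastest library. The following is a small sketch of that selection; the dictionary simply restates the table, and `best_per_layer` is a name made up for this example.

```python
# Forward times in ms for layers L1..L5, copied from the table above.
FORWARD_MS = {
    "fbfft":         [57, 27, 6, 2, 9],
    "cuda-convnet2": [36, 113, 40, 4, 8],
    "cuda-convnet":  [38, 183, 68, 7, 16],
    "CuDNN R2":      [56, 143, 53, 6, 11],
    "Theano":        [91, 143, 121, 24, 28],
    "Caffe":         [93, 136, 116, 24, 27],
    "Torch-7":       [94, 149, 123, 24, 28],
    "DeepCL":        [738, 1241, 518, 47, 104],
}


def best_per_layer(times):
    """Return a list of (library, time_ms), one entry per layer."""
    n_layers = len(next(iter(times.values())))
    picks = []
    for i in range(n_layers):
        # Library with the smallest time for layer i.
        lib = min(times, key=lambda name: times[name][i])
        picks.append((lib, times[lib][i]))
    return picks


picks = best_per_layer(FORWARD_MS)
for i, (lib, ms) in enumerate(picks, start=1):
    print("L%d: %s (%d ms)" % (i, lib, ms))
print("Total: %d ms" % sum(ms for _, ms in picks))  # Total: 79 ms
```

With these numbers cuda-convnet2 is picked for L1 and L5 and fbfft for L2 through L4, which gives the 79 ms total shown in the last row.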

backward (gradInput + gradWeight)

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---|:---|---:|---:|---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 76 | 45 | 12 | 4 | 18 | 155 |
| cuda-convnet2 * | ConvLayer | 103 | 467 | 162 | 15 | 29 | 776 |
| cuda-convnet** | pylearn2.cuda_convnet | 136 | 433 | 147 | 15 | 34 | 765 |
| CuDNN R2 | cudnn.SpatialConvolution | 139 | 401 | 159 | 19 | 32 | 750 |
| Theano | CorrMM | 179 | 405 | 174 | 29 | 31 | 818 |
| Caffe | ConvolutionLayer\<Dtype\> | 200 | 405 | 172 | 28 | 30 | 835 |
| Torch-7 | nn.SpatialConvolutionMM | 206 | 432 | 178 | 29 | 32 | 877 |
| DeepCL | ConvolutionLayer | 484 | 2144 | 747 | 59 | 198 | 3632 |
| cherry-picking**** | best per layer | 76 | 45 | 12 | 4 | 18 | 155 |
