- Batch Size = 8, Image = 3 x 224 x 224 (used when nothing else is specified, and for all CPU runs)
- Batch Size = 4, Image = 3 x 224 x 224 for:
  - Resnet 101
  - Resnet 152
Model | Framework | Forward Pass | Backward Pass | Total Time | Inference |
---|---|---|---|---|---|
VGG16 | Pytorch 0.4.1 | 0.0245 s | 0.0606 s | 0.0852 s | 0.0234 s |
VGG16 | Flux 0.6.8+ | 0.0287 s | 0.0760 s | 0.1047 s | 0.0288 s |
VGG16 BN | Pytorch 0.4.1 | 0.0271 s | 0.0672 s | 0.0943 s | 0.0273 s |
VGG16 BN | Flux 0.6.8+ | 0.0333 s | 0.0818 s | 0.1151 s | 0.0327 s |
VGG19 | Pytorch 0.4.1 | 0.0281 s | 0.0741 s | 0.1021 s | 0.0280 s |
VGG19 | Flux 0.6.8+ | 0.0355 s | 0.0923 s | 0.1278 s | 0.0356 s |
VGG19 BN | Pytorch 0.4.1 | 0.0321 s | 0.0812 s | 0.1134 s | 0.0325 s |
VGG19 BN | Flux 0.6.8+ | 0.0377 s | 0.0965 s | 0.1342 s | 0.0371 s |
Resnet18 | Pytorch 0.4.1 | 0.0064 s | 0.0125 s | 0.0190 s | 0.0050 s |
Resnet18 | Flux 0.6.8+ | 0.0079 s | 0.0218 s | 0.0297 s | 0.0079 s |
Resnet34 | Pytorch 0.4.1 | 0.0092 s | 0.0216 s | 0.0307 s | 0.0092 s |
Resnet34 | Flux 0.6.8+ | 0.0137 s | 0.0313 s | 0.0450 s | 0.0151 s |
Resnet50 | Pytorch 0.4.1 | 0.0155 s | 0.0351 s | 0.0506 s | 0.0152 s |
Resnet50 | Flux 0.6.8+ | 0.0205 s | 0.1795 s | 0.2000 s | - |
Resnet101 | Pytorch 0.4.1 | 0.0297 s | 0.0379 s | 0.0676 s | 0.0298 s |
Resnet101 | Flux 0.6.8+ | 0.0215 s | 0.0616 s | 0.0831 s | 0.0208 s |
Resnet152 | Pytorch 0.4.1 | 0.0431 s | 0.05337 s | 0.0965 s | 0.0429 s |
Resnet152 | Flux 0.6.8+ | 0.0308 s | 0.0807 s | 0.1115 s | 0.0298 s |
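
To make the table easier to compare at a glance, the short sketch below computes how many times longer Flux takes than PyTorch for the total (forward plus backward) time. The values are copied from three rows of the table above; the names `TOTAL_TIME_S` and `relative_time` are illustrative and do not come from the benchmark scripts.

```python
# Total times in seconds, copied from the table above:
# model -> (Pytorch 0.4.1, Flux 0.6.8+)
TOTAL_TIME_S = {
    "VGG16": (0.0852, 0.1047),
    "Resnet18": (0.0190, 0.0297),
    "Resnet152": (0.0965, 0.1115),
}


def relative_time(pytorch_s, flux_s):
    """Return the Flux time as a multiple of the PyTorch time."""
    return flux_s / pytorch_s


ratios = {model: relative_time(*times) for model, times in TOTAL_TIME_S.items()}

for model, ratio in ratios.items():
    print(f"{model}: Flux takes {ratio:.2f}x the PyTorch total time")
```

A ratio above 1 means Flux is slower for that model; the same helper can be applied to the forward, backward, or inference columns.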
Model | Framework | Forward Pass | Backward Pass | Total Time | Inference |
---|---|---|---|---|---|
VGG16 | Pytorch 0.4.1 | 6.6024 s | 9.4336 s | 16.036 s | 6.4216 s |
VGG16 | Flux 0.6.8+ | 10.458 s | 10.245 s | 20.703 s | 10.111 s |
VGG16 BN | Pytorch 0.4.1 | 7.0793 s | 9.0536 s | 16.132 s | 6.7909 s |
VGG16 BN | Flux 0.6.8+ | 29.633 s | 18.649 s | 49.282 s | 24.047 s |
VGG19 | Pytorch 0.4.1 | 8.3075 s | 10.899 s | 19.207 s | 8.0593 s |
VGG19 | Flux 0.6.8+ | 12.226 s | 12.457 s | 24.683 s | 12.029 s |
VGG19 BN | Pytorch 0.4.1 | 8.7794 s | 12.739 s | 21.519 s | 8.4044 s |
VGG19 BN | Flux 0.6.8+ | 28.518 s | 21.464 s | 49.982 s | 22.649 s |
- Conv3x3/1 = Conv2d, 3x3 Kernel, 1x1 Padding, 1x1 Stride
- Conv5x5/1 = Conv2d, 5x5 Kernel, 2x2 Padding, 1x1 Stride
- Conv3x3/2 = Conv2d, 3x3 Kernel, 1x1 Padding, 2x2 Stride
- Conv5x5/2 = Conv2d, 5x5 Kernel, 2x2 Padding, 2x2 Stride
- Dense = 1024 => 512
- BatchNorm = BatchNorm2d
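
As a quick check on what the kernel, padding, and stride settings above imply, here is a minimal sketch that applies the standard convolution output-size formula, `floor((size + 2*padding - kernel) / stride) + 1`, to each configuration. The 224-pixel input side is an assumption borrowed from the model benchmarks; the input size used for the layer benchmarks is not stated here. `conv_out_size` and `LAYERS` are illustrative names.

```python
def conv_out_size(size, kernel, padding, stride):
    """Output size of a convolution along one spatial dimension (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, padding, stride) for each benchmarked convolution, from the list above.
LAYERS = {
    "Conv3x3/1": (3, 1, 1),
    "Conv5x5/1": (5, 2, 1),
    "Conv3x3/2": (3, 1, 2),
    "Conv5x5/2": (5, 2, 2),
}

# ASSUMPTION: a 224-pixel input side; the layer benchmark input size is not given.
INPUT_SIDE = 224

output_sides = {
    name: conv_out_size(INPUT_SIDE, *config) for name, config in LAYERS.items()
}

for name, side in output_sides.items():
    print(f"{name}: {INPUT_SIDE} -> {side}")
```

Under that assumption the stride-1 layers preserve the spatial size and the stride-2 layers halve it, which fits the roughly halved timings of the `/2` rows.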
Layer | Framework | Forward Pass | Backward Pass | Total Time |
---|---|---|---|---|
Conv3x3/1 | Pytorch 0.4.1 | 0.2312 ms | 0.5359 ms | 0.7736 ms |
Conv3x3/1 | Flux 0.6.8+ | 0.1984 ms | 0.7640 ms | 0.9624 ms |
Conv5x5/1 | Pytorch 0.4.1 | 0.2667 ms | 0.5345 ms | 0.8299 ms |
Conv5x5/1 | Flux 0.6.8+ | 0.2065 ms | 0.8075 ms | 1.014 ms |
Conv3x3/2 | Pytorch 0.4.1 | 0.1170 ms | 0.2203 ms | 0.3376 ms |
Conv3x3/2 | Flux 0.6.8+ | 0.0927 ms | 0.5988 ms | 0.6915 ms |
Conv5x5/2 | Pytorch 0.4.1 | 0.1233 ms | 0.2162 ms | 0.3407 ms |
Conv5x5/2 | Flux 0.6.8+ | 0.0941 ms | 0.6515 ms | 0.7456 ms |
Dense | Pytorch 0.4.1 | 0.0887 ms | 0.1523 ms | 0.2411 ms |
Dense | Flux 0.6.8+ | 0.0432 ms | 0.2044 ms | 0.2476 ms |
BatchNorm | Pytorch 0.4.1 | 0.1096 ms | 0.1999 ms | 0.3095 ms |
BatchNorm | Flux 0.6.8+ | 0.2211 ms | 0.2849 ms | 0.5060 ms |
To reproduce the benchmarks, check out Flux 0.6.8+ (`avik-pal/cudnn_batchnorm`) and CuArrays (`master`).
GPU BatchNorm is broken on Flux 0.6.8+ master, so the benchmarks cannot be run against it.
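
For readers who want to collect comparable numbers themselves, the following is a minimal timing sketch, not the script used to produce the tables. It assumes a warm-up phase followed by averaged repeats, and it sums separately timed forward and backward passes, which is consistent with how the Total Time column relates to the other two. `mean_time`, `fake_forward`, and `fake_backward` are hypothetical names; on a GPU one would pass a real synchronization function (for example `torch.cuda.synchronize` in PyTorch) as `sync`, because kernel launches are asynchronous.

```python
import time


def mean_time(fn, warmup=3, repeats=10, sync=None):
    """Return the mean wall-clock time of fn() in seconds over `repeats` runs."""
    for _ in range(warmup):  # warm-up runs are executed but not timed
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for queued asynchronous work before stopping the clock
    return (time.perf_counter() - start) / repeats


# Hypothetical CPU-only stand-ins for a model's forward and backward passes.
def fake_forward():
    return sum(i * i for i in range(10_000))


def fake_backward():
    return sum(i * i for i in range(20_000))


forward_s = mean_time(fake_forward)
backward_s = mean_time(fake_backward)
total_s = forward_s + backward_s

print(f"forward {forward_s:.6f} s | backward {backward_s:.6f} s | total {total_s:.6f} s")
```

Replacing the two stand-in functions with real forward and backward calls for a given model gives one row of the table per framework.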