Tensorpack is 1.2x~5x faster than the equivalent code written in some other TF high-level APIs.

Hardware: AWS p3.16xlarge (8 Tesla V100s)
Software: Python 3.6, TF 1.13.1, cuda 10, cudnn 7.4.2, Keras 2.1.5, tflearn 0.3.2, tensorpack 0.9.4.
Measurement: speed is measured in images per second (higher is better). The first epoch is warmup and is excluded from timing. The second and later epochs show no statistically significant difference.
Data:
- Real data for Cifar10.
- For ImageNet, the data is assumed to be a constant numpy array already available on CPU. This is a reasonable setting because data always has to reach the CPU from somewhere anyway.
Source code is here. Each script is under 100 lines, so you can easily run and verify it yourself.
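
The measurement rule above (run a warmup epoch, then time only the later epochs) can be sketched as follows. This is a minimal illustration and not the code used by the benchmark scripts: `images_per_second`, `run_epoch`, `images_per_epoch`, `num_epochs`, and `clock` are hypothetical names introduced here.

```python
import time


def images_per_second(run_epoch, images_per_epoch, num_epochs=3,
                      clock=time.perf_counter):
    """Return throughput over the epochs after the first one.

    run_epoch is a hypothetical zero-argument callable that trains for
    one epoch. The first call is treated as warmup and is not timed.
    """
    if num_epochs < 2:
        raise ValueError("need at least one timed epoch after the warmup epoch")
    run_epoch()  # warmup epoch: excluded from timing
    start = clock()
    for _ in range(num_epochs - 1):
        run_epoch()
    elapsed = clock() - start
    # Images processed during the timed epochs, divided by wall-clock time.
    return images_per_epoch * (num_epochs - 1) / elapsed
```

Passing the clock as a parameter is only a convenience for checking the arithmetic with a fake timer; a real measurement would use the default.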

| Task | tensorpack | Keras | tflearn |
| --- | --- | --- | --- |
| Keras Official Cifar10 Example | 11904 | 7142 | 5882 |
| VGG16 on fake ImageNet | 230 | 204 | 194 |
| AlexNet on fake ImageNet | 2603 | 1454 | N/A |
| ResNet50 on fake ImageNet | 333 | 266 | N/A |
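
To read the table as relative speedups, the throughput of tensorpack can be divided by that of each other library. The short sketch below does this arithmetic on the numbers copied from the table; the `results` dictionary and the `speedups` helper are names introduced here for illustration only.

```python
# Throughput in images per second, copied from the table above.
# None marks the entries reported as N/A.
results = {
    "Keras Official Cifar10 Example": {"tensorpack": 11904, "Keras": 7142, "tflearn": 5882},
    "VGG16 on fake ImageNet": {"tensorpack": 230, "Keras": 204, "tflearn": 194},
    "AlexNet on fake ImageNet": {"tensorpack": 2603, "Keras": 1454, "tflearn": None},
    "ResNet50 on fake ImageNet": {"tensorpack": 333, "Keras": 266, "tflearn": None},
}


def speedups(row, baseline="tensorpack"):
    """Ratio of the baseline's throughput to each other library's throughput."""
    return {
        name: round(row[baseline] / value, 2)
        for name, value in row.items()
        if name != baseline and value is not None
    }


for task, row in results.items():
    print(task, speedups(row))
```

For example, the Cifar10 row gives 11904 / 7142 ≈ 1.67 against Keras and 11904 / 5882 ≈ 2.02 against tflearn.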

Each script used in this section can be run as `./script.py NUM_GPU` to use a different number of GPUs.
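
As a rough sketch of how a script might read the `NUM_GPU` argument, the hypothetical helper below parses the first positional argument. It is not taken from the benchmark scripts, and the default of 1 GPU when no argument is given is an assumption made for this example.

```python
def parse_num_gpu(argv, default=1):
    """Read NUM_GPU from the first positional argument, e.g. ./script.py 8.

    The default of 1 is an assumption for this sketch, not documented
    behavior of the benchmark scripts.
    """
    if len(argv) < 2:
        return default
    num_gpu = int(argv[1])
    if num_gpu < 1:
        raise ValueError("NUM_GPU must be a positive integer")
    return num_gpu
```

A script would typically call it as `parse_num_gpu(sys.argv)` after `import sys`.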

Notes:

With a better setting (one that differs in batch size, etc.) in ../ResNet-MultiGPU/, tensorpack reaches 2800 im/s for ResNet50 on a p3.16xlarge instance, and 9225 im/s with all optimizations and fp16 turned on.
It's possible for Keras to be faster (by using a better input pipeline, building the data-parallel graph yourself, etc.), but that is NOT how most users use Keras or how any of the Keras examples are written.

Using Keras together with Tensorpack is one way to make Keras faster. See the Keras+Tensorpack example.