awslabs · roywei · Apr 20, 2018 · Apr 18, 2018
diff --git a/benchmark/README.md b/benchmark/README.md
@@ -13,22 +13,36 @@ To switch between different backends refer to
 [configure Keras backend](https://github.com/awslabs/keras-apache-mxnet/wiki/Installation#2-configure-keras-backend)
 
 ## CNN Benchmarks
-We provide benchmark scripts to run on CIFAR10, ImageNet and Synthetic Dataset(randomly generated) 
+We provide benchmark scripts to run on the CIFAR-10, ImageNet, and Synthetic (randomly generated) datasets.
+
+### CIFAR-10 Dataset
+The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset has 60,000 32x32 color images in 10 classes.
+The [training script](https://github.com/awslabs/keras-apache-mxnet/blob/master/benchmark/image-classification/benchmark_resnet.py)
+downloads the dataset automatically. You need to provide the dataset name, the ResNet version
+(1 or 2), the number of layers (20, 56, or 110), and the number of GPUs to use.
+
+Example Usage:
+
+`python benchmark_resnet.py --dataset cifar10 --version 1 --layers 56 --gpus 4`
+
+
 ### ImageNet Dataset
 First, download ImageNet Dataset from [here](http://image-net.org/download), there are total 1.4 million images 
-with 1000 classes, each class is in a subfolder. In this script, each image is processed to size 256*256
+with 1000 classes, each class is in a subfolder. In this script, each image is processed to size 256x256
 
 Since ImageNet Dataset is too large, there are two training mode for data that does not fit into memory: 
-`train_on_batch` and `fit_generator`, we recommend train_on_batch since it's more efficient on multi_gpu.
+[`train_on_batch`](https://keras.io/models/sequential/#train_on_batch) and 
+[`fit_generator`](https://keras.io/models/sequential/#fit_generator).
+We recommend `train_on_batch` since it is more efficient with multiple GPUs.
 (Refer to [Keras Document](https://keras.io/getting-started/faq/#how-can-i-use-keras-with-datasets-that-dont-fit-in-memory) 
 and Keras Issue [#9502](https://github.com/keras-team/keras/issues/9502), 
 [#9204](https://github.com/keras-team/keras/issues/9204), [#9647](https://github.com/keras-team/keras/issues/9647))
 
-Need to provide training mode, number of gpus and path to imagenet dataset.
+Compared to CIFAR-10, you need to provide two additional parameters: the training mode and the path to the ImageNet dataset.
 
 Example usage:
 
-`python benchmark_imagenet_resnet.py --train_mode train_on_batch --gpus 4 --data_path home/ubuntu/imagenet/train/`
+`python benchmark_resnet.py --dataset imagenet --version 1 --layers 56 --gpus 4 --train_mode train_on_batch --data_path home/ubuntu/imagenet/train/`
 
 ### Synthetic Dataset
 We used benchmark scripts from 
@@ -41,16 +55,30 @@ you want to benchmark inference speed (True or False).
 Example Usage:
 
 `sh run_<backend-type>_backend.sh gpu_config False`
+
 ### CNN Benchmark Results
-Here we list the result on ImageNet and Synthetic Data(channels first) using ResNet50V1 model, on 1, 4 GPUs using 
-AWS p3.8xLarge instance and 8 GPUs using AWS p3.16xLarge instance. For more details about the instance configuration, 
-please refer [here](https://aws.amazon.com/ec2/instance-types/p3/)
+The table below lists the training speed of the MXNet backend on CIFAR-10, ImageNet, and Synthetic Data using the
+ResNet50V1 model, on CPU and on 1, 4, and 8 GPUs on AWS instances.
+Hardware specifications of the instances can be found [here](https://aws.amazon.com/ec2/instance-types/).
+
+For more detailed benchmark results, please refer to [CNN results](https://github.com/awslabs/keras-apache-mxnet/tree/keras2_mxnet_backend/benchmark/benchmark_result/CNN_result.md). 
 
-| GPUs   | ImageNet  | Synthetic Data(Channels First) |
-|--------|:---------:|-------------------------------:|
-| 1      | 162       |   229                          |
-| 4      | 538       |   727                          |
-| 8      | 728       |   963                          |
+|||
+|  ------ | ------ |
+|  Keras Version | 2.1.5 |
+|  MXNet Version | 1.1.0 |
+|  Data Format | Channel first |
+
+|  Instance | GPUs used | Package | CIFAR-10 (images/s) | ImageNet (images/s) | Synthetic Data (images/s) |
+|  ------ | ------ | ------ | ------ | ------ | ------ |
+|  C5.18xLarge | 0  | mxnet-mkl | 87 | N/A | 9 |
+|  P3.8xLarge | 1 | mxnet-cu90 | N/A | 165 | 229 |
+|  P3.8xLarge | 4 | mxnet-cu90 | 1792 | 538 | 728 |
+|  P3.16xLarge | 8 | mxnet-cu90 | 1618 | 728 | 963 |
+
+![MXNet backend training speed](https://github.com/roywei/keras/blob/benchmark_result/benchmark/benchmark_result/mxnet_backend_training_speed.png)
+
+Note: The X-axis is the number of GPUs used; the Y-axis is the training speed (images/second).
 
 ## RNN Benchmarks
 
@@ -76,8 +104,6 @@ We have used an official WikiText-2 character level Dataset from this [link](htt
 
 The `lstm_text_generation_wikitext2.py` includes a dataset that is hosted on S3 bucket from this [link](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip) (This is a WikiText-2 raw character level data).
 
-### 
-
 ### RNN Benchmark Results
 
 Here, we list the result on Synthetic, Nietzsche, and WikiText-2 dataset using Sequential model(LSTM) on Amazon AWS C5.xLarge(CPU) instance and P3.8xLarge(1, 4 GPUs) with MXNet backend. Batch size is 128. For more details about the instance configuration, please refer [P3](https://aws.amazon.com/ec2/instance-types/p3/) and [C5](https://aws.amazon.com/ec2/instance-types/c5/).
@@ -94,10 +120,12 @@ Here, we list the result on Synthetic, Nietzsche, and WikiText-2 dataset using S
 | P3.8xLarge | 1    | WikiText-2 | 882 sec - 264us/step                |
 | P3.8xLarge | 4    | WikiText-2 | 794 sec - 235us/step                |
 
-## 
-##Credits
-Synthetic Data scripts modified from [
-TensorFlow Benchmarks](https://github.com/tensorflow/benchmarks/tree/keras-benchmarks)
+
+
+## Credits
+
+Synthetic Data scripts modified from 
+[TensorFlow Benchmarks](https://github.com/tensorflow/benchmarks/tree/keras-benchmarks)
 
 ## Reference
 [1] [TensorFlow Benchmarks](https://github.com/tensorflow/benchmarks/tree/keras-benchmarks)
diff --git a/...k/image-classification/models/__init__.py → benchmark/__init__.py b/...k/image-classification/models/__init__.py → benchmark/__init__.py
diff --git a/benchmark/benchmark_result/CNN_result.md b/benchmark/benchmark_result/CNN_result.md
@@ -0,0 +1,92 @@
+# Detailed CNN Benchmark Results
+## CIFAR-10 Dataset
+### Configuration
+|||
+|---|---|
+|  Data Set | [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) |
+|  Keras Version | 2.1.5 |
+| TensorFlow Version | 1.7.0 |
+| MXNet Version | 1.1.0 |
+|  Training Method | [`fit`](https://keras.io/models/model/#fit) |
+|  Training Scripts | [Simple CNN Script](https://github.com/awslabs/keras-apache-mxnet/blob/master/examples/CIFAR-10_cnn.py), [ResNet Script](https://github.com/awslabs/keras-apache-mxnet/blob/master/benchmark/image-classification/benchmark_resnet.py) |
+
+### Results
+
+|  Instance Type | GPU used | Model | Backend | Package | Batch Size | Data Format | Speed (images/s) |
+|  ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
+|  C5.xLarge | 0  | Simple CNN | MXNet | mxnet-mkl | 32 | channel last | 253 |
+|  C5.xLarge | 0 | Simple CNN | MXNet | mxnet-mkl | 32 | channel first | 223 |
+|  C5.xLarge | 0 | Simple CNN | TensorFlow | tensorflow | 32 | channel last | 309 |
+|  C5.xLarge | 0 | Simple CNN | TensorFlow | tensorflow | 32 | channel first | 101 |
+|  C5.18xLarge | 0 | Simple CNN | MXNet | mxnet-mkl | 32 | channel last | 845 |
+|  C5.18xLarge | 0 | Simple CNN | MXNet | mxnet-mkl | 32 | channel first | 936 |
+|  C5.18xLarge | 0 | ResNet50V1 | TensorFlow | tensorflow | 32 | channel last | 59 |
+|  C5.18xLarge | 0 | ResNet50V1 | TensorFlow | tensorflow | 32 | channel first | 41 |
+|  C5.18xLarge | 0 | ResNet50V1 | MXNet | mxnet-mkl | 32 | channel last | 48 |
+|  C5.18xLarge | 0 | ResNet50V1 | MXNet | mxnet-mkl | 32 | channel first | 87 |
+|  P3.8xLarge | 4 | ResNet50V1 | TensorFlow | tensorflow-gpu | 128 | channel last | 1020 |
+|  P3.8xLarge | 4 | ResNet50V1 | MXNet | mxnet-cu90 | 128 | channel first | 1792 |
+|  P3.16xLarge | 8 | ResNet50V1 | TensorFlow | tensorflow-gpu | 256 | channel last | 962 |
+|  P3.16xLarge | 8 | ResNet50V1 | MXNet | mxnet-cu90 | 256 | channel first | 1618 |
+
+## ImageNet Dataset
+
+### Configuration
+|||
+|---|---|
+|  Data Set | [ImageNet](http://image-net.org) |
+| Model | ResNet50V1|
+|  Keras Version | 2.1.3 |
+| TensorFlow Version | 1.6.0rc1 |
+| MXNet Version | 1.1.0 |
+|  Training Method | [`train_on_batch`](https://keras.io/models/sequential/#train_on_batch), [`fit_generator`](https://keras.io/models/sequential/#fit_generator) |
+|  Training Scripts | [ResNet Script](https://github.com/awslabs/keras-apache-mxnet/blob/master/benchmark/image-classification/benchmark_resnet.py) |
+
+### Results
+
+|  Instance | GPU used | Backend | Package | Method | Batch Size | Data Format | Speed (images/s) |
+|  ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
+|  P3.8xLarge | 1 |  TensorFlow | tensorflow-gpu | `train_on_batch` | 32 | channel last | 50 |
+|  P3.8xLarge | 1 |  MXNet | mxnet-cu90 | `train_on_batch` | 32 | channel first | 165 |
+|  P3.8xLarge | 4 |  TensorFlow | tensorflow-gpu | `train_on_batch` | 128 | channel last | 162 |
+|  P3.8xLarge | 4 |  MXNet | mxnet-cu90 | `train_on_batch` | 128 | channel first | 538 |
+|  P3.16xLarge | 8 |  TensorFlow | tensorflow-gpu | `train_on_batch` | 256 | channel last | 212 |
+|  P3.16xLarge | 8 |  MXNet | mxnet-cu90 | `train_on_batch` | 256 | channel first | 728 |
+|  P3.8xLarge | 1 | TensorFlow | tensorflow-gpu | `fit_generator` | 32 | channel last | 53 |
+|  P3.8xLarge | 1 |  MXNet | mxnet-cu90 | `fit_generator` | 32 | channel first | 73 |
+|  P3.8xLarge | 4 |  TensorFlow | tensorflow-gpu | `fit_generator` | 128 | channel last | 173 |
+|  P3.8xLarge | 4 |  MXNet | mxnet-cu90 | `fit_generator` | 128 | channel first | 197  |
+
+## Synthetic Dataset
+
+### Configuration
+|||
+|---|---|
+|  Data Set | Random 256x256 color images, 1000 classes |
+| Model | ResNet50V1|
+|  Keras Version | 2.1.3 |
+| TensorFlow Version | 1.6.0rc1 |
+| MXNet Version | 1.1.0 |
+|  Training Method |[`fit`](https://keras.io/models/model/#fit) |
+|  Training Scripts | [ResNet Script](https://github.com/awslabs/keras-apache-mxnet/tree/keras2_mxnet_backend/benchmark/synthetic) |
+
+### Results
+
+|  Instance | GPU used | Backend | Package | Batch Size | Data Format | Speed (images/s) |
+|  ------ | ------ | ------ | ------ | ------ | ------ | ------ |
+|  C5.18xLarge | 0 | TensorFlow | tensorflow | 32 | channel first | 4 |
+|  C5.18xLarge | 0 | MXNet | mxnet-mkl | 32 | channel first | 9 |
+|  P3.8xLarge | 1 | TensorFlow | tensorflow-gpu | 32 | channel first | 198 |
+|  P3.8xLarge | 1 | MXNet | mxnet-cu90 | 32 | channel first | 229 |
+|  P3.8xLarge | 4 | TensorFlow | tensorflow-gpu | 128 | channel first | 448 |
+|  P3.8xLarge | 4 | MXNet | mxnet-cu90 | 128 | channel first | 728 |
+|  P3.16xLarge | 8 | TensorFlow | tensorflow-gpu | 256 | channel first | 346 |
+|  P3.16xLarge | 8 | MXNet | mxnet-cu90 | 256 | channel first | 963 |
+|  C5.18xLarge | 0 | TensorFlow | tensorflow | 32 | channel last | 4 |
+|  C5.18xLarge | 0 | MXNet | mxnet-mkl | 32 | channel last | 3 |
+|  P3.8xLarge | 1 | TensorFlow | tensorflow-gpu | 32 | channel last | 164 |
+|  P3.8xLarge | 1 | MXNet | mxnet-cu90 | 32 | channel last | 18 |
+|  P3.8xLarge | 4 | TensorFlow | tensorflow-gpu | 128 | channel last | 409 |
+|  P3.8xLarge | 4 | MXNet | mxnet-cu90 | 128 | channel last | 73 |
+|  P3.16xLarge | 8 | TensorFlow | tensorflow-gpu | 256 | channel last | 164 |
+|  P3.16xLarge | 8 | MXNet | mxnet-cu90 | 256 | channel last | 18 |
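
As a reading aid for the ImageNet `train_on_batch` rows in `CNN_result.md`, the sketch below computes the MXNet-to-TensorFlow throughput ratio and the multi-GPU scaling efficiency. The throughput numbers are copied from that table. The helper names (`speedup`, `scaling_efficiency`) and the efficiency definition (speed on N GPUs divided by N times the single-GPU speed) are illustrative assumptions and are not part of the benchmark scripts.

```python
# Throughput (images/s) copied from the ImageNet `train_on_batch` rows.
# Inner keys are the number of GPUs used.
IMAGENET_TRAIN_ON_BATCH = {
    "TensorFlow": {1: 50, 4: 162, 8: 212},
    "MXNet": {1: 165, 4: 538, 8: 728},
}


def speedup(results, gpus, baseline="TensorFlow", candidate="MXNet"):
    """Ratio of candidate throughput to baseline throughput at a GPU count."""
    return results[candidate][gpus] / results[baseline][gpus]


def scaling_efficiency(results, backend, gpus):
    """Throughput on `gpus` GPUs divided by ideal linear scaling from 1 GPU."""
    return results[backend][gpus] / (gpus * results[backend][1])


if __name__ == "__main__":
    for n in (1, 4, 8):
        print("%d GPU(s): MXNet/TensorFlow = %.2fx, MXNet scaling efficiency = %.0f%%"
              % (n,
                 speedup(IMAGENET_TRAIN_ON_BATCH, n),
                 100 * scaling_efficiency(IMAGENET_TRAIN_ON_BATCH, "MXNet", n)))
```

Note that the 1- and 4-GPU rows were measured on P3.8xLarge and the 8-GPU rows on P3.16xLarge, so the 8-GPU efficiency compares across instance types.
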
diff --git a/benchmark/benchmark_result/mxnet_backend_training_speed.png b/benchmark/benchmark_result/mxnet_backend_training_speed.png
diff --git a/benchmark/synthetic/models/__init__.py → benchmark/scripts/__init__.py b/benchmark/synthetic/models/__init__.py → benchmark/scripts/__init__.py
diff --git a/.../image-classification/benchmark_resnet.py → benchmark/scripts/benchmark_resnet.py b/.../image-classification/benchmark_resnet.py → benchmark/scripts/benchmark_resnet.py
@@ -35,11 +35,11 @@
 from keras.utils import multi_gpu_model
 
 parser = argparse.ArgumentParser()
-parser.add_argument('--data_set',
+parser.add_argument('--dataset',
                     help='Dataset for training: cifar10 or imagenet')
 parser.add_argument('--version',
                     help='Provide resnet version: 1 or 2')
-parser.add_argument('--num_layer',
+parser.add_argument('--layers',
                     help="Provide number of layers: 20, 56 or 110")
 parser.add_argument('--gpus',
                     help='Number of GPUs to use')
@@ -51,19 +51,19 @@
 args = parser.parse_args()
 
 # Check args
-if args.data_set not in ["cifar10", "imagenet"]:
+if args.dataset not in ["cifar10", "imagenet"]:
     print("Only support cifar10 or imagenet data set")
     sys.exit()
 
 if args.version not in ["1", "2"]:
     print("Provide resnet version: 1 or 2")
     sys.exit()
 
-if args.num_layer not in ["20", "56", "110"]:
+if args.layers not in ["20", "56", "110"]:
     print("Provide number of layers: 20, 56 or 110")
     sys.exit()
 
-if args.data_set == "imagenet":
+if args.dataset == "imagenet":
     if not args.train_mode or not args.data_path:
         print("Need to provide training mode(train_on_batch or fit_generator) "
               "and data path to imagenet dataset")
@@ -81,7 +81,7 @@
 # Training parameters
 batch_size = 32 * num_gpus if num_gpus > 0 else 32
 epochs = 200
-num_classes = 1000 if args.data_set == "imagenet" else 10
+num_classes = 1000 if args.dataset == "imagenet" else 10
 data_format = K._image_data_format
 print('using image format:', data_format)
 # Subtracting pixel mean improves accuracy
@@ -93,7 +93,7 @@
 
 # Prepare Training Data
 # CIFAR10 data set
-if args.data_set == "cifar10":
+if args.dataset == "cifar10":
     # Load the CIFAR10 data.
     (x_train, y_train), (x_test, y_test) = cifar10.load_data()
 
@@ -119,7 +119,7 @@
     y_test = keras.utils.to_categorical(y_test, num_classes)
 
 # ImageNet Dataset
-if args.data_set == "imagenet":
+if args.dataset == "imagenet":
     input_shape = (256, 256, 3) if data_format == 'channels_last' else (3, 256, 256)
     if args.train_mode == 'fit_generator':
         train_datagen = ImageDataGenerator(
@@ -201,7 +201,7 @@ def get_batch():
 version = int(args.version)
 
 # Computed depth from supplied model parameter n
-depth = int(args.num_layer)
+depth = int(args.layers)
 
 # Model name, depth and version
 model_type = 'ResNet%dv%d' % (depth, version)
@@ -285,7 +285,7 @@ def lr_schedule(epoch):
 callbacks = [checkpoint, lr_reducer, lr_scheduler]
 
 # Run training, without data augmentation.
-if args.data_set == "imagenet":
+if args.dataset == "imagenet":
     print('Not using data augmentation.')
     if args.train_mode == 'train_on_batch':
         for i in range(0, epochs):
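
The manual argument checks in `benchmark_resnet.py` could also be expressed with argparse's built-in `choices` validation. The following is a minimal sketch of that alternative, not the script's actual code: the flag names come from this diff and the README, while `build_parser`, `parse_args`, and the use of `required=True` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the renamed flags in this diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--dataset', required=True,
                        choices=['cifar10', 'imagenet'],
                        help='Dataset for training: cifar10 or imagenet')
    parser.add_argument('--version', required=True, choices=['1', '2'],
                        help='Provide resnet version: 1 or 2')
    parser.add_argument('--layers', required=True,
                        choices=['20', '56', '110'],
                        help='Provide number of layers: 20, 56 or 110')
    parser.add_argument('--gpus', help='Number of GPUs to use')
    parser.add_argument('--train_mode',
                        choices=['train_on_batch', 'fit_generator'],
                        help='Training mode for ImageNet')
    parser.add_argument('--data_path', help='Path to the ImageNet dataset')
    return parser


def parse_args(argv=None):
    """Parse arguments and enforce the ImageNet-only requirements."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # ImageNet needs a training mode and a data path; CIFAR-10 does not.
    if args.dataset == 'imagenet' and not (args.train_mode and args.data_path):
        parser.error('imagenet requires --train_mode and --data_path')
    return args


if __name__ == '__main__':
    print(parse_args())
```

One behavioral difference from the checks in the diff: argparse prints a usage message and exits with status 2 on invalid input, whereas a bare `sys.exit()` exits with status 0.
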

diff --git a/benchmark/synthetic/config.json → benchmark/scripts/config.json b/benchmark/synthetic/config.json → benchmark/scripts/config.json
diff --git a/benchmark/synthetic/data_generator.py → benchmark/scripts/data_generator.py b/benchmark/synthetic/data_generator.py → benchmark/scripts/data_generator.py
diff --git a/benchmark/scripts/models/__init__.py b/benchmark/scripts/models/__init__.py
diff --git a/benchmark/synthetic/models/dataset_utils.py → benchmark/scripts/models/dataset_utils.py b/benchmark/synthetic/models/dataset_utils.py → benchmark/scripts/models/dataset_utils.py
diff --git a/benchmark/synthetic/models/lstm_synthetic.py → benchmark/scripts/models/lstm_synthetic.py b/benchmark/synthetic/models/lstm_synthetic.py → benchmark/scripts/models/lstm_synthetic.py
diff --git a/.../synthetic/models/lstm_text_generation.py → ...rk/scripts/models/lstm_text_generation.py b/.../synthetic/models/lstm_text_generation.py → ...rk/scripts/models/lstm_text_generation.py
diff --git a/benchmark/synthetic/models/model_config.py → benchmark/scripts/models/model_config.py b/benchmark/synthetic/models/model_config.py → benchmark/scripts/models/model_config.py
diff --git a/...ark/image-classification/models/resnet.py → benchmark/scripts/models/resnet.py b/...ark/image-classification/models/resnet.py → benchmark/scripts/models/resnet.py
diff --git a/...rk/synthetic/models/resnet50_benchmark.py → ...mark/scripts/models/resnet50_benchmark.py b/...rk/synthetic/models/resnet50_benchmark.py → ...mark/scripts/models/resnet50_benchmark.py
diff --git a/...tic/models/resnet50_benchmark_tf_keras.py → ...pts/models/resnet50_benchmark_tf_keras.py b/...tic/models/resnet50_benchmark_tf_keras.py → ...pts/models/resnet50_benchmark_tf_keras.py
diff --git a/benchmark/synthetic/models/timehistory.py → benchmark/scripts/models/timehistory.py b/benchmark/synthetic/models/timehistory.py → benchmark/scripts/models/timehistory.py
diff --git a/benchmark/synthetic/run_benchmark.py → benchmark/scripts/run_benchmark.py b/benchmark/synthetic/run_benchmark.py → benchmark/scripts/run_benchmark.py
@@ -12,10 +12,6 @@
 
 if keras.backend.backend() == "tensorflow":
     import tensorflow as tf
-if keras.backend.backend() == "theano":
-    import theano
-if keras.backend.backend() == "cntk":
-    import cntk
 if keras.backend.backend() == "mxnet":
     import mxnet
 
@@ -54,10 +50,6 @@
 def get_backend_version():
     if keras.backend.backend() == "tensorflow":
         return tf.__version__
-    if keras.backend.backend() == "theano":
-        return theano.__version__
-    if keras.backend.backend() == "cntk":
-        return cntk.__version__
     if keras.backend.backend() == "mxnet":
         return mxnet.__version__
     return "undefined"
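
With only two backends left, the version lookup in `run_benchmark.py` could also be driven by a small table instead of repeated `if` blocks. The sketch below is one hedged way to do that. `BACKEND_MODULES` and the `import_module` parameter are hypothetical names introduced here, and unlike the code in the diff it imports the backend lazily and returns `"undefined"` if the module is missing.

```python
import importlib

# Maps a Keras backend name to the Python module that implements it.
# Only the two backends kept by this diff are listed (assumption: the
# module name matches the backend name, as in the diff's import lines).
BACKEND_MODULES = {"tensorflow": "tensorflow", "mxnet": "mxnet"}


def get_backend_version(backend_name, import_module=importlib.import_module):
    """Return the version string of the given backend, or "undefined"."""
    module_name = BACKEND_MODULES.get(backend_name)
    if module_name is None:
        return "undefined"
    try:
        module = import_module(module_name)
    except ImportError:
        # The backend package is not installed in this environment.
        return "undefined"
    return getattr(module, "__version__", "undefined")
```

In the benchmark script this would be called as `get_backend_version(keras.backend.backend())`; the `import_module` parameter exists only so the lookup can be exercised without installing a backend.
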

diff --git a/benchmark/synthetic/run_mxnet_backend.sh → benchmark/scripts/run_mxnet_backend.sh b/benchmark/synthetic/run_mxnet_backend.sh → benchmark/scripts/run_mxnet_backend.sh
diff --git a/benchmark/synthetic/run_tf_backend.sh → benchmark/scripts/run_tf_backend.sh b/benchmark/synthetic/run_tf_backend.sh → benchmark/scripts/run_tf_backend.sh
diff --git a/benchmark/synthetic/run_tf_keras_backend.sh → benchmark/scripts/run_tf_keras_backend.sh b/benchmark/synthetic/run_tf_keras_backend.sh → benchmark/scripts/run_tf_keras_backend.sh