This repository has been archived by the owner on Jul 1, 2024. It is now read-only.

Update benchmark result #70

Merged
merged 6 commits into from
Apr 20, 2018
66 changes: 47 additions & 19 deletions benchmark/README.md
Expand Up @@ -13,22 +13,36 @@ To switch between different backends refer to
[configure Keras backend](https://github.com/awslabs/keras-apache-mxnet/wiki/Installation#2-configure-keras-backend)

## CNN Benchmarks
We provide benchmark scripts to run on CIFAR10, ImageNet and Synthetic Dataset(randomly generated)
We provide benchmark scripts to run on CIFAR-10, ImageNet, and a Synthetic Dataset (randomly generated).

### CIFAR-10 Dataset
The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset has 60,000 32x32 color images in 10 classes.
The [training scripts](https://github.com/awslabs/keras-apache-mxnet/blob/master/benchmark/image-classification/benchmark_resnet.py)
will automatically download the dataset. You need to provide the dataset name, the ResNet version
(1 or 2), the number of layers (20, 56, or 110), and the number of GPUs to use.

Example Usage:

`python benchmark_resnet.py --dataset cifar10 --version 1 --layers 56 --gpus 4`
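
The flags in the usage line map onto ordinary `argparse` options. As a minimal sketch (not the repository's actual script), the allowed values listed above could be enforced with `choices=`, so that invalid input is rejected at parse time. The function name `build_parser`, the `type=int` conversions, and the default of 0 GPUs are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the usage line."""
    parser = argparse.ArgumentParser(description="ResNet benchmark (sketch)")
    parser.add_argument("--dataset", choices=["cifar10", "imagenet"], required=True,
                        help="Dataset for training: cifar10 or imagenet")
    parser.add_argument("--version", type=int, choices=[1, 2], required=True,
                        help="ResNet version: 1 or 2")
    parser.add_argument("--layers", type=int, choices=[20, 56, 110], required=True,
                        help="Number of layers: 20, 56 or 110")
    # Assumption: 0 GPUs means "run on CPU".
    parser.add_argument("--gpus", type=int, default=0,
                        help="Number of GPUs to use")
    return parser


# Parse the same arguments as the example usage line.
args = build_parser().parse_args(
    ["--dataset", "cifar10", "--version", "1", "--layers", "56", "--gpus", "4"])
print(args.dataset, args.version, args.layers, args.gpus)
```

With `choices=`, argparse itself prints the usage message and exits on an unsupported value, which would make separate manual checks unnecessary.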


### ImageNet Dataset
First, download the ImageNet dataset from [here](http://image-net.org/download). It contains 1.4 million images in total
with 1000 classes, each class is in a subfolder. In this script, each image is processed to size 256*256
with 1000 classes; each class is in its own subfolder. In this script, each image is processed to size 256x256.

Since the ImageNet dataset is too large to fit into memory, there are two training modes for such data:
`train_on_batch` and `fit_generator`, we recommend train_on_batch since it's more efficient on multi_gpu.
[`train_on_batch`](https://keras.io/models/sequential/#train_on_batch) and
[`fit_generator`](https://keras.io/models/sequential/#fit_generator).
We recommend `train_on_batch` since it is more efficient with multiple GPUs.
(Refer to the [Keras documentation](https://keras.io/getting-started/faq/#how-can-i-use-keras-with-datasets-that-dont-fit-in-memory)
and Keras issues [#9502](https://github.com/keras-team/keras/issues/9502),
[#9204](https://github.com/keras-team/keras/issues/9204), [#9647](https://github.com/keras-team/keras/issues/9647).)
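
Both training modes consume the data one batch at a time instead of loading everything up front. The following framework-free sketch illustrates that idea only; `iter_batches` is a hypothetical helper, and the file names are stand-ins. In a real run each batch would be decoded into image arrays and handed to `train_on_batch`, or the generator itself would be passed to `fit_generator`.

```python
def iter_batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each at most `batch_size` long."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Stand-in for a list of image file paths (hypothetical names).
paths = ["img_%d.jpg" % i for i in range(10)]

# Only one batch needs to be held in memory at any moment.
batch_sizes = [len(batch) for batch in iter_batches(paths, batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
```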

Need to provide training mode, number of gpus and path to imagenet dataset.
Compared to CIFAR-10, you need to provide two additional parameters: the training mode and the path to the ImageNet dataset.

Example usage:

`python benchmark_imagenet_resnet.py --train_mode train_on_batch --gpus 4 --data_path home/ubuntu/imagenet/train/`
`python benchmark_resnet.py --dataset imagenet --version 1 --layers 56 --gpus 4 --train_mode train_on_batch --data_path home/ubuntu/imagenet/train/`

### Synthetic Dataset
We used benchmark scripts from
Expand All @@ -41,16 +55,30 @@ you want to benchmark inference speed (True or False).
Example Usage:

`sh run_<backend-type>_backend.sh gpu_config False`

### CNN Benchmark Results
Here we list the result on ImageNet and Synthetic Data(channels first) using ResNet50V1 model, on 1, 4 GPUs using
AWS p3.8xLarge instance and 8 GPUs using AWS p3.16xLarge instance. For more details about the instance configuration,
please refer [here](https://aws.amazon.com/ec2/instance-types/p3/)
Here we list the MXNet backend training speed on CIFAR-10, ImageNet, and Synthetic Data using the
ResNet50V1 model, on CPU and on 1, 4, and 8 GPUs, using AWS instances.
Hardware specifications of the instances can be found [here](https://aws.amazon.com/ec2/instance-types/).

For more detailed benchmark results, please refer to [CNN results](https://github.com/awslabs/keras-apache-mxnet/tree/keras2_mxnet_backend/benchmark/benchmark_result/CNN_result.md).

| GPUs | ImageNet | Synthetic Data(Channels First) |
|--------|:---------:|-------------------------------:|
| 1 | 162 | 229 |
| 4 | 538 | 727 |
| 8 | 728 | 963 |
|||
| ------ | ------ |
| Keras Version | 2.1.5 |
| MXNet Version | 1.1.0 |
| Data Format | Channel first |

| Instance | GPU used | Package | CIFAR-10 | ImageNet | Synthetic Data |
| ------ | ------ | ------ | ------ | ------ | ------ |
| C5.18xLarge | 0 | mxnet-mkl | 87 | N/A | 9 |
| P3.8xLarge | 1 | mxnet-cu90 | N/A | 165 | 229 |
| P3.8xLarge | 4 | mxnet-cu90 | 1792 | 538 | 728 |
| P3.16xLarge | 8 | mxnet-cu90 | 1618 | 728 | 963 |

![MXNet backend training speed](https://github.com/roywei/keras/blob/benchmark_result/benchmark/benchmark_result/mxnet_backend_training_speed.png)

Note: The X-axis is the number of GPUs used; the Y-axis is the training speed (images/second).
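
To read the multi-GPU numbers as scaling behaviour, one can compute the speedup over the single-GPU run and divide it by the GPU count. The sketch below does this for the ImageNet column of the summary table; the helper name `scaling` is hypothetical, and the calculation ignores that the 8-GPU run used a different instance type (P3.16xLarge) than the 1- and 4-GPU runs.

```python
# Images/second for the MXNet backend on ImageNet, copied from the summary table.
imagenet_speed = {1: 165, 4: 538, 8: 728}


def scaling(speeds, baseline_gpus=1):
    """Return {gpus: (speedup, efficiency)} relative to the baseline GPU count."""
    base = speeds[baseline_gpus]
    result = {}
    for gpus, speed in sorted(speeds.items()):
        speedup = speed / base
        # Efficiency is the speedup divided by the ideal (linear) speedup.
        efficiency = speedup / (gpus / baseline_gpus)
        result[gpus] = (round(speedup, 2), round(efficiency, 2))
    return result


for gpus, (speedup, efficiency) in scaling(imagenet_speed).items():
    print("%d GPU(s): %.2fx speedup, %.0f%% efficiency" % (gpus, speedup, efficiency * 100))
```

On these figures, 4 GPUs reach roughly 82% of linear scaling and 8 GPUs roughly 55%.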

## RNN Benchmarks

Expand All @@ -76,8 +104,6 @@ We have used an official WikiText-2 character level Dataset from this [link](htt

The `lstm_text_generation_wikitext2.py` script uses a dataset hosted in an S3 bucket at this [link](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip) (WikiText-2 raw character-level data).

###

### RNN Benchmark Results

Here we list the results on the Synthetic, Nietzsche, and WikiText-2 datasets using a Sequential model (LSTM) on an Amazon AWS C5.xLarge (CPU) instance and a P3.8xLarge instance (1 and 4 GPUs) with the MXNet backend. The batch size is 128. For more details about the instance configuration, please refer to [P3](https://aws.amazon.com/ec2/instance-types/p3/) and [C5](https://aws.amazon.com/ec2/instance-types/c5/).
Expand All @@ -94,10 +120,12 @@ Here, we list the result on Synthetic, Nietzsche, and WikiText-2 dataset using S
| P3.8xLarge | 1 | WikiText-2 | 882 sec - 264us/step |
| P3.8xLarge | 4 | WikiText-2 | 794 sec - 235us/step |

##
##Credits
Synthetic Data scripts modified from [
TensorFlow Benchmarks](https://github.com/tensorflow/benchmarks/tree/keras-benchmarks)


## Credits

Synthetic Data scripts modified from
[TensorFlow Benchmarks](https://github.com/tensorflow/benchmarks/tree/keras-benchmarks)

## Reference
[1] [TensorFlow Benchmarks](https://github.com/tensorflow/benchmarks/tree/keras-benchmarks)
92 changes: 92 additions & 0 deletions benchmark/benchmark_result/CNN_result.md
@@ -0,0 +1,92 @@
# Detailed CNN Benchmark Results
## CIFAR-10 Dataset
### Configuration
|||
|---|---|
| Data Set | [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) |
| Keras Version | 2.1.5 |
| TensorFlow Version | 1.7.0 |
| MXNet Version | 1.1.0 |
| Training Method | [`fit`](https://keras.io/models/model/#fit) |
| Training Scripts | [Simple CNN Script](https://github.com/awslabs/keras-apache-mxnet/blob/master/examples/CIFAR-10_cnn.py), [ResNet Script](https://github.com/awslabs/keras-apache-mxnet/blob/master/benchmark/image-classification/benchmark_resnet.py) |

### Results

| Instance Type | GPU used | Model | Backend | Package | Batch Size | Data Format | Speed (images/s) |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| C5.xLarge | 0 | Simple CNN | MXNet | mxnet-mkl | 32 | channel last | 253 |
| C5.xLarge | 0 | Simple CNN | MXNet | mxnet-mkl | 32 | channel first | 223 |
| C5.xLarge | 0 | Simple CNN | TensorFlow | tensorflow | 32 | channel last | 309 |
| C5.xLarge | 0 | Simple CNN | TensorFlow | tensorflow | 32 | channel first | 101 |
| C5.18xLarge | 0 | Simple CNN | MXNet | mxnet-mkl | 32 | channel last | 845 |
| C5.18xLarge | 0 | Simple CNN | MXNet | mxnet-mkl | 32 | channel first | 936 |
| C5.18xLarge | 0 | ResNet50V1 | TensorFlow | tensorflow | 32 | channel last | 59 |
| C5.18xLarge | 0 | ResNet50V1 | TensorFlow | tensorflow | 32 | channel first | 41 |
| C5.18xLarge | 0 | ResNet50V1 | MXNet | mxnet-mkl | 32 | channel last | 48 |
| C5.18xLarge | 0 | ResNet50V1 | MXNet | mxnet-mkl | 32 | channel first | 87 |
| P3.8xLarge | 4 | ResNet50V1 | TensorFlow | tensorflow-gpu | 128 | channel last | 1020 |
| P3.8xLarge | 4 | ResNet50V1 | MXNet | mxnet-cu90 | 128 | channel first | 1792 |
| P3.8xLarge | 8 | ResNet50V1 | TensorFlow | tensorflow-gpu | 256 | channel last | 962 |
| P3.16xLarge | 8 | ResNet50V1 | MXNet | mxnet-cu90 | 256 | channel first | 1618 |

## ImageNet Dataset

### Configuration
|||
|---|---|
| Data Set | [ImageNet](http://image-net.org) |
| Model | ResNet50V1|
| Keras Version | 2.1.3 |
| TensorFlow Version | 1.6.0rc1 |
| MXNet Version | 1.1.0 |
| Training Method | [`train_on_batch`](https://keras.io/models/sequential/#train_on_batch), [`fit_generator`](https://keras.io/models/sequential/#fit_generator) |
| Training Scripts | [ResNet Script](https://github.com/awslabs/keras-apache-mxnet/blob/master/benchmark/image-classification/benchmark_resnet.py) |

### Results

| Instance | GPU used | Backend | Package | Method | Batch Size | Data Format | Speed (images/s) |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| P3.8xLarge | 1 | TensorFlow | tensorflow-gpu | `train_on_batch` | 32 | channel last | 50 |
| P3.8xLarge | 1 | MXNet | mxnet-cu90 | `train_on_batch` | 32 | channel first | 165 |
| P3.8xLarge | 4 | TensorFlow | tensorflow-gpu | `train_on_batch` | 128 | channel last | 162 |
| P3.8xLarge | 4 | MXNet | mxnet-cu90 | `train_on_batch` | 128 | channel first | 538 |
| P3.16xLarge | 8 | TensorFlow | tensorflow-gpu | `train_on_batch` | 256 | channel last | 212 |
| P3.16xLarge | 8 | MXNet | mxnet-cu90 | `train_on_batch` | 256 | channel first | 728 |
| P3.8xLarge | 1 | TensorFlow | tensorflow-gpu | `fit_generator` | 32 | channel last | 53 |
| P3.8xLarge | 1 | MXNet | mxnet-cu90 | `fit_generator` | 32 | channel first | 73 |
| P3.8xLarge | 4 | TensorFlow | tensorflow-gpu | `fit_generator` | 128 | channel last | 173 |
| P3.8xLarge | 4 | MXNet | mxnet-cu90 | `fit_generator` | 128 | channel first | 197 |

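The `train_on_batch` rows can be summarised as a per-GPU-count ratio between the two backends. This is a small sketch over the numbers in the table above; note that the two backends were measured with different data formats (channel last for TensorFlow, channel first for MXNet), so the ratio describes these specific runs rather than a like-for-like comparison.

```python
# GPUs -> (TensorFlow, MXNet) images/second for train_on_batch, from the table above.
train_on_batch_speed = {1: (50, 165), 4: (162, 538), 8: (212, 728)}

# Ratio of MXNet speed to TensorFlow speed for each GPU count.
ratios = {gpus: round(mx / tf, 2) for gpus, (tf, mx) in train_on_batch_speed.items()}
print(ratios)  # {1: 3.3, 4: 3.32, 8: 3.43}
```
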
## Synthetic Dataset

### Configuration
|||
|---|---|
| Data Set | Random 256x256 color images, 1000 classes |
| Model | ResNet50V1|
| Keras Version | 2.1.3 |
| TensorFlow Version | 1.6.0rc1 |
| MXNet Version | 1.1.0 |
| Training Method |[`fit`](https://keras.io/models/model/#fit) |
| Training Scripts | [ResNet Script](https://github.com/awslabs/keras-apache-mxnet/tree/keras2_mxnet_backend/benchmark/synthetic) |

### Results

| Instance | GPU used | Backend | Package | Batch Size | Data Format | Speed (images/s) |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| C5.18xLarge | 0 | TensorFlow | tensorflow | 32 | channel first | 4 |
| C5.18xLarge | 0 | MXNet | mxnet-mkl | 32 | channel first | 9 |
| P3.8xLarge | 1 | TensorFlow | tensorflow-gpu | 32 | channel first | 198 |
| P3.8xLarge | 1 | MXNet | mxnet-cu90 | 32 | channel first | 229 |
| P3.8xLarge | 4 | TensorFlow | tensorflow-gpu | 128 | channel first | 448 |
| P3.8xLarge | 4 | MXNet | mxnet-cu90 | 128 | channel first | 728 |
| P3.16xLarge | 8 | TensorFlow | tensorflow-gpu | 256 | channel first | 346 |
| P3.16xLarge | 8 | MXNet | mxnet-cu90 | 256 | channel first | 963 |
| C5.18xLarge | 0 | TensorFlow | tensorflow | 32 | channel last | 4 |
| C5.18xLarge | 0 | MXNet | mxnet-mkl | 32 | channel last | 3 |
| P3.8xLarge | 1 | TensorFlow | tensorflow-gpu | 32 | channel last | 164 |
| P3.8xLarge | 1 | MXNet | mxnet-cu90 | 32 | channel last | 18 |
| P3.8xLarge | 4 | TensorFlow | tensorflow-gpu | 128 | channel last | 409 |
| P3.8xLarge | 4 | MXNet | mxnet-cu90 | 128 | channel last | 73 |
| P3.16xLarge | 8 | TensorFlow | tensorflow-gpu | 256 | channel last | 164 |
| P3.16xLarge | 8 | MXNet | mxnet-cu90 | 256 | channel last | 18 |
Expand Up @@ -35,11 +35,11 @@
from keras.utils import multi_gpu_model

parser = argparse.ArgumentParser()
parser.add_argument('--data_set',
parser.add_argument('--dataset',
                    help='Dataset for training: cifar10 or imagenet')
parser.add_argument('--version',
                    help='Provide resnet version: 1 or 2')
parser.add_argument('--num_layer',
parser.add_argument('--layers',
                    help="Provide number of layers: 20, 56 or 110")
parser.add_argument('--gpus',
                    help='Number of GPUs to use')
Expand All @@ -51,19 +51,19 @@
args = parser.parse_args()

# Check args
if args.data_set not in ["cifar10", "imagenet"]:
if args.dataset not in ["cifar10", "imagenet"]:
    print("Only support cifar10 or imagenet data set")
    sys.exit()

if args.version not in ["1", "2"]:
    print("Provide resnet version: 1 or 2")
    sys.exit()

if args.num_layer not in ["20", "56", "110"]:
if args.layers not in ["20", "56", "110"]:
    print("Provide number of layers: 20, 56 or 110")
    sys.exit()

if args.data_set == "imagenet":
if args.dataset == "imagenet":
    if not args.train_mode or not args.data_path:
        print("Need to provide training mode(train_on_batch or fit_generator) "
              "and data path to imagenet dataset")
Expand All @@ -81,7 +81,7 @@
# Training parameters
batch_size = 32 * num_gpus if num_gpus > 0 else 32
epochs = 200
num_classes = 1000 if args.data_set == "imagenet" else 10
num_classes = 1000 if args.dataset == "imagenet" else 10
data_format = K._image_data_format
print('using image format:', data_format)
# Subtracting pixel mean improves accuracy
Expand All @@ -93,7 +93,7 @@

# Prepare Training Data
# CIFAR10 data set
if args.data_set == "cifar10":
if args.dataset == "cifar10":
    # Load the CIFAR10 data.
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

Expand All @@ -119,7 +119,7 @@
y_test = keras.utils.to_categorical(y_test, num_classes)

# ImageNet Dataset
if args.data_set == "imagenet":
if args.dataset == "imagenet":
    input_shape = (256, 256, 3) if data_format == 'channels_last' else (3, 256, 256)
    if args.train_mode == 'fit_generator':
        train_datagen = ImageDataGenerator(
Expand Down Expand Up @@ -201,7 +201,7 @@ def get_batch():
version = int(args.version)

# Computed depth from supplied model parameter n
depth = int(args.num_layer)
depth = int(args.layers)

# Model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)
Expand Down Expand Up @@ -285,7 +285,7 @@ def lr_schedule(epoch):
callbacks = [checkpoint, lr_reducer, lr_scheduler]

# Run training, without data augmentation.
if args.data_set == "imagenet":
if args.dataset == "imagenet":
    print('Not using data augmentation.')
    if args.train_mode == 'train_on_batch':
        for i in range(0, epochs):
Expand Down
Empty file.
Expand Up @@ -12,10 +12,6 @@

if keras.backend.backend() == "tensorflow":
    import tensorflow as tf
if keras.backend.backend() == "theano":
    import theano
if keras.backend.backend() == "cntk":
    import cntk
if keras.backend.backend() == "mxnet":
    import mxnet

Expand Down Expand Up @@ -54,10 +50,6 @@
def get_backend_version():
    if keras.backend.backend() == "tensorflow":
        return tf.__version__
    if keras.backend.backend() == "theano":
        return theano.__version__
    if keras.backend.backend() == "cntk":
        return cntk.__version__
    if keras.backend.backend() == "mxnet":
        return mxnet.__version__
    return "undefined"
Expand Down