
Added detailed RNN results (#73)
* Added detailed RNN results

* Modified table content and added CUDA version
karan6181 authored and sandeep-krishnamurthy committed Jun 15, 2018
1 parent aa5857b commit ef425e4
Showing 3 changed files with 29 additions and 82 deletions.
42 changes: 29 additions & 13 deletions benchmark/README.md
@@ -102,26 +102,42 @@ We have used an official Keras LSTM example scripts [lstm_text_generation.py](ht

We have used the official WikiText-2 character-level dataset from this [link](https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset).

The `lstm_text_generation_wikitext2.py` script includes a dataset that is hosted on an S3 bucket; see this [link](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip) (this is the WikiText-2 raw character-level data).
The `lstm_text_generation.py` script includes a dataset that is hosted on an S3 bucket; see this [link](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip) (this is the WikiText-2 raw character-level data).
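
For illustration only, here is a minimal sketch of how the WikiText-2 archive referenced above could be downloaded and read as character-level text. It assumes the zip layout (`wikitext-2-raw/wiki.train.raw`) and uses the standard Keras `get_file` caching helper; none of these names are taken from the benchmark scripts themselves.

```python
import io
import zipfile

from keras.utils import get_file

# Download and cache the WikiText-2 raw (character-level) archive referenced above.
path = get_file(
    "wikitext-2-raw-v1.zip",
    origin="https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip",
)

# Read the training split as one long string of characters
# (archive layout assumed: wikitext-2-raw/wiki.train.raw).
with zipfile.ZipFile(path) as archive:
    with archive.open("wikitext-2-raw/wiki.train.raw") as f:
        text = io.TextIOWrapper(f, encoding="utf-8").read().lower()

print("corpus length:", len(text))
chars = sorted(set(text))                           # character vocabulary
char_indices = {c: i for i, c in enumerate(chars)}  # char -> integer index
```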

### RNN Benchmark Results

Here, we list the results on the Synthetic, Nietzsche, and WikiText-2 datasets using a Sequential model (LSTM) on Amazon AWS C5.xLarge (CPU) and P3.8xLarge (1 and 4 GPUs) instances with the MXNet backend. The batch size is 128. For more details about the instance configurations, please refer to [P3](https://aws.amazon.com/ec2/instance-types/p3/) and [C5](https://aws.amazon.com/ec2/instance-types/c5/).
Here, we list the results on the Synthetic, Nietzsche, and WikiText-2 datasets using a Sequential model (LSTM) on Amazon AWS C5.18xLarge (CPU), C5.xLarge (CPU), and P3.8xLarge (1 and 4 GPUs) instances with the MXNet backend. The batch size is 128. For more details about the instance configurations, please refer to [P3](https://aws.amazon.com/ec2/instance-types/p3/) and [C5](https://aws.amazon.com/ec2/instance-types/c5/).

| Instance | GPUs | Data Set | Speed/Epoch <br />(Lower is better) |
| ---------- | ---- | ---------- | ----------------------------------- |
| C5.xLarge | 0 | Synthetic | 91 sec - 2ms/step |
| P3.8xLarge | 1 | Synthetic | 13 sec - 264us/step |
| P3.8xLarge | 4 | Synthetic | 12 sec - 241us/step |
| C5.xLarge | 0 | Nietzsche | 352 sec - 2ms/step |
| P3.8xLarge | 1 | Nietzsche | 53 sec - 265us/step |
| P3.8xLarge | 4 | Nietzsche | 47 sec - 236us/step |
| C5.xLarge | 0 | WikiText-2 | 6410 sec - 2ms/step |
| P3.8xLarge | 1 | WikiText-2 | 882 sec - 264us/step |
| P3.8xLarge | 4 | WikiText-2 | 794 sec - 235us/step |
For more detailed benchmark results, please refer to the [RNN results](benchmark_result/RNN_result.md).

| Framework/Library | Version |
| ----------------- | ------- |
| Keras | 2.1.5 |
| MXNet | 1.1.0 |
| CUDA | 9.0.176 |



| Instance | GPUs | Data Set | Speed/Epoch (Lower is better) |
| ----------- | ---- | ---------- | ----------------------------- |
| C5.18xLarge | 0 | Synthetic | 24s 485us/step |
| C5.xLarge | 0 | Synthetic | 93s 2ms/step |
| P3.8xLarge | 1 | Synthetic | 13s 261us/step |
| P3.8xLarge | 4 | Synthetic | 12s 240us/step |
| | | | |
| C5.18xLarge | 0 | Nietzsche | 78s 389us/step |
| C5.xLarge | 0 | Nietzsche | 360s 2ms/step |
| P3.8xLarge | 1 | Nietzsche | 52s 262us/step |
| P3.8xLarge | 4 | Nietzsche | 47s 235us/step |
| | | | |
| C5.18xLarge | 0 | WikiText-2 | 1345s 398us/step |
| C5.xLarge | 0 | WikiText-2 | 6417s 2ms/step |
| P3.8xLarge | 1 | WikiText-2 | 868s 257us/step |
| P3.8xLarge | 4 | WikiText-2 | 775s 229us/step |
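
As a rough illustration of the setup benchmarked above (a single-layer character-level LSTM, batch size 128, MXNet backend), the sketch below follows the structure of the official `lstm_text_generation.py` example; the layer width, sequence length, and vocabulary size are assumptions, and the stock `keras.utils.multi_gpu_model` utility is assumed for the multi-GPU rows.

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import RMSprop
from keras.utils import multi_gpu_model

maxlen = 40      # assumed input sequence length (as in the Keras example script)
num_chars = 60   # assumed character-vocabulary size; depends on the corpus

# Single LSTM layer followed by a softmax over the character vocabulary.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, num_chars)))
model.add(Dense(num_chars, activation="softmax"))

# For the multi-GPU rows (e.g. P3.8xLarge with 4 GPUs), replicate the model
# across devices before compiling; single-device runs compile `model` directly.
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(loss="categorical_crossentropy", optimizer=RMSprop(lr=0.01))

# parallel_model.fit(x, y, batch_size=128, epochs=1)  # x, y: one-hot encoded character windows
```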


![rnn_mxnet_dataset](benchmark_result/rnn_mxnet_dataset.png)

## Credits

Synthetic Data scripts modified from
69 changes: 0 additions & 69 deletions benchmark/benchmark_result/RNN_result.md
@@ -1,59 +1,5 @@
# RNN Benchmark Results (Experimental support)

## Summary
```
NOTE:
RNN support in Keras-MXNet is experimental, with a few rough edges in CPU training performance and no support
for variable-length sequences. The results below are only an early preview of the current status.
```

Please see the [RNN with Keras-MXNet document](../docs/mxnet_backend/using_rnn_with_mxnet_backend.md) for more details on
the poor CPU training performance and unsupported functionality.
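
Since variable-length sequences are not supported, one common workaround (a general Keras technique, not something prescribed by this repository) is to pad or truncate every sequence to a fixed length before training; a minimal sketch:

```python
from keras.preprocessing.sequence import pad_sequences

# Token-id sequences of different lengths (illustrative values).
sequences = [[3, 7, 2], [5, 1], [9, 4, 6, 8, 2]]

# Pad (or truncate) everything to the same length so the backend always sees
# uniformly shaped batches; maxlen=5 is an arbitrary illustrative choice.
fixed = pad_sequences(sequences, maxlen=5, padding="post", value=0)
print(fixed.shape)  # (3, 5)
```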

### Configuration
| | |
| :--------------- | :----------------------------------------------------------- |
| Keras | v2.1.6 |
| TensorFlow | v1.8.0 |
| MXNet | v1.2.0 |
| CUDA | v9.0.176 |
| cuDNN | v7.0.1 |

### LSTM-Nietzsche

| Instance Type | GPUs | Batch Size | Keras-MXNet (Time/Epoch), (GPU Mem) | Keras-TensorFlow (Time/Epoch), (GPU Mem) |
|---|---|---|---|---|
| C5.18X Large | 0 | 128 | 78 sec, N/A | 55 sec, N/A|
| P3.8X Large | 1 | 128 | 52 sec, 792 MB | 83 sec, 15360 MB|
| P3.8X Large | 4 | 128 | 47 sec, 770 MB | 117 sec, 15410 MB |
| P3.16X Large | 8 | 128 | 72 sec, 826 MB | 183 sec, 15408 MB |

### LSTM-WikiText2

| Instance Type | GPUs | Batch Size | Keras-MXNet (Time/Epoch), (GPU Mem) | Keras-TensorFlow (Time/Epoch), (GPU Mem) |
|---|---|---|---|---|
| C5.18X Large | 0 | 128 | 1345 sec, N/A | 875 sec, N/A |
| P3.8X Large | 1 | 128 | 868 sec, 772 MB | 817 sec, 15360 MB |
| P3.8X Large | 4 | 128 | 775 sec, 764 MB | 1468 sec, 15410 MB |
| P3.16X Large | 8 | 128 | 1214 sec, 826 MB | 3176 sec, 15410 MB |

### Synthetic Data

| Instance Type | GPUs | Batch Size | Keras-MXNet (Time/Epoch), (GPU Mem) | Keras-TensorFlow (Time/Epoch), (GPU Mem) |
|---|---|---|---|---|
| C5.18X Large | 0 | 128 | 24 sec, N/A | 14 sec, N/A|
| P3.8X Large | 1 | 128 | 13 sec, 792 MB | 12 sec, 15360 MB|
| P3.8X Large | 4 | 128 | 12 sec, 770 MB | 21 sec, 15410 MB |
| P3.16X Large | 8 | 128 | 19 sec, 826 MB | 49 sec, 15360 MB |


# Detailed RNN Benchmark Results

Below are the GPU memory usage results observed while running the LSTM model on the Synthetic, Nietzsche, and WikiText-2 character-level datasets.

![MemoryConsumption.png](MemoryConsumption.png)

Note: All the data for the performance diagrams shown below is taken from the rows with `Unroll Type=TRUE`.
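
For reference, `Unroll Type=TRUE` refers to the `unroll` flag on the Keras recurrent layers; the sketch below shows where it is set (layer sizes and input shape are illustrative, not taken from the benchmark scripts).

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM

maxlen, num_chars = 40, 60   # illustrative sequence length and vocabulary size

model = Sequential()
# unroll=True statically unrolls the recurrence over the fixed time dimension,
# which is the configuration the `Unroll Type=TRUE` rows measure.
model.add(LSTM(128, input_shape=(maxlen, num_chars), unroll=True))
model.add(Dense(num_chars, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
```
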
## Synthetic Dataset

### Configuration
@@ -71,11 +17,6 @@ Note: All the data for performance diagram shown below is taken from the cell ha

### Results

| | |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Synthetic_32.png](lstm_Synthetic_32.png) | ![lstm_Synthetic_128.png](lstm_Synthetic_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | --------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | Synthetic | fit() | 50s 1ms/step | TRUE | 50000 | 0 |
@@ -124,11 +65,6 @@ Note: All the data for performance diagram shown below is taken from the cell ha

### Results

| | |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Nietzsche_32.png](lstm_Nietzsche_32.png) | ![lstm_Nietzsche_128.png](lstm_Nietzsche_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | --------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | Nietzsche | fit() | 226s 1ms/step | TRUE | 200285 | 0 |
@@ -177,11 +113,6 @@ Note: All the data for performance diagram shown below is taken from the cell ha

### Results

| | |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Wikitext2_32.png](lstm_Wikitext2_32.png) | ![lstm_Wikitext2_128.png](lstm_Wikitext2_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | ---------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | WikiText-2 | fit() | 3530s 1ms/step | TRUE | 1562175 | 0 |
Binary file added benchmark/benchmark_result/rnn_mxnet_dataset.png