
Added detailed RNN results (#73)
* Added detailed RNN results

* Modified table content and added CUDA version
karan6181 authored and sandeep-krishnamurthy committed Jun 15, 2018
1 parent aa5857b commit ef425e4
Showing 3 changed files with 29 additions and 82 deletions.
42 changes: 29 additions & 13 deletions benchmark/README.md
@@ -102,26 +102,42 @@ We have used an official Keras LSTM example scripts [lstm_text_generation.py](ht

We have used the official WikiText-2 character-level dataset from this [link](https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset).

The `lstm_text_generation_wikitext2.py` script includes a dataset that is hosted on an S3 bucket; see this [link](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip) (this is the WikiText-2 raw character-level data).
The `lstm_text_generation.py` script includes a dataset that is hosted on an S3 bucket; see this [link](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip) (this is the WikiText-2 raw character-level data).
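
For illustration only, here is a minimal sketch of how the WikiText-2 archive referenced above could be downloaded and read as character-level text. It assumes the zip layout (`wikitext-2-raw/wiki.train.raw`) and uses the standard Keras `get_file` caching helper; none of these names are taken from the benchmark scripts themselves.

```python
import io
import zipfile

from keras.utils import get_file

# Download and cache the WikiText-2 raw (character-level) archive referenced above.
path = get_file(
    "wikitext-2-raw-v1.zip",
    origin="https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip",
)

# Read the training split as one long string of characters
# (archive layout assumed: wikitext-2-raw/wiki.train.raw).
with zipfile.ZipFile(path) as archive:
    with archive.open("wikitext-2-raw/wiki.train.raw") as f:
        text = io.TextIOWrapper(f, encoding="utf-8").read().lower()

print("corpus length:", len(text))
chars = sorted(set(text))                           # character vocabulary
char_indices = {c: i for i, c in enumerate(chars)}  # char -> integer index
```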

### RNN Benchmark Results

Here, we list the results on the Synthetic, Nietzsche, and WikiText-2 datasets using a Sequential model (LSTM) on Amazon AWS C5.xLarge (CPU) and P3.8xLarge (1 and 4 GPUs) instances with the MXNet backend. The batch size is 128. For more details about the instance configurations, please refer to [P3](https://aws.amazon.com/ec2/instance-types/p3/) and [C5](https://aws.amazon.com/ec2/instance-types/c5/).
Here, we list the results on the Synthetic, Nietzsche, and WikiText-2 datasets using a Sequential model (LSTM) on Amazon AWS C5.18xLarge (CPU), C5.xLarge (CPU), and P3.8xLarge (1 and 4 GPUs) instances with the MXNet backend. The batch size is 128. For more details about the instance configurations, please refer to [P3](https://aws.amazon.com/ec2/instance-types/p3/) and [C5](https://aws.amazon.com/ec2/instance-types/c5/).

| Instance | GPUs | Data Set | Speed/Epoch <br />(Lower is better) |
| ---------- | ---- | ---------- | ----------------------------------- |
| C5.xLarge | 0 | Synthetic | 91 sec - 2ms/step |
| P3.8xLarge | 1 | Synthetic | 13 sec - 264us/step |
| P3.8xLarge | 4 | Synthetic | 12 sec - 241us/step |
| C5.xLarge | 0 | Nietzsche | 352 sec - 2ms/step |
| P3.8xLarge | 1 | Nietzsche | 53 sec - 265us/step |
| P3.8xLarge | 4 | Nietzsche | 47 sec - 236us/step |
| C5.xLarge | 0 | WikiText-2 | 6410 sec - 2ms/step |
| P3.8xLarge | 1 | WikiText-2 | 882 sec - 264us/step |
| P3.8xLarge | 4 | WikiText-2 | 794 sec - 235us/step |
For more detailed benchmark results, please refer to the [RNN results](benchmark_result/RNN_result.md).

| Framework/Library | Version |
| ----------------- | ------- |
| Keras | 2.1.5 |
| MXNet | 1.1.0 |
| CUDA | 9.0.176 |



| Instance | GPUs | Data Set | Speed/Epoch (Lower is better) |
| ----------- | ---- | ---------- | ----------------------------- |
| C5.18xLarge | 0 | Synthetic | 24s 485us/step |
| C5.xLarge | 0 | Synthetic | 93s 2ms/step |
| P3.8xLarge | 1 | Synthetic | 13s 261us/step |
| P3.8xLarge | 4 | Synthetic | 12s 240us/step |
| | | | |
| C5.18xLarge | 0 | Nietzsche | 78s 389us/step |
| C5.xLarge | 0 | Nietzsche | 360s 2ms/step |
| P3.8xLarge | 1 | Nietzsche | 52s 262us/step |
| P3.8xLarge | 4 | Nietzsche | 47s 235us/step |
| | | | |
| C5.18xLarge | 0 | WikiText-2 | 1345s 398us/step |
| C5.xLarge | 0 | WikiText-2 | 6417s 2ms/step |
| P3.8xLarge | 1 | WikiText-2 | 868s 257us/step |
| P3.8xLarge | 4 | WikiText-2 | 775s 229us/step |
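
As a rough illustration of the setup benchmarked above (a single-layer character-level LSTM, batch size 128, MXNet backend), the sketch below follows the structure of the official `lstm_text_generation.py` example; the layer width, sequence length, and vocabulary size are assumptions, and the stock `keras.utils.multi_gpu_model` utility is assumed for the multi-GPU rows.

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import RMSprop
from keras.utils import multi_gpu_model

maxlen = 40      # assumed input sequence length (as in the Keras example script)
num_chars = 60   # assumed character-vocabulary size; depends on the corpus

# Single LSTM layer followed by a softmax over the character vocabulary.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, num_chars)))
model.add(Dense(num_chars, activation="softmax"))

# For the multi-GPU rows (e.g. P3.8xLarge with 4 GPUs), replicate the model
# across devices before compiling; single-device runs compile `model` directly.
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(loss="categorical_crossentropy", optimizer=RMSprop(lr=0.01))

# parallel_model.fit(x, y, batch_size=128, epochs=1)  # x, y: one-hot encoded character windows
```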


![rnn_mxnet_dataset](benchmark_result/rnn_mxnet_dataset.png)

## Credits

Synthetic Data scripts modified from
69 changes: 0 additions & 69 deletions benchmark/benchmark_result/RNN_result.md
@@ -1,59 +1,5 @@
# RNN Benchmark Results (Experimental support)

## Summary
```
NOTE:
RNN support in Keras-MXNet is experimental, with a few rough edges in CPU training performance and no support
for variable-length sequences. The results below are only an early preview of the current status.
```

Please see the [RNN with Keras-MXNet document](../docs/mxnet_backend/using_rnn_with_mxnet_backend.md) for more details on
the poor CPU training performance and unsupported functionality.
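
Since variable-length sequences are not supported, one common workaround (a general Keras technique, not something prescribed by this repository) is to pad or truncate every sequence to a fixed length before training; a minimal sketch:

```python
from keras.preprocessing.sequence import pad_sequences

# Token-id sequences of different lengths (illustrative values).
sequences = [[3, 7, 2], [5, 1], [9, 4, 6, 8, 2]]

# Pad (or truncate) everything to the same length so the backend always sees
# uniformly shaped batches; maxlen=5 is an arbitrary illustrative choice.
fixed = pad_sequences(sequences, maxlen=5, padding="post", value=0)
print(fixed.shape)  # (3, 5)
```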

### Configuration
| | |
| :--------------- | :----------------------------------------------------------- |
| Keras | v2.1.6 |
| TensorFlow | v1.8.0 |
| MXNet | v1.2.0 |
| CUDA | v9.0.176 |
| cuDNN | v7.0.1 |

### LSTM-Nietzsche

| Instance Type | GPUs | Batch Size | Keras-MXNet (Time/Epoch), (GPU Mem) | Keras-TensorFlow (Time/Epoch), (GPU Mem) |
|---|---|---|---|---|
| C5.18X Large | 0 | 128 | 78 sec, N/A | 55 sec, N/A|
| P3.8X Large | 1 | 128 | 52 sec, 792 MB | 83 sec, 15360 MB|
| P3.8X Large | 4 | 128 | 47 sec, 770 MB | 117 sec, 15410 MB |
| P3.16X Large | 8 | 128 | 72 sec, 826 MB | 183 sec, 15408 MB |

### LSTM-WikiText2

| Instance Type | GPUs | Batch Size | Keras-MXNet (Time/Epoch), (GPU Mem) | Keras-TensorFlow (Time/Epoch), (GPU Mem) |
|---|---|---|---|---|
| C5.18X Large | 0 | 128 | 1345 sec, N/A | 875 sec, N/A |
| P3.8X Large | 1 | 128 | 868 sec, 772 MB | 817 sec, 15360 MB |
| P3.8X Large | 4 | 128 | 775 sec, 764 MB | 1468 sec, 15410 MB |
| P3.16X Large | 8 | 128 | 1214 sec, 826 MB | 3176 sec, 15410 MB |

### Synthetic Data

| Instance Type | GPUs | Batch Size | Keras-MXNet (Time/Epoch), (GPU Mem) | Keras-TensorFlow (Time/Epoch), (GPU Mem) |
|---|---|---|---|---|
| C5.18X Large | 0 | 128 | 24 sec, N/A | 14 sec, N/A|
| P3.8X Large | 1 | 128 | 13 sec, 792 MB | 12 sec, 15360 MB|
| P3.8X Large | 4 | 128 | 12 sec, 770 MB | 21 sec, 15410 MB |
| P3.16X Large | 8 | 128 | 19 sec, 826 MB | 49 sec, 15360 MB |


# Detailed RNN Benchmark Results

Below are the GPU memory usage results observed while running the LSTM model on the Synthetic, Nietzsche, and WikiText-2 character-level datasets.

![MemoryConsumption.png](MemoryConsumption.png)

Note: All the data for the performance diagrams shown below is taken from the rows with `Unroll Type=TRUE`.
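
For reference, `Unroll Type=TRUE` refers to the `unroll` flag on the Keras recurrent layers; the sketch below shows where it is set (layer sizes and input shape are illustrative, not taken from the benchmark scripts).

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM

maxlen, num_chars = 40, 60   # illustrative sequence length and vocabulary size

model = Sequential()
# unroll=True statically unrolls the recurrence over the fixed time dimension,
# which is the configuration the `Unroll Type=TRUE` rows measure.
model.add(LSTM(128, input_shape=(maxlen, num_chars), unroll=True))
model.add(Dense(num_chars, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
```
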
## Synthetic Dataset

### Configuration
@@ -71,11 +17,6 @@ Note: All the data for performance diagram shown below is taken from the cell ha

### Results

| | |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Synthetic_32.png](lstm_Synthetic_32.png) | ![lstm_Synthetic_128.png](lstm_Synthetic_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | --------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | Synthetic | fit() | 50s 1ms/step | TRUE | 50000 | 0 |
@@ -124,11 +65,6 @@ Note: All the data for performance diagram shown below is taken from the cell ha

### Results

| | |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Nietzsche_32.png](lstm_Nietzsche_32.png) | ![lstm_Nietzsche_128.png](lstm_Nietzsche_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | --------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | Nietzsche | fit() | 226s 1ms/step | TRUE | 200285 | 0 |
@@ -177,11 +113,6 @@ Note: All the data for performance diagram shown below is taken from the cell ha

### Results

| | |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Wikitext2_32.png](lstm_Wikitext2_32.png) | ![lstm_Wikitext2_128.png](lstm_Wikitext2_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | ---------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | WikiText-2 | fit() | 3530s 1ms/step | TRUE | 1562175 | 0 |
Binary file added benchmark/benchmark_result/rnn_mxnet_dataset.png