
[MXNET-107]Fused GRU implementation for CPU #10311

Merged
merged 63 commits into apache:master on Jun 6, 2018

Conversation

lihaofd
Contributor

@lihaofd lihaofd commented Mar 29, 2018

Description

This PR follows the operator registration approach of #10104 and adds a fused GRU operator for CPU.
@pengzhao-intel, @TaoLv , @sherry-zhang

Feature changes

New features

  • Single-layer/multi-layer and unidirectional/bidirectional GRU, including both forward and backward computation.
  • kAddTo/kNullOp request handling for both GRU and LSTM.

Unit-test changes

  • Add a new test case in tests/python/unittest/test_operator.py.
  • Update the test script example/rnn/bucketing/cudnn_rnn_bucketing.py.
  • Check consistency with the original GRUCell implementation (see the sketch after this list).
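
For reference, a minimal sketch of what such a consistency check could look like, reusing the check_rnn_consistency helper already defined in test_operator.py; the sizes T, N, I, H below are illustrative, not the ones used in the actual test:

```python
import mxnet as mx

# Illustrative sizes: sequence length, batch size, input dim, hidden dim.
T, N, I, H = 5, 4, 8, 8

# Fused kernel added by this PR (mode='gru').
fused = mx.rnn.FusedRNNCell(H, num_layers=1, mode='gru', prefix='')

# Reference implementation built from the unfused GRUCell.
stack = mx.rnn.SequentialRNNCell()
stack.add(mx.rnn.GRUCell(H, prefix='l0_'))

# The existing helper unrolls both cells with shared parameters and
# asserts that their outputs agree.
check_rnn_consistency(fused, stack, T, N, I, H)
check_rnn_consistency(stack, fused, T, N, I, H)
```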

Performance

We measured the performance of sym.RNN and rnn.GRUCell on a local Skylake-6148 machine (2 sockets, 40 cores), using MKL as the BLAS library.
The test input sizes follow the DeepSpeech2 (DS2) default parameters (seq_length = 300, batch_size = 20, input_size = 800, hidden_size = 800).
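
The benchmark script is not part of this PR; the following is only a rough sketch of how such a forward timing could be reproduced with the symbolic fused RNN (mode='gru'), assuming mx.rnn.FusedRNNCell and the DS2 sizes above:

```python
import time
import mxnet as mx

seq_length, batch_size, input_size, hidden_size = 300, 20, 800, 800

# Fused GRU via the symbolic FusedRNNCell, which is backed by sym.RNN.
fused = mx.rnn.FusedRNNCell(hidden_size, num_layers=1, mode='gru', prefix='')
data = mx.sym.Variable('data')
output, _ = fused.unroll(seq_length, data, layout='TNC', merge_outputs=True)

exe = output.simple_bind(mx.cpu(), data=(seq_length, batch_size, input_size))
exe.arg_dict['data'][:] = mx.nd.random.uniform(
    shape=(seq_length, batch_size, input_size))

exe.forward(is_train=False)          # warm-up run
exe.outputs[0].wait_to_read()

start = time.time()
exe.forward(is_train=False)
exe.outputs[0].wait_to_read()        # force the async engine to finish
print('inference fwd time: %.3f sec' % (time.time() - start))
```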

Layer = 1, bidirectional = False

| API | Inference time (fwd, sec) | Training time (fwd + bwd, sec) |
| --- | --- | --- |
| rnn.GRUCell | 0.12 | 0.277 |
| this PR | 0.048 | 0.106 |
| speedup | 2.5x | 2.6x |

Layer = 5, bidirectional = True

| API | Inference time (fwd, sec) | Training time (fwd + bwd, sec) |
| --- | --- | --- |
| rnn.GRUCell | 1.275 | 2.94 |
| rnn.GRUCell (cuda) | 1.047 | 1.653 |
| rnn.GRUCell (cudnn) | 0.161 | 0.442 |
| this PR | 0.551 | 1.72 |
| speedup: this PR vs. rnn.GRUCell | 231.4% | 171% |
| speedup: this PR vs. rnn.GRUCell (cuda) | 190% | 96% |
| speedup: this PR vs. rnn.GRUCell (cudnn) | 29.2% | 25.7% |

Convergence Curve

We tested convergence of the fused GRU on a CPU (Skylake-8180, 2 sockets, 56 cores) and a GPU (P100) using example/rnn/bucketing/cudnn_rnn_bucketing.py.
Test settings: layers = 3, batch_size = 32, num-embed = 800, num-hidden = 800, num-epochs = 20.
(figure: fusedgru_cpu_gpu, convergence curves on CPU and GPU)

Checklist

  • Passed code style checking (make lint).
  • All changes have test coverage.
  • Code is well-documented.

@piiswrong
Contributor

piiswrong commented Apr 23, 2018

@sherry-zhang @yajiedesign @TaoLv
ping
can we get some review on this?

@yajiedesign
Contributor

Looks good.

@TaoLv
Member

TaoLv commented Apr 24, 2018

@piiswrong @yajiedesign I think we should start the review with #10104 first, since most of this PR's design and implementation follow #10104, and #10104 enables all the RNN-related unit tests.
I posted a design doc describing what we did in these two PRs and what we plan to do in the near future. Feel free to review the doc and leave comments:
https://docs.google.com/document/d/1XC_PmbSc7q6px22LIW3vwhbA_wmX8wRGLRnet3pMJrs/edit?usp=sharing

@lihaofd lihaofd requested a review from szha as a code owner April 27, 2018 02:55
@@ -93,6 +93,41 @@ def test_lstm_bidirectional():
check_rnn_consistency(stack, fused, T, N, I, H)
check_rnn_consistency(fused, stack, T, N, I, H)

@with_seed()
def test_gru_sym():
Member

@szha @piiswrong May I know how to check the weight and bias gradients in the test case? They don't seem to be verified in the check_rnn_consistency function.

Contributor

You can get the gradient with block.weight.grad().
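
For example (a minimal sketch using the Gluon GRU layer; the sizes and the loop over collect_params() are illustrative, not the exact test code):

```python
import mxnet as mx
from mxnet import autograd, gluon

layer = gluon.rnn.GRU(hidden_size=8, num_layers=1)
layer.initialize()

x = mx.nd.random.uniform(shape=(5, 4, 8))  # (seq_len, batch, input), layout 'TNC'
with autograd.record():
    y = layer(x)
y.backward()

# Read back the parameter gradients (i2h/h2h weights and biases).
for name, param in layer.collect_params().items():
    print(name, param.grad().shape)
```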

@@ -0,0 +1,235 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member

Don't copy a whole file. Rename cudnn_lstm_bucketing.py to cudnn_rnn_bucketing.py and add a switch for the mode instead.

Contributor Author

Fixed, thanks!

@piiswrong
Contributor

I missed one thing in the previous PRs: it looks like backward doesn't support req == kAddTo? It also looks like req[kParams] isn't checked at all.

@piiswrong
Contributor

Please also check for kNullOp and skip when necessary.

@lihaofd
Contributor Author

lihaofd commented Jun 4, 2018

@piiswrong, the kNullOp check has been added. Please help review it. Thanks!

@piiswrong
Contributor

piiswrong commented Jun 4, 2018

You need to check kNullOp for kData and kState too. When req is kNullOp, nothing should be written to the corresponding output array.

It's not enough to just skip filling with zeros.
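
For context, these request types map to the frontend grad_req setting; below is a hedged sketch of how the kAddTo / kNullOp paths could be exercised when binding the symbol (the 'parameters' argument name and the shapes are illustrative, not taken from the PR's tests):

```python
import mxnet as mx

data = mx.sym.Variable('data')
fused = mx.rnn.FusedRNNCell(8, num_layers=1, mode='gru', prefix='')
out, _ = fused.unroll(5, data, layout='TNC', merge_outputs=True)

# 'add'  -> kAddTo : backward must accumulate into the existing grad buffer.
# 'null' -> kNullOp: backward must leave the corresponding grad buffer untouched.
exe = out.simple_bind(mx.cpu(), data=(5, 4, 8),
                      grad_req={'data': 'add', 'parameters': 'null'})
exe.arg_dict['data'][:] = mx.nd.random.uniform(shape=(5, 4, 8))
exe.forward(is_train=True)
exe.backward(mx.nd.ones(exe.outputs[0].shape))
```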

@@ -474,6 +495,9 @@ class RNNOp : public Operator{
CHECK(dw.CheckContiguous());
CHECK(dhx.CheckContiguous());
CHECK(dy.CheckContiguous());
if (req[rnn_enum::kParams] != kAddTo && req[rnn_enum::kParams] != kNullOp) {
Contributor

Isn't the gradient still going to be overwritten by the backward kernel later?

const int omp_threads = mxnet::engine::OpenMP::Get()->GetRecommendedOMPThreadCount();
#pragma omp parallel for num_threads(omp_threads)
for (int i = 0; i < D * H * 3 * H; ++i) {
dwh[i] = 0;
Contributor

This is still overwriting the gradient even when req[kParams] == kAddTo?

@piiswrong piiswrong merged commit 069026a into apache:master Jun 6, 2018
@ThomasDelteil
Contributor

ThomasDelteil commented Jun 12, 2018

This seems to have introduced flakiness in this test: #11202 and #11219.
It is happening on a large portion of the Windows builds.

@lihaofd
Contributor Author

lihaofd commented Jun 12, 2018

@ThomasDelteil, we will look into the issue.
Thanks!

zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* Add GRU Support and Test Case

* skip the gpu test case that has nothing to do with RNN GRU

* fix robust bug for gru backward

* fix bug for unifying weight parameter

* add GRU multiple layer and bidirection support with test case

* fix test case bug

* fix test case bug

* fix bug for memory issue

* fix bug for bidirection

* rebase code and fix bug for memory corruption issue

* fix gpu compile issue

* fix bug and enable some test cases

* fix robust bug

* trigger the build to check if quantize-gpu case is covered

* trigger the build to check if MKLDNN+GPU case is covered

* disable failed gpu test case of MKLDNN_UTIL_FUNC-MemFormat because it has nothing to do with this PR and will recover it once the issue is passed

* skip failed test_reduce test case temporarily as it has nothing to do with RNN

* enable several test cases

* retrigger the build

* rebase code from lstm

* rebase code for resolve conflict

* add gru code after resolve conflict

* fix bug for resolve conflict

* add Fused GRU code with test case

* retrigger the build

* add GetRecommendedOMPThreadCount for omp

* fix conflict issue

* add gru relate code

* fix bug for code

* update code for gru

* retrigger the build

* fix code about gru condition

* enhance test case to test gradient weights and bias

* fix bug for test case

* fix bug for test case

* fix bug about dropout condition and test case

* fix bug for test case

* fix bug for test case

* retrigger the build

* rebase code

* add gru code

* fix issues about namespace, removing define and memcpy

* retrigger the build

* fix issues and add cudnn_gru_bucketing.py test case

* retrigger the build

* update cudnn_rnn_bucketing.py test case

* update cudnn_rnn_bucketing.py test case

* update cudnn_rnn_bucketing.py test case

* add check for req[kParams] and kAddTo from cudnn_rnn-inl.h

* retrigger the build

* retrigger the build

* retrigger the build

* add kNullOp check

* retrigger the build

* update kNullOp support and test case for both GRU and LSTM

* update kAddToOp support for both GRU and LSTM
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018