This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Refactor] TransformerDecoder #976

Merged: 3 commits into dmlc:master from the refactortransformer branch on Oct 29, 2019

Conversation

@leezu (Contributor) commented Oct 19, 2019

Description

In the current Gluon API, each HybridBlock has to serve one purpose and can only
define a single callable interface. The previous Seq2SeqDecoder interface required
each Seq2SeqDecoder Block to perform two functions (multi-step ahead and
single-step ahead decoding), which meant that in practice neither of the two could
be completely hybridized. This PR therefore uses two separate Blocks for the two
functions; they may share parameters, as sketched below.

Update the NMTModel API accordingly.

Further refactor TransformerDecoder to make it completely hybridizable.
TransformerOneStepDecoder still relies on a small hack, but can be hybridized
completely once numpy shape semantics are enabled.
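
For illustration, here is a minimal sketch of the two-Block pattern in Gluon 1.x. The class names FullDecoder and OneStepDecoder are hypothetical (they are not the classes in this PR); parameter sharing works by passing one Block's collect_params() to the other:

import mxnet as mx
from mxnet.gluon import nn, HybridBlock

class FullDecoder(HybridBlock):
    """Hypothetical Block that decodes all target steps at once (training)."""
    def __init__(self, units=512, prefix=None, params=None):
        super(FullDecoder, self).__init__(prefix=prefix, params=params)
        with self.name_scope():
            self.proj = nn.Dense(units, flatten=False)

    def hybrid_forward(self, F, inputs, mem_value):
        return self.proj(inputs)

class OneStepDecoder(FullDecoder):
    """Hypothetical Block that decodes a single step (inference)."""
    def hybrid_forward(self, F, step_input, mem_value):
        return self.proj(step_input.expand_dims(axis=1)).squeeze(axis=1)

full = FullDecoder(prefix='decoder_')
one_step = OneStepDecoder(prefix='decoder_', params=full.collect_params())
full.initialize()
full.hybridize()
one_step.hybridize()  # each Block now has a single, fully hybridizable call path

Because both Blocks resolve their parameters against the same shared dict, training through the full decoder also updates the one-step decoder.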

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Refactor TransformerDecoder

Comments

cc @dmlc/gluon-nlp-team

@leezu leezu requested a review from a team as a code owner October 19, 2019 00:56
@codecov (bot) commented Oct 19, 2019

Codecov Report

Merging #976 into master will increase coverage by 8.52%.
The diff coverage is 93.15%.


@@            Coverage Diff             @@
##           master     #976      +/-   ##
==========================================
+ Coverage   81.49%   90.01%   +8.52%     
==========================================
  Files          67       67              
  Lines        6392     6368      -24     
==========================================
+ Hits         5209     5732     +523     
+ Misses       1183      636     -547
Impacted Files Coverage Δ
src/gluonnlp/model/translation.py 71.87% <100%> (+51.24%) ⬆️
src/gluonnlp/model/train/__init__.py 100% <100%> (+25%) ⬆️
src/gluonnlp/model/seq2seq_encoder_decoder.py 75.38% <83.33%> (+0.38%) ⬆️
src/gluonnlp/model/transformer.py 93.48% <93.65%> (+11.69%) ⬆️
src/gluonnlp/data/glue.py 98.63% <0%> (+1.81%) ⬆️
src/gluonnlp/model/train/embedding.py 87.17% <0%> (+2.56%) ⬆️
src/gluonnlp/embedding/token_embedding.py 92.01% <0%> (+2.83%) ⬆️
src/gluonnlp/vocab/elmo.py 96.66% <0%> (+3.33%) ⬆️
src/gluonnlp/data/word_embedding_evaluation.py 92.39% <0%> (+3.42%) ⬆️
... and 13 more

@pytest.mark.parametrize('output_attention', [False, True])  # decorator truncated in the diff view; values assumed
@pytest.mark.parametrize('use_residual', [False, True])
@pytest.mark.parametrize('batch_size', [4])
@pytest.mark.parametrize('src_tgt_seq_len', [(5, 10), (10, 5)])
def test_transformer_encoder_decoder(output_attention, use_residual, batch_size, src_tgt_seq_len):
@leezu (Contributor, Author):

No changes to this test; only refactored to use pytest.mark.parametrize.
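
For reference, pytest expands stacked parametrize decorators into their cross-product, so (assuming two values for output_attention as shown above) this test runs 2 × 2 × 1 × 2 = 8 parameter combinations.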

@leezu (Contributor, Author) commented Oct 21, 2019

This is currently blocked: Block.load_parameters internally uses _collect_params_with_prefix and expects shared parameters to be serialized twice. A patch needs to land in mxnet master first.
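
A minimal illustration of the interaction, using a toy model with two Dense layers that share one weight (hypothetical, not this PR's classes):

import mxnet as mx
from mxnet.gluon import nn

a = nn.Dense(8, prefix='proj_')
b = nn.Dense(8, prefix='proj_', params=a.collect_params())  # b shares a's parameters
net = nn.HybridSequential()
net.add(a, b)
net.initialize()

# _collect_params_with_prefix walks the Block tree by attribute path, so the
# single shared weight appears under two keys ('0.weight' and '1.weight'):
print(sorted(net._collect_params_with_prefix().keys()))

# Block.load_parameters therefore expects the shared parameter to be present
# under both keys, i.e. serialized twice, which is what the mxnet patch addresses.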

@leezu (Contributor, Author) commented Oct 22, 2019

Will be unblocked once apache/mxnet#16582 is merged.

@leezu leezu added the "release focus" (Progress focus for release) label on Oct 25, 2019
@leezu leezu force-pushed the refactortransformer branch 3 times, most recently from 4520484 to d7f8e90 on October 27, 2019 18:50
@mli (Member) commented Oct 28, 2019

Job PR-976/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/7/index.html

@mli (Member) commented Oct 28, 2019

Job PR-976/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/8/index.html

@mli (Member) commented Oct 28, 2019

Job PR-976/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/9/index.html

@szha (Member) commented Oct 28, 2019

@szhengac @sxjscience ping for review on the API.

@mli (Member) commented Oct 28, 2019

Job PR-976/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/10/index.html

@leezu leezu requested a review from szhengac October 28, 2019 23:44
@mli (Member) commented Oct 29, 2019

Job PR-976/11 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/11/index.html

mem_value, mem_mask = states
if mem_mask is not None:
    # add a query-length axis and broadcast the source-side mask to match
    # the number of decoder time steps in `inputs`
    mem_mask = F.expand_dims(mem_mask, axis=1)
    mem_mask = F.broadcast_like(mem_mask, inputs, lhs_axes=(1,), rhs_axes=(1,))
Review comment (Member):
As in Xingjian's slides, we can extract the mask-generation code into a function.

@leezu (Contributor, Author):

Let's do it in a follow-up PR, as it will extend the attention_cell API. A possible shape for such a helper is sketched below.
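
A sketch of what such a helper could look like, assuming it would live next to the attention cell (the name _expand_mem_mask is illustrative only, not code from this PR):

def _expand_mem_mask(F, mem_mask, inputs):
    """Broadcast a (batch, mem_len) source mask to (batch, query_len, mem_len),
    taking query_len from axis 1 of `inputs`."""
    mem_mask = F.expand_dims(mem_mask, axis=1)
    return F.broadcast_like(mem_mask, inputs, lhs_axes=(1,), rhs_axes=(1,))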

@sxjscience (Member):

I suggest making this change in a branch for EMNLP. Changing the API just for a tutorial is not an appropriate way of working.

@sxjscience (Member) commented Oct 29, 2019

After offline discussion with the other members, I have decided to approve it now. However, these are experimental APIs and we should discuss them further.

@leezu leezu merged commit 57a45aa into dmlc:master Oct 29, 2019
@leezu leezu deleted the refactortransformer branch October 29, 2019 21:18
Labels: API change, release focus