This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Conversation
leezu force-pushed the deferredcompute branch 3 times, most recently from 72df483 to 44b82fa on February 12, 2020 02:44
reminisce reviewed Feb 13, 2020
Great work! Just a few comments.
samskalicky reviewed Feb 13, 2020
samskalicky reviewed Feb 15, 2020 (6 times)
samskalicky approved these changes Feb 15, 2020
Thanks for this contribution @leezu!
leezu force-pushed the deferredcompute branch 2 times, most recently from 96c6213 to d15f901 on February 15, 2020 19:37
leezu force-pushed the deferredcompute branch 9 times, most recently from 7e23173 to e64e03c on February 20, 2020 20:58
leezu force-pushed the deferredcompute branch 6 times, most recently from 14d9ddb to 9cd0420 on March 18, 2020 06:09
LGTM
Require users to call dc.set_variable(array, symbol) for every input array used in deferred compute, and remove the input and input_names arguments from dc.get_symbol. This prevents users from inadvertently using arrays as inputs without specifying them in dc.get_symbol. Such use previously yielded an "unspecified inputs" error at the time of the dc.get_symbol call, making it hard for users to find out where their code is wrong. Now we fail fast and throw the error as soon as an "unsupported" array is used.

Note that the examples below use the private dc.context and dc.set_variable APIs. Users will not interact with them directly; they are used internally in HybridBlock.

Example of the pitfall prior to this commit:

```python
a = mx.np.zeros((10, 10))
with dc.context():
    # Creating an array from a list can't be recorded. Must be specified as input.
    b = mx.np.array([1, 2, 3])
    c = a[b]
dc.get_symbol(inputs=a, outputs=c)  # Throws "unspecified input" error.
```

"Correct" usage prior to this commit:

```python
a = mx.np.zeros((10, 10))
with dc.context():
    # Creating an array from a list can't be recorded. Must be specified as input.
    b = mx.np.array([1, 2, 3])
    c = a[b]
dc.get_symbol(inputs=[a, b], outputs=c)
```

Following this commit:

```python
a = mx.np.zeros((10, 10))
dc.set_variable(a, mx.sym.var('a').as_np_ndarray())
with dc.context():
    b = mx.np.array([1, 2, 3])
    c = a[b]  # Throws: b is not associated with a variable or deferred computed
```
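For context, a minimal end-to-end sketch of how these pieces compose after the change. Only dc.context, dc.set_variable, and dc.get_symbol are taken from the commit message above; the import path of the private module is an assumption for illustration:

```python
import mxnet as mx
# Assumption: the deferred compute helpers live in a private module; this
# alias mirrors the `dc` shorthand used in the examples above.
from mxnet import _deferred_compute as dc

a = mx.np.zeros((10, 10))
# Associate the input array with a symbolic variable up front, so every later
# use of `a` inside the deferred compute scope is traceable.
dc.set_variable(a, mx.sym.var('a').as_np_ndarray())
with dc.context():
    c = a * 2 + 1  # recorded imperatively
# Export the recorded computation; inputs no longer need to be passed here.
sym = dc.get_symbol(c)
```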
anirudh2290 added a commit to anirudh2290/mxnet that referenced this pull request on Mar 27, 2020
* 'master' of https://github.com/apache/incubator-mxnet: (192 commits)
  * impl - FFI for np einsum (apache#17869)
  [Numpy] FFI for diag/diagonal/diag_indices_from (apache#17789)
  [Numpy] Kron operator (apache#17323)
  cmake: Set DMLC_LOG_FATAL_THROW only for building mxnet and not for tvm (apache#17878)
  Add simplified HybridBlock.forward without F (apache#17530)
  Use FP32 copy of weights for norm (multitensor LAMB optimizer) (apache#17700)
  Use multi-tensor sumSQ in clip_global_norm (apache#17652)
  [Numpy] Add op fmax, fmin, fmod (apache#17567)
  Adding sparse support to MXTensor for custom operators (apache#17569)
  Update 3rdparty/mkldnn to v1.2.2 (apache#17313)
  Dynamic subgraph compile support (apache#17623)
  Refactor cpp-package CMakeLists.txt & add missing inference/imagenet_inference (apache#17835)
  staticbuild: Fix potential user-assisted execution of arbitrary code (apache#17860)
  * FFI for np.argmax and np.argmin (apache#17843)
  ffi for roll/rot90 (apache#17861)
  Skip test_multi_worker_dataloader_release_pool on OS X (apache#17797)
  add ffi for full_like, binary (apache#17811)
  HybridBlock.export() to return created filenames (apache#17758)
  Fix SoftReLU fused operator numerical stability (apache#17849)
  CI: Test clang10 cpu & gpu builds with -WError (apache#17830)
  ...
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this pull request on Apr 10, 2020
Users can now implement HybridBlock.forward instead of HybridBlock.hybrid_forward. HybridBlock.forward has the same signature as Block.forward. For example:

```python
class MyBlock(mx.gluon.HybridBlock):
    def __init__(self, *, prefix=None, params=None):
        super().__init__(prefix, params)
        with self.name_scope():
            self.dense = mx.gluon.nn.Dense(units=10)
            self.weight = self.params.get('weight', allow_deferred_init=True)

    def infer_shape(self, x):
        self.weight.shape = (x.shape[1], )

    def forward(self, x):
        return self.dense(x) + self.weight.data(x.context)
```

Hybridization of HybridBlock.forward is based on a deferred computation mode in the MXNet backend, which enables recording computation via tracing in the mxnet.nd and mxnet.np interfaces. The recorded computation can be exported to a symbolic representation and is used for optimized execution with the CachedOp. As tracing is based on the imperative APIs, users can access shape information of the arrays. Because x.shape for some array x is a Python tuple, any use of that shape will be a constant in the recorded graph and may limit the recorded graph to being used with inputs of the same shape only.

As part of the change from hybrid_forward to forward, we also disable support for parameter shape inference in the MXNet backend in the case of deferred parameter initialization. Shape inference in the backend was limited and by its very nature did not support dynamic shape operators. Instead, users should now always implement HybridBlock.infer_shape to set the parameter shapes if the parameter shape was not set during HybridBlock.__init__. See the example above.

An example of the internal deferred compute APIs is:

```python
a = mx.np.arange(10)
dc.set_variable(a, mx.sym.var('a').as_np_ndarray())
with dc.context():
    b = a ** 2
symbol = dc.get_symbol(b)
```
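A minimal usage sketch for the MyBlock example above, under the assumption that the standard Gluon initialize/hybridize flow applies and with an arbitrarily chosen input shape:

```python
# Hypothetical driver for the MyBlock example above; the input shape (2, 10)
# is an arbitrary choice for illustration.
net = MyBlock()
net.initialize()   # weight init is deferred: its shape is unknown at this point
net.hybridize()    # the first call is traced via deferred compute, then cached
x = mx.np.ones((2, 10))
out = net(x)       # infer_shape(x) sets weight.shape = (10,) before first use
print(out.shape)   # (2, 10): dense(x) plus the (10,) weight, broadcast-added
```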
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request on May 29, 2020 (same commit message as above)
Description
Implements #16376