[RFC] Deferred compute in imperative interface to unify imperative and symbolic interface #16376
Comments
Thanks for the proposal, @leezu. Since this is a major change, I have some questions regarding the plan. First, should we restrict this mode to only apply to the new numpy arrays? Since the deferred compute mode won't support reverse shape inference, new blocks that implement the forward interface will not work without implementing the parameter shape inference logic in `infer_shape`. Also, could you elaborate on what the changes are to the existing implementation?
Sounds really interesting. Can you elaborate a bit more about specific use cases that this enables or simplifies? Is there something that can't be done today that this would enable? Are there major pain points that this would address compared to hybrid blocks? Etc.
Thank you @szha and @asmushetzel for looking through the RFC.
The RFC is not so much about extending what is possible, but about improving the user experience. A major issue of the existing API is that users have to deal with two different interfaces, the imperative and the symbolic one. Unifying symbolic and imperative mode with deferred compute also works towards [RFC] Introducing NumPy-compatible coding experience into MXNet. While with deferred compute we only trace a computational graph (as with the current symbolic API), a logical next step is to provide support for parsing the AST of the user-provided implementation and directly hybridizing it without tracing. You can find some more discussion on this in #14253. AST transformation also benefits from a unified interface, as a separate imperative and symbolic frontend would be meaningless.
It may be feasible to also provide support for the normal ndarray interface. That said, I suggest considering such support as a bonus. Providing backwards compatibility adds complexity for the existing ndarray, which doesn't apply to the new numpy arrays. The final decision could be made later.
Agree that both should happen at the same time
No conceptual change to the existing implementation.
How's this project going?
Is there any progress? I really like this proposal. Currently I have to write

```python
def hybrid_forward(self, F, feat):
    _B, C, H, W = feat.shape
    x = F.linspace(-1, 1, H)
```

even if I know the C, H, W will never change and I will never access the batch size B. I only need the shape once and the shape should be cached. This RFC may fix it.
This seems to be a big change to the existing operator modes (imperative and symbolic). Could you please provide more information? AFAIK, the symbolic API already does deferred init, and the imperative API is provided to improve user experience. Based on this RFC, what's the advantage of this new deferred_compute mode? As a user, when should I use it or not?

Another question: we all know deferred init causes a bad user experience when it comes to debugging. Would this RFC address the debuggability issue?

If it's about performance optimization, could we have some initial data of using this new deferred mode vs. the existing imperative mode?

Thanks, Lin
Essentially the motivation for deferred compute is to extend imperative mode to enable users to "construct a symbol" without using symbolic API. This addresses confusion around having two APIs and prevents divergence between imperative and symbolic APIs. There's no need to drop the existing imperative / symbolic APIs due to deferred compute.
Please ask a question and I'll answer ;)
Based on deferred compute we can simplify the HybridBlock interface. For example, Dense could look as follows:

```python
class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True,
                 dtype='float32', weight_initializer=None, bias_initializer='zeros',
                 in_units=0):
        super().__init__()
        self._flatten = flatten
        self._units = units
        self.weight = gluon.Parameter(shape=(units, in_units),
                                      init=weight_initializer, dtype=dtype,
                                      allow_deferred_init=True)
        if use_bias:
            self.bias = gluon.Parameter(shape=(units,),
                                        init=bias_initializer, dtype=dtype,
                                        allow_deferred_init=True)
        else:
            self.bias = None

    def forward(self, x):  # We allow users to overwrite forward() directly.
        ctx = x.context
        bias = self.bias.data(ctx) if self.bias is not None else None
        return npx.FullyConnected(x, self.weight.data(ctx), bias,
                                  no_bias=self.bias is None, num_hidden=self._units,
                                  flatten=self._flatten, name='fwd')
```
There would be no reason for users to explicitly use the symbolic API.
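As a usage sketch (hedged; this assumes the Dense class above together with current MXNet numpy APIs such as `npx.set_np` and `np.ones`):

```python
from mxnet import np, npx
npx.set_np()                     # enable numpy-compatible array semantics

net = Dense(16, in_units=8)      # in_units given explicitly here; inferring it at
                                 # first forward (deferred init) is discussed below
net.initialize()
net.hybridize()                  # hybridization backed by deferred-compute tracing
out = net(np.ones((2, 8)))       # first call runs forward() and records the graph
```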
This RFC is orthogonal to deferred init. When updating Gluon, one option would be to drop deferred initialization and require users to specify input shapes (such as `in_units`) up front. However, the other option is to allow deferred initialization of weights and require users to implement `infer_shape`. This works around the failures of symbolic shape inference for deferred init in case of dynamic shape ops, while still allowing users to decide the shape of the weight at first forward. In the example above, it could look like:

```python
class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True,
                 dtype='float32', weight_initializer=None, bias_initializer='zeros',
                 in_units=0):
        [...]

    def infer_shape(self, x):
        self.weight.shape = (self.weight.shape[0], x.shape[1])

    def forward(self, x):
        [...]
```
There is the option to improve performance of imperative mode by deferring the computation and optimizing the computational graph before performing the computation. But this is not the main motivation and I haven't optimized for this use-case (yet).
A new deferred computation (DC) argument to the imperative MXNet APIs is proposed. If enabled, memory allocation and computation are deferred as long as possible. Users can export the computational graph recorded during deferred computation, which enables hybridization support.
Arrays for which DC is enabled are called lazy. Other arrays are called
normal. Inplace operations on lazy arrays are unsupported.
Storage allocation and computation for lazy arrays are deferred until their results are required by conversion to numpy or by use as input to an operator creating a normal array. Accessing attributes such as `shape` can also trigger computation if the attribute can't be inferred.
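For illustration, a hedged sketch of these semantics (the `deferred_compute()` scope below is a hypothetical stand-in for the global on/off state used by the actual implementation; see the update note that follows):

```python
from mxnet import np

# Hypothetical scope enabling deferred compute; the real toggle is defined in the PR.
with deferred_compute():
    a = np.ones((2, 3))     # lazy: no storage allocated yet
    b = (a * 2) + 1         # lazy: only the computational graph is recorded
    print(b.shape)          # inferable from the graph, so no computation is triggered

c = b.asnumpy()             # conversion to numpy triggers allocation and computation
```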
Update: The proposed implementation in #17530 differs slightly from the API previously described in this RFC. Thus I deleted the API docs in this RFC; please refer to the PR. For example, a global state is used to enable / disable deferred compute, instead of introducing a new invocation API `MXImperativeDeferredInvokeEx`.

FAQ
How about Autograd, `NDArray.autograd_entry_` and `AGInfo`?

Autograd inside deferred computation (DC) mode can be supported.
Relation of Autograd and DC: While autograd's `RecordOp` provides a similar recording functionality to the deferred computation, the autograd graph is not the same as a computational graph: `NDArray::Detach()` serves to detach a node from the autograd graph by deleting `NDArray.entry_`, though the `NodeEntry` is still required for reconstructing the computational history of how this NDArray came to be.
Are reqs like `kInPlace` supported?

No. For now only `kWriteTo` is supported in DC mode. The plan is to replace inplace operations with `kWriteTo` operations, writing to a new (lazy) array. The framework should be smart enough to decide when to reuse memory and when not. It shouldn't be required for users to specify that they want an inplace operation.
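A hedged sketch of the user-visible consequence (same hypothetical `deferred_compute()` scope as above):

```python
from mxnet import np

with deferred_compute():
    x = np.zeros((4,))
    # x += 1                # in-place update of a lazy array: unsupported under DC
    x = x + 1               # write (kWriteTo) into a new lazy array: supported
```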
How is the context attribute handled, specifically context changes?

Cross-device copy must be represented as an operator (`CrossDeviceCopyOp`) which requires special handling in the graph executor.
How is incomplete shape information handled?

The `shape` property triggers computation if the shape is accessed and can't be inferred completely. Users can access `static_shape` if they want to avoid triggering computation.
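A hedged sketch (hypothetical `deferred_compute()` scope as before; `static_shape` as named above):

```python
from mxnet import np

with deferred_compute():
    x = np.random.uniform(size=(2, 3))
    y = x + 1
    mask = x[x > 0.5]              # illustrative op with a data-dependent shape
    print(y.shape)                 # fully inferable from the graph: no computation
    print(mask.static_shape)       # only the statically known part: no computation
    print(mask.shape)              # not completely inferable: triggers computation
```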
Python (Gluon)

Based on DC, hybridization in Gluon is simplified:
Instead of implementing `def hybrid_forward(self, F, x, ...)` in `HybridBlock`, users can opt to implement `def forward(self, x, ...)` in `HybridBlock`.
Hybridization based on DC works by the HybridBlock performing the following steps (if it is not called by a parent block being hybridized):

1. enter deferred compute mode;
2. keep references to the input arrays to pass them to `MXNDArrayGetDeferredComputeSymbol`;
3. call `forward` so that the computation on the inputs is recorded;
4. export the recorded computation to a Symbol and exit deferred compute mode.
An (internal) global context variable tracks if hybridization is ongoing. If set to False and a Block is called that is to be hybridized, the global context variable is set to True and the Block goes through all 4 steps outlined above; finally, the context variable is set back to False after the export to Symbol step is finished.
Usage example
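A hedged sketch of what usage could look like for a user (exact entry points are subject to the implementation in #17530; `npx.set_np`, `mx.np` arrays and the Gluon APIs shown are those of current MXNet):

```python
from mxnet import np, npx
from mxnet.gluon import nn, HybridBlock
npx.set_np()                               # numpy-compatible array semantics

class Model(HybridBlock):
    """A block written purely against the imperative numpy API."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Dense(8)
        self.out = nn.Dense(1)

    def forward(self, x):                  # plain forward(), no F / hybrid_forward()
        return self.out(npx.relu(self.hidden(x)))

net = Model()
net.initialize()
net.hybridize()                            # next call is traced via deferred compute
y = net(np.ones((2, 4)))                   # tracing, parameter shape inference, compute
net.export('model')                        # the recorded graph can be exported
```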
Hybridizing `gluon.Block`s?

DC could be used to support hybridizing `Block` if all logic can be traced. A separate effort may add logic to detect these cases and add hybridization support based on DC. For now we rely on the user to signify hybridization support by subclassing `HybridBlock`.

Parameter Shape Inference
For HybridBlocks making use of DC for hybridization, we request users to implement `HybridBlock.infer_shape` to infer the parameters' shapes given the inputs.

Currently, if `HybridBlock.infer_shape` is not implemented, backward shape inference is used to infer the shape of parameters. However, backward shape inference is not supported in all cases (cf. #14253, #14983 (comment)) and relying on it for parameter shape inference is brittle. Thus, for consistency and simplicity, we require an `infer_shape` method implementation when using hybridization based on DC.