[RFC] MXNet 2.0 API Deprecation #17676
Comments
cc @apache/mxnet-committers
What about those v1, v2 APIs?
We may also drop ONNX in MXNet 2. I'm not aware of anyone working on ONNX in MXNet, and TVM can be used as a replacement.
I think we should keep the ONNX APIs, since they can export many basic models, even though they are not perfect. Users will train their models in MXNet 2.0, export an ONNX model, and then use the ONNX model in their deployment frameworks (http://onnx.ai/supported-tools). It is useful for attracting users to train their models in MXNet 2.0 with ONNX export.
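For concreteness, a minimal sketch of that export workflow, assuming the MXNet 1.x `mx.contrib.onnx.export_model` API (whether this module is kept in 2.0 is exactly what is under discussion) and hypothetical checkpoint file names:

```python
import numpy as np
import mxnet as mx

# Hypothetical checkpoint files produced by an earlier training run.
sym_file = './model-symbol.json'
params_file = './model-0000.params'

# Convert the saved MXNet model to ONNX; the input shape and dtype
# describe the model's expected input.
onnx_path = mx.contrib.onnx.export_model(
    sym_file, params_file,
    [(1, 3, 224, 224)],   # input shapes
    np.float32,           # input dtype
    'model.onnx')         # output file
```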
From @samskalicky:
TensorRT support currently uses ONNX to convert from NNVM: https://github.com/apache/incubator-mxnet/blob/746cbc55fd666bb4529e88d247fed8e0907270f9/src/operator/subgraph/tensorrt/tensorrt.cc#L313-L318 (see also https://github.com/apache/incubator-mxnet/blob/746cbc55fd666bb4529e88d247fed8e0907270f9/src/operator/subgraph/tensorrt/nnvm_to_onnx-inl.h).
Although I would like to see TensorRT support moved away from ONNX toward a native integration using the Accelerator API compile support (#17623), the migration from ONNX to AccAPI is still under discussion and the compile support PR is not merged yet (shameless plug: please review! :-D)
I am generally in favor of those deprecations. The scariest part is the removal of …
We have … Do we need to cover them in the RFC? How do we deprecate or unify these APIs?
@TaoLv the search results show APIs in the following categories: …
If I understand this correctly, since the Scala, Java, and Clojure bindings use symbol, ndarray, and module exclusively, this also means that they will effectively be deprecated as well. This is fine if that is what the community decides upon, but it should be called out explicitly.
Further thinking this through: since the Scala language binding currently provides the base for both Java and Clojure, it would be nice to know what the future plans for the Scala language binding are. Whether or not that path is supported will determine the fate of the other JVM languages.
@gigasquid @zachgk Since the Scala API was built a while ago, I can see some of the deprecated sections in it: Module, DataparallelGroup, Symbol, ... Most of the training components would become invalid. There can be three approaches: …
I am not sure which way works best; please leave any thoughts you have.
Is there any way to get stats on downloads of the Maven Central Scala/Clojure jars to see how much current use there is? Whether the numbers are high or low, and what the trend is, can help shape the decision.
@gigasquid Yeah, you can view the download statistics from https://repository.apache.org/#central-stat.
Thanks @zachgk. I took a couple of screenshots so I could share here, and here is the Clojure package. There are far more Scala downloads than Clojure ones. @lanking520 and other Scala package maintainers: thank you for all the work that you've done on the Scala package so far. I will support whatever decision makes the most sense for the Scala package and for the JVM MXNet users in 2.0. Let's just make a plan and coordinate whatever that is so that the current users have the most information to plan accordingly.
+1 for keeping ONNX support. Although it has a lot of small problems, I've converted a lot of PyTorch models to MXNet for deployment with the following pipeline: …
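The commenter's exact pipeline was not preserved in this thread; a minimal sketch of one such PyTorch → ONNX → MXNet conversion, assuming the MXNet 1.x `mx.contrib.onnx.import_model` API and an illustrative toy model, might look like:

```python
import torch
import mxnet as mx

# Toy stand-in for a trained PyTorch model.
torch_model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
dummy_input = torch.randn(1, 3, 224, 224)

# Export the PyTorch model to ONNX ...
torch.onnx.export(torch_model, dummy_input, 'model.onnx')

# ... then import it into MXNet as a Symbol plus parameter dicts.
sym, arg_params, aux_params = mx.contrib.onnx.import_model('model.onnx')
```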
+1 for "upgrade/rewrite Scala API and bring up MXNet 2.0 features" as it took us a lot of efforts to bring MXNet to Scala originally and there are already adopters of Scala API in industries. |
What's the big spike in January?
Good question. I don't know. There wasn't a new release then. 🤷‍♀️
@lanking520 @zachgk @terrytangyuan @aaronmarkham could one of you start a discussion in a new issue on the JVM ecosystem support in 2.0? This topic seems to require extended discussion.
I created one here: #17783
Caffe usage is very low now; let's deprecate the Caffe converter too.
In the long run, the Gluon vision model zoo will be maintained in GluonCV, and therefore mxnet.gluon.model_zoo.vision should be deprecated to avoid duplicate maintenance efforts in 2.0.
@zhreshold thanks for bringing this up. Currently the test for that is also the longest-running one, so if there's no objection I hope we can move forward with removing it soon.
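For reference, a minimal sketch of fetching a pretrained model from GluonCV instead of the built-in zoo (assumes the separate `gluoncv` package is installed; the model name is illustrative):

```python
from gluoncv import model_zoo

# Replaces e.g. mxnet.gluon.model_zoo.vision.resnet18_v1(pretrained=True).
net = model_zoo.get_model('resnet18_v1', pretrained=True)
```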
The mxnet.rnn module should be deprecated and removed too, given that it is designed for interacting with the symbol API.
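For context, a minimal sketch of the Gluon replacement for `mx.rnn`-style recurrent cells, assuming the NumPy-compatible semantics proposed elsewhere in this RFC:

```python
from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()  # enable NumPy-compatible array semantics

# gluon.rnn.LSTM replaces the symbolic cells in mx.rnn.
layer = rnn.LSTM(hidden_size=100, num_layers=2)
layer.initialize()

x = np.random.uniform(size=(5, 3, 10))  # (seq_len, batch, features)
output = layer(x)                       # shape: (5, 3, 100)
```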
Hi there, I too am a little concerned about dropping Module support. Since a large percentage of the user base started with the Module APIs, dropping that support could alienate them. I became familiar with MXNet around version 1.3. The main functions I appreciate are: …
@yifeim the Module API will continue to be supported in 1.x, and users are free to stay on that version. For 2.x, we will only support the numpy/npx API, so users who adopt those APIs will have to reimplement their models anyway. The main functions you listed will all be available in 2.x.
👍
Drop the following loss operators, since they are used with the Module API: …
NNPACK is currently only supported in the Makefile build (#15974), which will be removed. I think oneDNN (MKL-DNN) has replaced it and we can remove NNPACK support. Any concerns?
Since the Gluon built-in model zoo is being deprecated and some tests still rely on it, these models will be moved to test_utils.py.
The models in the model zoo rely on ndarray operators, which are currently recommended for removal. Thus keeping these models in test_utils.py won't work when proceeding with the operator removal.
Just confirmed that … If ndarray is being deprecated, the effort of rewriting model_zoo with … If consensus is reached to move model_zoo for testing purposes to test_utils.py, I can follow this up in #18480.
As the MXNet community is working on the next major version of MXNet as described in #16167, this RFC seeks to clarify the scope of API deprecation, to inform the community of the replacement API design, and to ensure informed consensus.
Thanks to the long history of MXNet and the accumulated efforts of the community, MXNet now supports a wide range of neural network model training and deployment use cases. Many of these use cases have seen several generations of API design and implementation. Take model training as an example: there have been the Symbol Model API, the Symbol Module API, and the Gluon HybridBlock API, all of which coexist in MXNet. Older generations of APIs often have a significant body of users and thus require time from the community to maintain, even though the supported use cases are mostly covered by a newer generation of APIs. Such maintenance not only consumes the community's time and energy and distracts from its longer-term goals, but also puts pressure on CI and binary distribution.
In this RFC, we list several candidate APIs to be deprecated and the corresponding new generation of APIs as replacements. Unless otherwise stated, these APIs will continue to be supported in future 1.x releases, which happen in parallel to the 2.0 development. On the other hand, participating in the RFC for the replacement of the feature you are interested in is the best way to ensure continued support for that feature in 2.0. To make navigation easier, the replacement feature RFCs are linked in each section.
To make the discussion more productive, I recommend the following actions:
Please always keep the discussion civilized and informative. Comments otherwise will be folded.
mxnet.numpy and mxnet.ndarray
Traditionally MXNet provided the `mx.nd` API with operators inspired by, but often incompatible with, NumPy. Based on RFC #14253 there has been a large and ongoing effort to provide NumPy-compatible operators in the `mx.np` namespace. This means that MXNet currently provides two incompatible APIs with separate backend implementations achieving similar goals, doubling the maintenance burden on developers. Note that there are some deep learning operators in `mx.nd` that don't have counterparts in `mx.np`. These operators will be migrated to the `mx.npx` namespace and will be tracked in #17096.

Given the wide impact of this decision, these people convened on 2/19/2020 and reached consensus on recommending "Removal and parallel maintenance of 1.x and 2.x" as the option forward: @eric-haibin-lin, @mli, @haojin2, @szhengac, @YizhiLiu, @sxjscience, @reminisce, @leezu, @zhreshold, @apeforest, @oorqueda, @rondogency. The options considered were:

Option 1 (removal and parallel maintenance of 1.x and 2.x):
1. Remove the `mx.nd`, `mx.sym`, and `mx.sym.np` namespaces in Python and require analogous changes in other frontends.
2. Remove operators not exposed via `mx.np` and `mx.npx` from the backend.

Option 2 (deprecation):
1. Keep `mx.nd`, `mx.sym`, and `mx.sym.np` but discourage their use, for example via deprecation warnings. Only fix regressions introduced in MXNet 2. Remove in MXNet 3.
2. Provide backwards compatibility in `mx.gluon` for `F` and other MXNet 1 features.
3. May introduce breaking changes on the operator level, such as improved optimizer operators (PR #17400). Any such change must provide instructions for users.

Option 3 (legacy namespaces):
1. Move to the `mx.v1.nd`, `mx.v1.sym`, and `mx.v1.gluon` frontend namespaces. Discourage use. Remove in MXNet 3.
2. Drop `mx.nd`, `mx.sym`, and `mx.sym.np` in MXNet 2 and potentially introduce breaking changes in `mx.gluon`.
3. May introduce breaking changes on the operator level, such as improved optimizer operators (PR #17400). Any such change must provide instructions for users.

APIs to remove or deprecate: `mx.nd`, `mx.sym`
Replacement APIs: `mx.np`, `mx.npx`
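To illustrate the replacement, here is a minimal before/after sketch; the exact split of operators between `mx.np` and `mx.npx` is still being tracked in #17096:

```python
from mxnet import np, npx

npx.set_np()  # activate NumPy-compatible array semantics

# Before (mx.nd, to be removed):
#   a = mx.nd.ones((2, 3))
#   b = mx.nd.relu(a - 0.5)

# After: NumPy-compatible operators live in mx.np ...
a = np.ones((2, 3))
# ... while deep-learning-specific operators live in mx.npx.
b = npx.relu(a - 0.5)
```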
Symbol and NDArray
Traditionally MXNet recommended that users statically declare their machine learning models with the `mx.sym` symbolic API. In 2018, Amazon, in collaboration with Microsoft, published the Gluon API, which the MXNet community then implemented so that users could enjoy the flexibility of imperative mode together with the benefits of a symbolic computational graph.

Gluon exploited the similarity between `mx.sym` and `mx.nd` and asked users to write code that would work irrespective of the namespace used in a `gluon.HybridBlock`, via a placeholder `F` that could refer to either `mx.sym` or `mx.nd`. As the basic building blocks of `mx.sym` and `mx.nd`, `Symbol` and `NDArray` have diverging behaviour, so the use of `nn.HybridBlock` required users to learn the details of each.

Taking a step back, exposing the distinction between `mx.sym` and `mx.nd` to users in the frontend is a sufficient but not necessary approach to providing the flexibility of imperative mode together with the benefits of a symbolic computational graph. To improve the user experience, we would like to reconsider this approach and provide a unified imperative and symbolic API based on the concept of deferred computation.

Deferred computation (RFC: #16376, PR: #17530) extends the NDArray in the MXNet backend so that, when enabled, only metadata (such as shape) is computed eagerly, while the computational graph is tracked in a symbolic fashion and storage allocation and computation are deferred until the results of the computation are requested. It further provides APIs to export the recorded graph as a Symbol. Together these are used to provide Gluon hybridization and exporting to other language frontends.
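As a minimal sketch of the intended user experience, assuming the deferred-computation-based hybridization from #17530 lands as described:

```python
from mxnet import gluon, np, npx

npx.set_np()

net = gluon.nn.Dense(10)
net.initialize()
net.hybridize()            # graph recording backed by deferred computation

y = net(np.ones((1, 20)))  # ordinary imperative call; the graph is traced
net.export('dense_net')    # export the recorded graph as a Symbol + params
```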
APIs to remove or deprecate: `mx.sym`, `mx.sym.np`
Replacement APIs: `mx.np`
Gluon API
The deferred compute PR (#17530) contains the required changes to the Gluon API that enable Gluon based on deferred compute.
We auto-detect Gluon 1 use (the user implements `HybridBlock.hybrid_forward`) vs. the new API, as sketched below.
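A minimal sketch of the two styles being distinguished (class names are illustrative):

```python
from mxnet import gluon, npx

# Gluon 1 style: the F placeholder dispatches to mx.nd or mx.sym.
class OldStyle(gluon.HybridBlock):
    def hybrid_forward(self, F, x):
        return F.relu(x)

# New style: plain imperative code using mx.np/mx.npx operators;
# hybridization is handled by deferred computation.
class NewStyle(gluon.HybridBlock):
    def forward(self, x):
        return npx.relu(x)
```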
mx.model and mx.module
Both mx.model and mx.module were introduced before MXNet 0.7 as high-level APIs to describe a model's architecture and its associated parameters. The Gluon API was made generally available in MXNet 1.0 and is easier to use than the model and module APIs. In MXNet 2.0, I propose: …
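The proposal list itself did not survive extraction here, but as a hedged illustration of the direction, a Module-style `fit()` call maps onto an explicit Gluon training step roughly as follows (data and hyperparameters are illustrative):

```python
from mxnet import autograd, gluon, np, npx

npx.set_np()

net = gluon.nn.Dense(1)
net.initialize()
loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

# One training step; Module's fit() loops over steps like this internally.
X = np.random.uniform(size=(32, 4))
y = np.random.uniform(size=(32, 1))
with autograd.record():
    loss = loss_fn(net(X), y)
loss.backward()
trainer.step(batch_size=32)
```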
C-API clean-up
As part of the efforts in #17097 to improve the performance of imperative execution, we are adopting the PackedFunc-based FFI as described in [1]. The design of this FFI can be found in [2], and the implementation of PackedFunc in [3]. Once the PackedFunc-based FFI is merged, the C APIs will be registered as PackedFuncs in the runtime system. This has the benefit of reducing the need to directly maintain optimized FFI code, such as our Cython implementation, for a large number of functions.
Note that this change is limited to the C APIs in `include/mxnet/c_api.h`; it does not affect `include/mxnet/c_predict_api.h` or `include/mxnet/c_api_test.h`.
Support for other frontend languages
Since MXNet's frontend languages all rely on the C-API, this implies changes to the other language bindings too. As stated in the MXNet 2.0 roadmap RFC [4], the language bindings are expected to move together with this change, as initiated by the maintainers of those language bindings.
Currently, the PackedFunc implementation in TVM already supports Python, Java/Scala, C++/Rust, and JavaScript, and thus can directly support our existing language bindings for Python, Java, Scala, C++, and JavaScript. This leaves support for Perl and R, which is feasible but pending development.
Deprecated API
As the project evolved, we found the need to extend some of the existing APIs and thus added new versions to supersede and deprecate the old ones. The old versions were left in the API definition for backward compatibility, which increases the surface area for support. In MXNet 2.0, we will remove these deprecated versions of the APIs and rename the substitute *Ex APIs to drop the Ex suffix. This also includes the new APIs for large tensor support with the *64 suffix. The list of such APIs includes:
List of groups of API
Build System with Makefile
The CMake build supports all use cases of the Makefile-based build, whereas the Makefile-based build supports only a subset of the CMake build. To simplify maintenance, we will thus remove the Makefile-based build.
IO/DataIter API
Clean up the `mxnet.image` module. Similar functions will be provided in `mxnet.gluon.data` (a usage sketch follows the list):
- `mxnet.image.Augmenter` and all subclasses → replace with `mxnet.gluon.data.vision.transforms.*`
- `mxnet.image.ImageIter` → replace with `mxnet.gluon.data.vision.ImageFolderDataset`, `mxnet.gluon.data.vision.ImageRecordDataset`, or `mxnet.gluon.data.vision.ImageListDataset`
- `mxnet.image.detection` module, including `mxnet.image.DetAugmenter` and `mxnet.image.ImageDetIter` → replace with `mxnet.gluon.data.vision.ImageRecordDataset` or `mxnet.gluon.data.vision.ImageListDataset`
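A minimal sketch of the replacement data pipeline (the image folder path is hypothetical):

```python
from mxnet import gluon
from mxnet.gluon.data.vision import transforms

# Replaces mxnet.image.Augmenter-based preprocessing.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Replaces mxnet.image.ImageIter; './images' holds one subfolder per class.
dataset = gluon.data.vision.ImageFolderDataset('./images')
loader = gluon.data.DataLoader(
    dataset.transform_first(transform), batch_size=32, shuffle=True)
```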
Iterators in `mxnet.io` are kept; `DataBatch`, however, is to be aligned with `DataLoader`.
Python 2 Support Deprecation
Python 2 is unmaintained as of January 2020. The MXNet 1.6.x series is the last to support Python 2.
See #8703 and consensus in [5].
References
[1] https://docs.tvm.ai/dev/runtime.html#packedfunc
[2] https://cwiki.apache.org/confluence/display/MXNET/MXNet+FFI+for+Operator+Imperative+Invocation
[3] #17510
[4] #16167
[5] https://lists.apache.org/thread.html/r3a2db0f22a1680cc56804191446fef2289595798ca19fd17de1ff03e%40%3Cdev.mxnet.apache.org%3E