Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MXNet FFI for Operator Imperative Invocation #17510

Merged
merged 28 commits into from
Feb 24, 2020
Merged

Conversation

hzfan
Copy link
Contributor

@hzfan hzfan commented Feb 3, 2020

Description

This is a follow-up pr of #17097. It aims to reduce the imperative invoke overhead with tvm ffi when cython is enabled. I implemented np.zeros as a demo. Currently the data types supported by the new ffi are

  • tuple and list
  • str
  • Number

The details and performance comparison can be seen at https://cwiki.apache.org/confluence/display/MXNET/MXNet+FFI+for+Operator+Imperative+Invocation .

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

@hzfan hzfan force-pushed the poc-tvmffi_pr branch 2 times, most recently from e7d2e44 to 69b989f Compare February 5, 2020 16:57
@hzfan hzfan changed the title [DO NOT REVIEW] MXNet FFI for Operator Imperative Invocation [WIP] MXNet FFI for Operator Imperative Invocation Feb 6, 2020
@hzfan hzfan marked this pull request as ready for review February 6, 2020 04:39
@hzfan hzfan force-pushed the poc-tvmffi_pr branch 3 times, most recently from d29ae7f to f7af173 Compare February 10, 2020 04:03
@hzfan hzfan changed the title [WIP] MXNet FFI for Operator Imperative Invocation MXNet FFI for Operator Imperative Invocation Feb 11, 2020
python/mxnet/_ffi/_ctypes/object.py Show resolved Hide resolved
python/mxnet/_ffi/_ctypes/types.py Show resolved Hide resolved
python/mxnet/_ffi/runtime_ctypes.py Show resolved Hide resolved
python/mxnet/_global_var.py Show resolved Hide resolved
* }
* \endcode
*/
using FType = std::function<void (MXNetArgs args, MXNetRetValue* rv)>;
Copy link
Member

@wkcn wkcn Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question about the ABI compatibility of std::function.

std::function is a STL container, which has different implementations between different versions of compiler (e.g. gcc4 and gcc5) or different compilers (gcc and clang).

For example, MXNet is built in gcc4, which uses the reference & to store the function arguments. However, the other application is built in gcc5, which uses the rvalue reference && to store the function arguments. The other application could not call the MXNet API by PackedFunc, since the implementations of std::function are different between gcc4 and gcc5.

Copy link
Contributor Author

@hzfan hzfan Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wkcn :). Basically PackedFunc does not go across the dll boundary. It remains in the MXNet side.

When application A calls PackedFunc, it first passes the function name to MXNet. Then MXNet finds the corresponding PackedFunc, and returns the address of the PackedFunc back to A (A may cache the name-address map). Finally, A invokes that PackedFunc with the address.

To conclude, A is not aware of std::function. It only gets an address which is a ABI-compatible void*.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hzfan Thank you!

Yes. In the application A, we can call MXNetFuncCall to call the PackedFunc, call MXNetFuncFree to free the PackedFunc, and call MXNetFuncCreateFromCFunc like TVMFuncCreateFromCFunc in TVM to create new PackedFunc.

However, when the implementations of std::function are different between MXNet and the application A, A could not call PackedFunc directly, namely

PackedFunc *func = GetAPackedFuncFromMXNet();
(*func)(a, b, c);

Besides, A could not pass its PackedFunc to MXNet directly, namely

PackedFunc func = GetAPackedFuncFromA();
MXNetFuncCall(&func, xxx);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wkcn Thanks for bringing this up. Yes, I agree that the above two cases trigger unexpected errors.

In the first case, PackedFunc is exposed to A, which is not intended. Also, in this pr, application A is basically cython. As can be seen from function.pxi and convert.pxi, PackedFunc is not exposed to cython. So the first case does not occur in this pr.

As for the second case, currently this FFI does not support receiving callback functions from outside of MXNet (MXNetFuncCreateFromCFunc does not exist). So the second case does not occur either.

Generally in this pr PackedFunc is called through the intended MXNetFuncCall, which ensures the ABI-compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hzfan I see. Thank you for the detailed explaination : )

Copy link
Member

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR ready for review?

@hzfan
Copy link
Contributor Author

hzfan commented Feb 17, 2020

@eric-haibin-lin Yes.

src/api/api_lang.cc Outdated Show resolved Hide resolved
@hzfan
Copy link
Contributor Author

hzfan commented Feb 19, 2020

Thank @leezu for review :)

include/mxnet/runtime/object.h Outdated Show resolved Hide resolved
include/mxnet/runtime/ffi_helper.h Outdated Show resolved Hide resolved
include/mxnet/runtime/ffi_helper.h Outdated Show resolved Hide resolved
python/mxnet/_ffi/_ctypes/types.py Outdated Show resolved Hide resolved
src/api/api_npi.cc Outdated Show resolved Hide resolved
src/api/api_npi.cc Outdated Show resolved Hide resolved
src/api/api_npi.cc Outdated Show resolved Hide resolved
@hzfan hzfan force-pushed the poc-tvmffi_pr branch 2 times, most recently from c5f640e to 14ca3c8 Compare February 23, 2020 18:03
@reminisce reminisce merged commit 3f0b049 into apache:master Feb 24, 2020
leezu added a commit that referenced this pull request Feb 25, 2020
leezu added a commit to leezu/mxnet that referenced this pull request Feb 25, 2020
@hzfan hzfan mentioned this pull request Feb 28, 2020
7 tasks
leezu pushed a commit that referenced this pull request Feb 28, 2020
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
* Init

* Add nop

* Add utility function SetInOut and Invoke

* Init ctypes

* Dispatch for default/CSR array

* Refactor, register the funcs where they are used, except for _api_internal.py, which is registered in api.py

* Seperate tvm ffi api and legacy api

* Replace legacy zeros with new

* Fix numpy.int64 in shape

* Fix sanity

* Fix

* Remove python2 support

* Cleanup

* Fix ci

* Fix lint

* Revert rand_shape_nd

* Fix clang-tidy

* Support NDArray in ctypes

* Using runtime

* Conversion ctor

* Tensordot

* Tensordot backward

* Fix nop regression

* Deprecate Array

* Fix comments

* Fix comments

* Add acknowledgement to incubator-tvm

* Refactor according to comments
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
sxjscience pushed a commit to sxjscience/mxnet that referenced this pull request Jul 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants