Graph Partition API #15886

mseth10 · 2019-08-14T01:55:34Z

Description

This PR introduces enhanced Python and C APIs to trigger graph partitioning for a given backend. These APIs give the user the option to infer shapes, dtypes and stypes before partitioning. They also add the ability to pass optional arguments to the subgraph property.

This PR makes the following modifications to subgraph property:

Added the ability for the subgraph property to pre-process the graph and initialize any custom state that is required during partitioning.
Added the ability for the subgraph property to post-process the graph and do things like collect statistics on the quality of partitioning.
Enabled the subgraph property to reject creating a sub-optimal subgraph. For example, the subgraph property could reject subgraphs that contain only one operator.
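To make the pre-process and post-process hooks concrete, here is a minimal, framework-free sketch. It is not the MXNet implementation: `ToySubgraphProperty`, `pre_partition`, `post_partition`, `select` and `partition` are hypothetical names chosen for illustration, and the graph is simplified to a linear chain of op names.

```python
# Hypothetical sketch: none of these names are MXNet APIs.
class ToySubgraphProperty:
    """Toy stand-in for a subgraph property with pre/post hooks."""

    def __init__(self, supported_ops):
        self.supported_ops = set(supported_ops)
        self.candidates = set()  # custom state filled in by pre_partition
        self.stats = {}          # filled in by post_partition

    def pre_partition(self, ops):
        # Inspect the whole graph once and cache which nodes are eligible.
        self.candidates = {i for i, op in enumerate(ops)
                           if op in self.supported_ops}

    def select(self, index):
        # Called per node during partitioning; relies on the cached state.
        return index in self.candidates

    def post_partition(self, ops, subgraphs):
        # Collect simple statistics on the quality of the partitioning.
        covered = sum(len(sg) for sg in subgraphs)
        self.stats = {
            "num_subgraphs": len(subgraphs),
            "covered_nodes": covered,
            "coverage": covered / len(ops) if ops else 0.0,
        }


def partition(ops, prop):
    """Group consecutive selected nodes of a linear op chain into subgraphs."""
    prop.pre_partition(ops)
    subgraphs, current = [], []
    for i in range(len(ops)):
        if prop.select(i):
            current.append(i)
        elif current:
            subgraphs.append(current)
            current = []
    if current:
        subgraphs.append(current)
    prop.post_partition(ops, subgraphs)
    return subgraphs


ops = ["Convolution", "BatchNorm", "Activation",
       "Pooling", "Convolution", "Activation"]
prop = ToySubgraphProperty(["Convolution", "BatchNorm", "Activation"])
subgraphs = partition(ops, prop)
```

The point of the sketch is the call order: state is set up once before any node is visited, and statistics are gathered only after all subgraphs are known.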

We reuse the existing graph partitioning algorithm and control it through subgraph property classes. Since we're modifying a symbol object, we add this API to the Symbol class. Users call optimize_for on an existing symbol object to create a new partitioned symbol.

Here is an example showing partitioning without args, which means infer shape/type will NOT be called before partitioning. It is similar to GetBackendSymbol.

import mxnet as mx
sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)
sym = sym.optimize_for('default')
exe = sym.bind(ctx=mx.cpu(), args=args, aux_states=aux, grad_req='null')

Here is an example showing partitioning with args, which means infer shape/type is called before partitioning.

import mxnet as mx
sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)
sym = sym.optimize_for('default', args=args, ctx=mx.cpu())
exe = sym.bind(ctx=mx.cpu(), args=args, aux_states=aux, grad_req='null')

Here is an example showing partitioning with kwargs to set a list of ops to be excluded from subgraphs.

import mxnet as mx
sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)
sym = sym.optimize_for('default', args=args, ctx=mx.cpu(), excluded_ops=['BatchNorm'])
exe = sym.bind(ctx=mx.cpu(), args=args, aux_states=aux, grad_req='null')
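As a rough illustration of what excluding an op from subgraphs means, the following framework-free sketch groups consecutive supported ops in a linear chain and treats excluded ops as subgraph boundaries. `group_ops` and its parameters are hypothetical and do not reflect how the 'default' backend actually consumes excluded_ops.

```python
# Hypothetical sketch: group_ops is not an MXNet function.
def group_ops(ops, supported_ops, excluded_ops=()):
    """Return runs of consecutive ops that are supported and not excluded."""
    excluded = set(excluded_ops)
    groups, current = [], []
    for op in ops:
        if op in supported_ops and op not in excluded:
            current.append(op)
        else:
            # An unsupported or excluded op ends the current run.
            if current:
                groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


ops = ["Convolution", "BatchNorm", "Activation",
       "Convolution", "BatchNorm", "Activation"]
supported = {"Convolution", "BatchNorm", "Activation"}

no_exclusions = group_ops(ops, supported)
with_exclusions = group_ops(ops, supported, excluded_ops=["BatchNorm"])
```

In this toy model, excluding BatchNorm turns one six-op subgraph into three smaller ones, which is why an exclusion list can noticeably change the partitioning result.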

Next steps:

Add example(s) and test(s) for excluded_ops passed as arguments to optimize_for as part of the PR on support for custom subgraph properties
Add functionality for users to pass shape/type/stype dicts as kwargs to optimize_for just like we have it in simple_bind

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

pengzhao-intel · 2019-08-14T02:42:23Z

@ZhennanQin, could you please review this for compatibility? :)

ZhennanQin · 2019-08-14T04:10:17Z

What's the difference between sym.partition('default') and sym.get_backend_symbol('default')?

python/mxnet/symbol/symbol.py

samskalicky · 2019-08-15T06:42:16Z

@ZhennanQin Here in this PR we're trying to do the following things:

Clarify the API and function names. Initially we wanted the API to be called "sym.partition", since we were going to partition the symbol. But given your latest subgraph API enhancements, which enable multiple subgraph properties to be specified (as a subgraph backend), we dropped this idea. From the user's point of view, the outcome is that they are "optimizing the symbol for a specific backend". So we now propose calling the API "optimizeFor", taking the name of the backend as its argument. For example: sym.optimizeFor("TensorRT") or sym.optimizeFor("MKLDNN") or sym.optimizeFor("EIA"). We expect that users could call this more than once for the same model. For example, users could first optimize for EIA (which would group supported nodes into subgraphs) and then optimize for MKLDNN for the remaining nodes that were not partitioned into subgraphs.
We also want to enable users to pass more configuration options directly to the subgraph property to tailor the partitioning. One example is giving users a better way to provide a blacklist (a list of operators to exclude from subgraphs). Another could be setting the minimum number of nodes allowed in a subgraph (e.g. 3).
Enable the API to optionally perform shape/type propagation before partitioning. For example, some backends may only support operators if their inputs have a particular type (e.g. FP16 or Int8) or shape (2D, 3D, or 4D).
Currently the subgraph properties are stateless. We want to give the subgraph properties prepartition and postpartition APIs to do setup or cleanup around the actual partitioning call. This could be used to set up a whitelist of supported operators, or to pre-analyze the graph before partitioning begins and Select is called on the Selector.

In this PR we want to enable this functionality for users to more easily use partitioning. In a later PR we want to enable subgraph properties to be dynamically loaded from libraries.
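The idea of optimizing the same model for more than one backend in sequence can be sketched without any framework code. In this toy model a graph is a list whose items are either op names (not yet partitioned) or `(backend, ops)` tuples (already wrapped into a subgraph). The `optimize_for` function below is a hypothetical stand-in and not the Symbol method; the supported-op sets are made up for illustration.

```python
# Hypothetical sketch: this optimize_for is a toy, not Symbol.optimize_for.
def optimize_for(nodes, backend, supported_ops):
    """Wrap runs of unclaimed, supported ops into (backend, [ops]) nodes."""
    result, current = [], []

    def flush():
        # Close the run collected so far, if any, as one subgraph node.
        if current:
            result.append((backend, list(current)))
            current.clear()

    for node in nodes:
        if isinstance(node, str) and node in supported_ops:
            current.append(node)
        else:
            # Existing subgraph nodes and unsupported ops pass through as-is.
            flush()
            result.append(node)
    flush()
    return result


graph = ["Convolution", "Activation", "FullyConnected", "softmax"]
step1 = optimize_for(graph, "EIA", {"Convolution", "Activation"})
step2 = optimize_for(step1, "MKLDNN", {"FullyConnected"})
```

The second call only sees plain op names as candidates, so nodes already grouped by the first backend are left untouched.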

python/mxnet/symbol/symbol.py

src/operator/subgraph/subgraph_property.h

ZhennanQin · 2019-08-16T06:46:25Z

@samskalicky Thanks for the explanation, it helps a lot to understand this PR. Basically, I think sym.optimizeFor('MKLDNN') is the same as current sym.get_backend_symbol('MKLDNN'), so I don't see the reason to duplicate it. Maybe we can directly rename get_backend_symbol to optimizeFor?

samskalicky · 2019-08-16T16:20:32Z

@ZhennanQin Thanks for your comments, we'll try to clarify our thought process. Apologies for the lack of clarity in this PR. You are correct about the current state of the code that @mseth10 has committed so far. The PR is still WIP and more changes are coming, so let's not make decisions based on what is there today.

Instead let's focus on the items I mentioned before (they will get into the PR description before we attempt to merge, I promise! :-D). We will add more arguments to the optimizeFor API in the coming commits. We need to add arguments that enable us to do shape/type propagation prior to partitioning (so we can use shape/type info to select ops in subgraph properties), and we want to accept arbitrary options that we pass to the subgraph property for further configuration (e.g. blacklisted ops).

These new API changes will not be compatible with the current get_backend_symbol API. Since MXNet maintains semantic versioning for minor releases, we cannot change the API yet. So instead we'll create another API (optimizeFor) alongside get_backend_symbol for now.

samskalicky · 2019-08-16T17:55:27Z

@mseth10 we also need to add the ability to reject creating a subgraph in the build_subgraph.cc file:

https://github.com/apache/incubator-mxnet/blob/e98fea3165670157090f2a2f644890452443803c/src/operator/subgraph/build_subgraph.cc#L574

We want to enable the subgraph property to return a null node to signal that it rejects creating a subgraph. The partitioning pass has a decycle feature that may remove nodes that were selected when calling the select function on the subgraph property. So the subgraph may be smaller than anticipated, and we want the ability to not create subgraphs based on some criteria in the subgraph property (e.g. subgraph size too small).
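A small framework-free sketch of this rejection path, under the assumption that the property signals rejection by returning None (standing in for the null node). `create_subgraph_node`, `build_subgraphs`, the `min_size` parameter and the node names are all hypothetical; the real logic lives in build_subgraph.cc.

```python
# Hypothetical sketch: these helpers do not exist in MXNet.
def create_subgraph_node(selected, min_size=2):
    """Return a subgraph node, or None to reject when too few nodes remain."""
    if len(selected) < min_size:
        return None
    return {"op": "_toy_subgraph", "nodes": list(selected)}


def build_subgraphs(candidates, removed_by_decycle, min_size=2):
    """Apply the decycle result, then let the property accept or reject."""
    kept, rejected = [], []
    for selected in candidates:
        # Decycling may have dropped nodes that select() originally accepted.
        remaining = [n for n in selected if n not in removed_by_decycle]
        node = create_subgraph_node(remaining, min_size)
        if node is None:
            rejected.append(remaining)  # these nodes stay in the original graph
        else:
            kept.append(node)
    return kept, rejected


candidates = [["conv0", "bn0", "relu0"], ["conv1", "relu1"]]
kept, rejected = build_subgraphs(candidates, removed_by_decycle={"relu1"})
```

Here the second candidate shrinks to a single node after decycling, so it is rejected and its node is left unpartitioned.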

python/mxnet/symbol/symbol.py

src/c_api/c_api_symbolic.cc

python/mxnet/symbol/symbol.py

src/common/exec_utils.h

ZhennanQin · 2019-08-19T00:25:48Z

@samskalicky @mseth10 As this PR is still WIP, please ping me when it is ready for review. I agree with you that we need a more powerful API that takes dtype, shape and ctx into consideration to make partitioning more accurate.

samskalicky · 2019-08-31T15:13:53Z

@ZhennanQin can you please approve? @mseth10 has added the "Next steps" in the PR description to save the outcome of our discussions from the PR comments.

ZhennanQin

LGTM. Thanks for contributing!

samskalicky · 2019-09-03T00:30:51Z

@mxnet-label-bot remove [pr-awaiting-review]

samskalicky · 2019-09-03T00:31:29Z

@mxnet-label-bot add [pr-awaiting-merge]

pengzhao-intel

Thanks for the great work on this :)

Merging now.

mseth10 requested a review from szha as a code owner August 14, 2019 01:55

samskalicky reviewed Aug 14, 2019

python/mxnet/symbol/symbol.py (outdated)

mseth10 force-pushed the partition_api branch 2 times, most recently from b6b5147 to 478c60a Compare August 14, 2019 23:47

mseth10 requested review from anirudh2290 and eric-haibin-lin as code owners August 14, 2019 23:47

mseth10 force-pushed the partition_api branch 3 times, most recently from 11944af to 3860dee Compare August 15, 2019 01:43

samskalicky reviewed Aug 15, 2019

python/mxnet/symbol/symbol.py (outdated)

mseth10 force-pushed the partition_api branch 2 times, most recently from 6fb556b to e1067da Compare August 16, 2019 02:15

samskalicky reviewed Aug 16, 2019

src/operator/subgraph/subgraph_property.h (outdated)

mseth10 force-pushed the partition_api branch from df649d2 to df67ed0 Compare August 16, 2019 18:17

rondogency reviewed Aug 16, 2019

python/mxnet/symbol/symbol.py (outdated)

mseth10 force-pushed the partition_api branch 4 times, most recently from dfeeacf to 6082f41 Compare August 17, 2019 00:31

samskalicky reviewed Aug 17, 2019

src/c_api/c_api_symbolic.cc (outdated)

samskalicky reviewed Aug 17, 2019

src/c_api/c_api_symbolic.cc (outdated)

samskalicky reviewed Aug 17, 2019

python/mxnet/symbol/symbol.py (outdated)

samskalicky reviewed Aug 17, 2019

src/common/exec_utils.h (outdated)

mseth10 and others added 16 commits August 30, 2019 08:34

refactoring to enable infer shape/type without storage type

03d1f09

check if subgraph rejected by subgraph property

ff30b53

adding description

8ec7276

setting graph attribute context from args

8172101

adding unit test for optimize_for with default backend

bd567a3

fixing args access

7a095ad

removing options_map from PostPartition

323e53b

addressing PR comment

e5b68a9

adding logs about status of subgraph node creation

338f063

allowing partial infer shapes

a5d409f

added context argument back to optimize_for and removed args context …

663210f

…validation

fixed spacing and dev_type

cb1dc28

fixing lint

8fba76c

reorganized args list to optimize_for

4f00222

fixing spacing

e8428ef

dereferencing dev_type

da8f6bf

mseth10 force-pushed the partition_api branch from f44e64a to da8f6bf Compare August 30, 2019 08:34

ZhennanQin approved these changes Sep 2, 2019

marcoabreu removed the pr-awaiting-review PR is waiting for code review label Sep 3, 2019

marcoabreu added the pr-awaiting-merge Review and CI is complete. Ready to Merge label Sep 3, 2019

pengzhao-intel approved these changes Sep 3, 2019

pengzhao-intel merged commit 692f3c4 into apache:master Sep 3, 2019

samskalicky mentioned this pull request Dec 13, 2019

Dynamic subgraph property #17034

Merged

4 tasks

mseth10 mentioned this pull request Feb 6, 2020

[RFC] Partitioning for a given backend #17532

Closed

samskalicky mentioned this pull request Feb 19, 2020

Dynamic subgraph compile support #17623

Merged

5 tasks

wkcn mentioned this pull request May 9, 2020

[MXNet Extensions] Include lib_api.h in the pre-built pip package #18267

Open

mseth10 deleted the partition_api branch June 1, 2020 10:47
