Dynamic subgraph compile support (apache#17623)

This PR adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library. It also adds a new API for HybridBlock called optimize_for that can partition the model without running a forward pass.

Feature changes
* Adds new API to HybridBlock optimize_for that partitions the model but does not call the cachedOp
* Modifies the subgraph library example to optionally require args to be provided
* Adds annotation on subgraph inputs for the name of the original param so that inputs can be mapped and passes annotations to input nodes of subgraphs
* Adds support for tensors in MKLDNN format, calls Reorder2Default

New tests
* Adds a new test to partition operators that directly consume params
* Adds a new model to test where ops to be partitioned have args/params

Bug Fixes
* Fixes bug in passing ids vector by value instead of by reference
* Fixes bug in passing copies of attributes instead of by reference
* Fixes bug where _cached_graph was not updated after partitioning
* Fixes memory leak where user-specified attributes on subgraph ops were not freed if subgraph was rejected
* Fixes problem incorrectly indexing into shape/dtype maps when annotating the graph

Docs
* Updates the README doc with the latest changes described above
samskalicky · Apr 15, 2020 · 6b950bc · 6b950bc
1 parent 13f5ad9

Showing 15 changed files with 657 additions and 131 deletions.
diff --git a/example/extensions/lib_subgraph/README.md b/example/extensions/lib_subgraph/README.md
@@ -53,9 +53,11 @@ You can start getting familiar with custom partitioners by running an example pr
 
 * **lib_subgraph/test_subgraph.py**: This file calls `mx.library.load('libsubgraph_lib.so')` to load the library containing the custom components, partitions the model using the `optimize_for` API, and prints outputs of the forward passes. The outputs should be the same as the regular MXNet forward pass without partitioning.
 
+* **include/mxnet/lib_api.h**: This file from the MXNet source code is the single header file needed to include all necessary data types and function prototypes for writing a custom operator library. You can either specify the include path in the `Makefile`, or copy the header file over to the `example/extensions/lib_subgraph` folder. Note that apart from this header, the custom operator library is independent of the MXNet source.
+
 ## Writing Custom Partitioner Library
 
-For building a library containing your own custom partitioner, compose a C++ source file like `mypart_lib.cc`, include `lib_api.h` header file, and write your custom partitioner with these essential functions:
+To build your own library containing a custom partitioner, compose a C++ source file like `mypart_lib.cc`, include `lib_api.h` header file, and write your custom partitioner with these essential functions:
 - `initialize` - Library Initialization Function
 - `REGISTER_PARTITIONER ` - Partitioner Registration Macro
 - `mySupportedOps ` - Operator Support
@@ -76,34 +78,60 @@ sym, _, _ = mx.model.load_checkpoint('mymodel', 0)
 # Symbol/Module flow
 sym2 = sym.optimize_for("myPart")
 
-# Gluon flow
+# Gluon flow 1
 sym_block = nn.SymbolBlock(sym, inputs)
 sym_block.hybridize(backend='myPart')
+
+# Gluon flow 2
+sym_block = nn.SymbolBlock(sym, inputs)
+sym_block.optimize_for(x, backend='myPart')
 ```
 
+In the Gluon hybridize flow, the model is actually hybridized during the first inference, rather than immediately when calling `hybridize`. This hybridize-based flow is useful if a user expects to run inference immediately after hybridizing. But for users who just want to partition and not run a whole forward pass, the `optimize_for` API combines the hybridize/forward APIs but does not run a forward pass. After calling `optimize_for`, users can `export` their model immediately without running a forward pass.
+
 ### Using a Custom Partitioner Library
 
 Partitioning APIs in MXNet are available in both Symbol and Gluon APIs. For the Symbol API, the `optimize_for` API can be called on Symbol objects to return a partitioned Symbol.
 
 ```
-optimize_for(backend, args=None, ctx=None, **kwargs)
+optimize_for(backend, args=None, aux=None, ctx=None, **kwargs)
 ```
 
-The `optimize_for` API takes at least 1 argument, `backend` which is a string that identifies which backend to partition the model for. The `args` argument is optional and takes a list of NDArray or dict of str to NDArray. It is used to infer shapes and types and before partitioning. The `ctx` argument is optional and takes a device context to infer storage types. It also take any other user-specified options that will be passed to the backend partitioning APIs.
+The `optimize_for` API takes at least 1 argument, `backend`, which is a string that identifies which backend to partition the model for. The `args` and `aux` arguments are optional and take a list of NDArray or dict of str to NDArray. They are used to infer shapes and types before partitioning, and are passed to the backend to use during compilation. The `ctx` argument is optional and takes a device context to infer storage types. The API also takes any other user-specified options, which are passed to the backend partitioning APIs.
 
 For the Gluon API, the `hybridize` API can be called on HybridBlocks to partition the internal CachedOp Symbol.
 
 ```
-hybridize(backend=None, backend_opts=None)
+hybridize(backend=None, backend_opts=None, **kwargs)
+```
+
+The `hybridize` function prepares the HybridBlock to be converted into a backend symbol. The `backend` argument is a string that identifies which backend will partition the model. The `backend_opts` argument takes other user-specified options that will be passed to the backend partitioning APIs. The actual partitioning takes place during the forward pass.
+
+If you just want to partition the HybridBlock but not run a complete forward pass, you can use the `optimize_for` API that combines the work done in the `hybridize` API with part of the work done in the forward pass.
+
+```
+optimize_for(x, backend=None, backend_opts=None, **kwargs)
+```
+
+When the `optimize_for` API is called on a HybridBlock it partitions immediately. This lets users export the partitioned model without running a complete forward pass.
+
+```
+block.optimize_for(x, backend='myPart')
+block.export('partitioned')
 ```
 
-When the `hybridize` function is called, Gluon will convert the program’s execution into the style used in symbolic programming. The `backend` argument is a string that identifies which backend to partition the model for. The `backend_opts` takes other user-specified options that will be passed to the backend partitioning APIs.
+But you can also use `optimize_for` in place of `hybridize` and run inference immediately after too.
+
+```
+block.optimize_for(x, backend='myPart')
+block(x)
+```
 
 ### Writing A Custom Partitioner
 
 There are several essential building blocks for making a custom partitioner:
 
-* [initialize](./subgraph_lib.cc#L242):
+* [initialize](./subgraph_lib.cc#L261):
     * This function is the library initialization function necessary for any dynamic libraries. It lets you check if the user is using a compatible version of MXNet. Note that this `version` parameter is passed from MXNet when library is loaded.
 
             MXReturnValue initialize(int version)
@@ -116,40 +144,37 @@ There are several essential building blocks for making a custom partitioner:
                 std::vector<bool>& ids,
                 std::unordered_map<std::string, std::string>& options)
 
-* [REGISTER_PARTITIONER(my_part_name)](./subgraph_lib.cc#L238):
+* [REGISTER_PARTITIONER(my_part_name)](./subgraph_lib.cc#L257):
     * This macro registers the custom partitioner and its properties to MXNet by its name. Notice that a partitioner can have multiple partitioning strategies. This enables multiple *passes* to be run in a single partitioning call from the user. The first argument to `addStrategy` is a user-specified name. The second argument is the `supportedOps` function. The third argument is the name of the subgraph operator to create for each subgraph created during partitioning (see below for more info about subgraph operators). The `setReviewSubgraph` API registers a callback function that is called for each subgraph created during partitioning (more on this below). Notice that the first argument to this function is the strategy to associate with and the second argument is the `reviewSubgraph` function.
 
             REGISTER_PARTITIONER(my_part_name)
-            .addStrategy("strategy1", 
-                          supportedOps, 
-                          "_custom_subgraph_op")
-            .setReviewSubgraph("strategy1", 
-                                reviewSubgraph);
+            .addStrategy("strategy1", supportedOps, "_custom_subgraph_op")
+            .setReviewSubgraph("strategy1", reviewSubgraph);
 
 
 Also there are some optional functions you can specify:
 
-* [reviewSubgraph](./subgraph_lib.cc#L220):
+* [reviewSubgraph](./subgraph_lib.cc#L219):
     * This function provides an opportunity to accept/reject a subgraph after MXNet partitions it. It also allows specifying custom attributes on the subgraph (i.e., user-generated IDs). If you do not register this function, subgraphs will be accepted by default.
 
             MXReturnValue reviewSubgraph(
                 std::string json,
-                int subraph_id,
+                int subgraph_id,
                 bool* accept,
-                std::unordered_map<std::string, 
-                                   std::string>& options,
-                std::unordered_map<std::string, 
-                                   std::string>& attrs)
+                std::unordered_map<std::string, std::string>& options,
+                std::unordered_map<std::string, std::string>& attrs,
+                std::map<std::string, MXTensor>& args,
+                std::map<std::string, MXTensor>& aux)
 
 Let’s take a closer look at those registry functions:
 
-* **supportedOps**: This function takes four arguments. The 1st argument is a JSON string of the model architecture graph, where nodes are inputs/params/weights and edges are data dependencies. The graph is pre-sorted in topological order. The 2nd argument is an array of booleans, one for each operator in the model. When traversing the graph, operators to be partitioned into subgraphs are identified and an entry is set to `true` for the node ID in the `ids` array. The last argument is the map of options specified by the user. Users can pass custom options to the partitioner and they are passed to this function in the `options` map. 
+* **supportedOps**: This function takes three arguments. The 1st argument is a JSON string of the model architecture graph, where nodes are inputs/params/weights and edges are data dependencies. The graph is pre-sorted in topological order. The 2nd argument is an array of booleans, one for each operator in the model. When traversing the graph, operators to be partitioned into subgraphs are identified and an entry is set to `true` for the index in the `ids` array corresponding to the node ID. The last argument is the map of options specified by the user. Users can pass custom options to the partitioner and they are passed to this function in the `options` map.
 
-* **reviewSubgraph**: This function takes five arguments. The 1st argument is a JSON string of the newly partitioned subgraph. The 2nd argument is the subgraph ID, this is just a number MXNet uses to identify this particular subgraph (it starts at zero and increments). The 3rd argument is an output to be set in this function to tell MXNet whether to accept (value: `true`) or reject (value: `false`) the subgraph. The 4th argument is the map of options specified by the user. The last argument is a map of attributes that should be set on the created subgraph. These attributes will be available later at runtime, and provides a mechanisn to pass info from partition-time to runtime. You might want to reject a subgraph if it doesnt include all the operators you want, for example. The `options` map is the same one passed to the `supportedOps` API.
+* **reviewSubgraph**: This function takes seven arguments. The 1st argument is a JSON string of the newly partitioned subgraph. The 2nd argument is the subgraph ID, which is just a number MXNet uses to identify this particular subgraph (it starts at zero and increments, unique for each subgraph in the model). The 3rd argument is an output to be set in this function to tell MXNet whether to accept (value: `true`) or reject (value: `false`) the subgraph. You might want to reject a subgraph if it doesn't include all the operators you want, for example. The 4th argument is the map of options specified by the user; it is the same `options` map passed to the `supportedOps` API. The 5th argument is a map of attributes that should be set on the created subgraph. These attributes will be available later at runtime, and provide a mechanism to pass info from partition-time to runtime. The last two arguments, `args` and `aux`, are maps from name to tensor for the params/weights and the auxiliary states of the model. For inputs to the subgraph that come directly from the params/weights of the model, you can look up the name of the input in these maps to get the actual tensor values.
 
 ### Writing A Custom Subgraph Operator
 
-A partitioning strategy specifies how to partition a model and isolate operators into subgraphs. In MXNet, subgraphs are just a [stateful operator](../lib_custom_op#writing-stateful-custom-operator). Subgraph operators have an extra attribute called `SUBGRAPH_SYM_JSON` that maps to a JSON string of the subgraph. The expectation is that when a subgraph operator executes a forward/backward call, it executes all of the operators in the subgraph. 
+A partitioning strategy specifies how to partition a model and isolate operators into subgraphs. In MXNet, subgraphs are just a [stateful operator](../lib_custom_op#writing-stateful-custom-operator). Subgraph operators have an extra attribute called `MX_STR_SUBGRAPH_SYM_JSON` that maps to a JSON string of the subgraph. The expectation is that when a subgraph operator executes a forward/backward call, it executes all of the operators in the subgraph. 
 
 When registering a custom subgraph operator, all that's needed is to register a `createOpState` function and to set that the operator is a subgraph operator by calling the `setIsSubgraphOp` API like:
 

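Stepping back to the `supportedOps` contract described in the README changes above: the function receives the graph as JSON and fills in one boolean per node. The following is a minimal pure-Python sketch of that marking step, not MXNet code. The function name `supported_ops`, the `SUPPORTED_OPS` set, and the simplified JSON layout (a `nodes` list with `op`, `name`, and `inputs` fields) are assumptions made for illustration; the real implementation is the C++ `mySupportedOps` in `subgraph_lib.cc`.

```python
import json

# Hypothetical whitelist, mirroring `op_names` in subgraph_lib.cc.
SUPPORTED_OPS = {"exp", "log"}


def supported_ops(graph_json, supported=SUPPORTED_OPS):
    """Return one bool per node; True marks a node to place in a subgraph."""
    nodes = json.loads(graph_json)["nodes"]
    ids = [False] * len(nodes)
    for node_id, node in enumerate(nodes):
        # Variable nodes (inputs/params) have op == "null" and stay False.
        if node["op"] in supported:
            ids[node_id] = True
    return ids


# Simplified graph for log(exp(a + b)); the field layout is an assumption.
example = json.dumps({
    "nodes": [
        {"op": "null", "name": "a", "inputs": []},
        {"op": "null", "name": "b", "inputs": []},
        {"op": "elemwise_add", "name": "add0", "inputs": [[0, 0, 0], [1, 0, 0]]},
        {"op": "exp", "name": "exp0", "inputs": [[2, 0, 0]]},
        {"op": "log", "name": "log0", "inputs": [[3, 0, 0]]},
    ]
})

print(supported_ops(example))  # [False, False, False, True, True]
```

Because the result is indexed by node ID, the C++ version has to write into the caller's vector, which is why this commit changes `ids` to be passed by reference.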
diff --git a/example/extensions/lib_subgraph/subgraph_lib.cc b/example/extensions/lib_subgraph/subgraph_lib.cc
@@ -160,11 +160,11 @@ MXReturnValue createOpState(std::map<std::string, std::string> attrs,
   std::string serialized_subgraph = "[empty]";
   // MXNet subgraph is stored as Symbol in operator node attrs subgraphs field
   // custom subgraph is stored as json string in custom operator attrs map entry
-  if (attrs.count(SUBGRAPH_SYM_JSON)) {
+  if (attrs.count(MX_STR_SUBGRAPH_SYM_JSON)) {
     // user can now parse json and run other custom ops inside subgraph
-    serialized_subgraph = attrs[SUBGRAPH_SYM_JSON];
+    serialized_subgraph = attrs[MX_STR_SUBGRAPH_SYM_JSON];
   }
-  attrs.erase(SUBGRAPH_SYM_JSON);
+  attrs.erase(MX_STR_SUBGRAPH_SYM_JSON);
   *op_inst = new MyStatefulOp(serialized_subgraph, attrs);
   std::cout << "Info: stateful operator created" << std::endl;
   return MX_SUCCESS;
@@ -177,7 +177,7 @@ REGISTER_OP(_custom_subgraph_op)
 const std::vector<std::string> op_names({"exp","log"});
 
 MXReturnValue mySupportedOps(std::string json,
-                             std::vector<bool> ids,
+                             std::vector<bool>& ids,
                              std::unordered_map<std::string, std::string>& options) {
   for (auto kv : options) {
     std::cout << "option: " << kv.first << " ==> " << kv.second << std::endl;
@@ -204,8 +204,8 @@ MXReturnValue mySupportedOps(std::string json,
         dtype = std::stoi(attrs.map[JsonVal("dtype")].str);
     }
 
-    //check if op dtype is float
-    if(dtype == kFloat32) {
+    //check if op dtype is float, and if option was specified to require float types
+    if((dtype == kFloat32 && options.count("reqFloat") > 0) || options.count("reqFloat") == 0) {
       //check if op is in whitelist
       if(std::find(op_names.begin(),op_names.end(),op.str.c_str()) != op_names.end()) {
         // found op in whitelist, set value to 1 to include op in subgraph
@@ -216,22 +216,41 @@ MXReturnValue mySupportedOps(std::string json,
   return MX_SUCCESS;
 }
 
-MXReturnValue myReviewSubgraph(std::string json, int subraph_id, bool* accept,
+MXReturnValue myReviewSubgraph(std::string json, int subgraph_id, bool* accept,
                                std::unordered_map<std::string, std::string>& options,
-                               std::unordered_map<std::string, std::string>& attrs) {
+                               std::unordered_map<std::string, std::string>& attrs,
+                               std::map<std::string, MXTensor>& args,
+                               std::map<std::string, MXTensor>& aux) {
   for (auto kv : options) {
     std::cout << "option: " << kv.first << " ==> " << kv.second << std::endl;
   }
-  if(options.find("reject") != options.end() &&
-     options["reject"].compare("True") == 0) {
+  for (auto kv : args) {
+    std::cout << "arg: " << kv.first << " ==> (";
+    for (auto s : kv.second.shape)
+      std::cout << s << ",";
+    std::cout << ") [";
+    for (int i=0; i<kv.second.size(); i++)
+      std::cout << kv.second.data<float>()[i] << ", ";
+    std::cout << "]" << std::endl;
+  }
+
+  // check if option `reqArgs` was specified, and if so check if args were provided
+  if(options.count("reqArgs") > 0 && args.size() == 0) {
+    *accept = false;
+    std::cout << "rejecting subgraph since args were not provided" << std::endl;
+    return MX_SUCCESS;
+  }
+
+  // check if option `reject` was specified, and if so check if value is 'True'
+  if(options.count("reject") > 0 && options["reject"].compare("True") == 0) {
+    // if specified, reject the subgraph. this is only used for testing
     *accept = false;
     std::cout << "rejecting subgraph" << std::endl;
   } else {
     *accept = true;
     std::cout << "accepting subgraph" << std::endl;
     attrs["myKey"] = "myVal";
   }
-  std::cout << json << std::endl;
   return MX_SUCCESS;
 }
 

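The order of checks in the updated `myReviewSubgraph` can be summarized with a small pure-Python model. This is a sketch for reading the diff, not part of the library: the function name `review_subgraph` and the use of plain dicts for `options` and `args` are assumptions, and it assumes option values reach the library as strings.

```python
def review_subgraph(options, args):
    """Return (accept, attrs), following the order of checks in myReviewSubgraph."""
    attrs = {}
    # `reqArgs` only has to be present; its value is not inspected.
    if "reqArgs" in options and len(args) == 0:
        return False, attrs
    # `reject` must be exactly the string "True" to reject the subgraph.
    if options.get("reject") == "True":
        return False, attrs
    # Accepted subgraphs get a custom attribute attached.
    attrs["myKey"] = "myVal"
    return True, attrs


print(review_subgraph({"reqArgs": "True"}, {}))              # (False, {})
print(review_subgraph({"reqArgs": "True"}, {"a": [1.0]}))    # (True, {'myKey': 'myVal'})
print(review_subgraph({"reject": "True"}, {"a": [1.0]}))     # (False, {})
```

Under this reading, a call that sets `reqArgs` without supplying any args would presumably have its subgraph rejected, so the model would run unpartitioned.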
diff --git a/example/extensions/lib_subgraph/test_subgraph.py b/example/extensions/lib_subgraph/test_subgraph.py
@@ -23,8 +23,10 @@
 # This test checks if dynamic loading of library into MXNet is successful
 # and checks the end-to-end computation of custom operator
 
-import mxnet as mx
 import os, ctypes
+import mxnet as mx
+from mxnet.gluon import nn
+from mxnet import nd
 from mxnet.base import _LIB, check_call, mx_uint, c_str, c_str_array, SymbolHandle
 
 # load library
@@ -35,6 +37,10 @@
     path = os.path.abspath('libsubgraph_lib.dll')
     mx.library.load(path)
 
+###############################################
+# Test with subgraph not consuming params
+###############################################
+# example model, ops to be partitioned do not have args (use outputs from other ops as inputs)
 a = mx.sym.var('a')
 b = mx.sym.var('b')
 c = a + b
@@ -75,9 +81,6 @@
 out3 = exe3.forward()
 print(out3)
 
-from mxnet.gluon import nn
-from mxnet import nd
-
 # Gluon Hybridize partitioning with shapes/types
 print('-------------------------------')
 print('Testing Gluon Hybridize partitioning with shapes/types')
@@ -88,3 +91,54 @@
 out4 = sym_block(mx.nd.ones((3,2)),mx.nd.ones((3,2)))
 print(out4)
 
+# Gluon Hybridize partitioning with shapes/types without inference
+print('-------------------------------')
+print('Testing Gluon Hybridize partitioning with shapes/types without inference')
+inputs = [a,b]
+sym_block2 = nn.SymbolBlock(sym, inputs)
+sym_block2.initialize()
+sym_block2.optimize_for(mx.nd.ones((3,2)), mx.nd.ones((3,2)), backend='myProp')
+sym_block2.export('partitioned')
+
+
+###############################################
+# Test with subgraph directly consuming params
+###############################################
+# example model, ops to be partitioned have args
+d2 = mx.sym.exp(a)
+sym2 = mx.sym.log(d2)
+
+#execute in MXNet
+print('-------------------------------')
+print('Testing regular MXNet execution')
+exe5 = sym2.bind(ctx=mx.cpu(), args={'a':mx.nd.ones((3,2))})
+out5 = exe5.forward()
+print(out5)
+
+# with propagating shapes/types
+print('-------------------------------')
+print('Testing partitioning with shapes/types')
+arg_array = [mx.nd.ones((3,2),dtype='float32')]
+mysym6 = sym2.optimize_for("myProp", arg_array, reqArgs=True)
+print(mysym6.tojson())
+exe6 = mysym6.bind(ctx=mx.cpu(), args={'a':mx.nd.ones((3,2))})
+out6 = exe6.forward()
+print(out6)
+
+# without propagating shapes/types
+print('-------------------------------')
+print('Testing partitioning without shapes/types')
+mysym7 = sym2.optimize_for("myProp", reqArgs=True)
+exe7 = mysym7.bind(ctx=mx.cpu(), args={'a':mx.nd.ones((3,2))})
+out7 = exe7.forward()
+print(out7)
+
+# Gluon Hybridize partitioning with shapes/types
+print('-------------------------------')
+print('Testing Gluon Hybridize partitioning with shapes/types')
+inputs = [a]
+sym2_block = nn.SymbolBlock(sym2, inputs)
+sym2_block.initialize()
+sym2_block.hybridize(backend='myProp')
+out8 = sym2_block(mx.nd.ones((3,2)))
+print(out8)
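
The second half of the test script covers subgraphs whose inputs are model params. As a rough illustration of the lookup the README describes for `reviewSubgraph` (finding the tensors behind subgraph inputs by name), here is a pure-Python sketch. The helper name `params_consumed_by_subgraph`, the simplified JSON layout, and the assumption that the input node's `name` is the lookup key are all hypothetical; the commit itself adds a dedicated annotation on subgraph inputs for the original param name, which is not modeled here.

```python
import json


def params_consumed_by_subgraph(subgraph_json, args):
    """Map each subgraph input name that is also a key in `args` to its value."""
    found = {}
    for node in json.loads(subgraph_json)["nodes"]:
        # Subgraph inputs appear as variable nodes (op == "null").
        if node["op"] == "null" and node["name"] in args:
            found[node["name"]] = args[node["name"]]
    return found


# Simplified subgraph for log(exp(a)); the field layout is an assumption.
subgraph = json.dumps({
    "nodes": [
        {"op": "null", "name": "a", "inputs": []},
        {"op": "exp", "name": "exp0", "inputs": [[0, 0, 0]]},
        {"op": "log", "name": "log0", "inputs": [[1, 0, 0]]},
    ]
})
# Nested lists stand in for a 3x2 tensor of ones.
args = {"a": [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]}

print(sorted(params_consumed_by_subgraph(subgraph, args)))  # ['a']
```

When no args are passed to `optimize_for`, the map is empty and the lookup finds nothing, which is the situation the `reqArgs` option in the example library guards against.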
diff --git a/include/mxnet/c_api.h b/include/mxnet/c_api.h
@@ -2170,8 +2170,10 @@ MXNET_DLL int MXOptimizeForBackend(SymbolHandle sym_handle,
                                    const char* backend_name,
                                    const int dev_type,
                                    SymbolHandle* ret_sym_handle,
-                                   const mx_uint len,
+                                   const mx_uint args_len,
                                    NDArrayHandle* in_args_handle,
+                                   const mx_uint aux_len,
+                                   NDArrayHandle* in_aux_handle,
                                    const mx_uint num_options,
                                    const char** keys,
                                    const char** vals);
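
The C API now takes separate length/handle pairs for args and aux, while the README says the Python arguments may be a list or a dict of name to NDArray. One plausible way a frontend could normalize either form into an ordered list plus a length is sketched below. This is an illustration only, not the actual MXNet binding code; the helper name `flatten_ndarray_inputs` and its error handling are assumptions.

```python
def flatten_ndarray_inputs(values, names):
    """Normalize a list, dict, or None into (length, list ordered like `names`)."""
    if values is None:
        # Nothing provided: pass a zero length and no handles.
        return 0, []
    if isinstance(values, dict):
        missing = [n for n in names if n not in values]
        if missing:
            raise ValueError("missing entries for: " + ", ".join(missing))
        ordered = [values[n] for n in names]
    else:
        ordered = list(values)
        if len(ordered) != len(names):
            raise ValueError("expected %d arrays, got %d" % (len(names), len(ordered)))
    return len(ordered), ordered


# Strings stand in for NDArray handles here.
print(flatten_ndarray_inputs({"b": "nd_b", "a": "nd_a"}, ["a", "b"]))  # (2, ['nd_a', 'nd_b'])
print(flatten_ndarray_inputs(None, ["a", "b"]))                        # (0, [])
```

Keeping args and aux as two independent pairs lets a caller supply one without the other, which matches both being optional in the documented `optimize_for` signature.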