Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Subgraph API creates duplicate inputs and outputs #16108

Closed
samskalicky opened this issue Sep 6, 2019 · 5 comments · Fixed by #16131
Closed

Subgraph API creates duplicate inputs and outputs #16108

samskalicky opened this issue Sep 6, 2019 · 5 comments · Fixed by #16131
Assignees
Labels
Backend Issues related to the backend of MXNet Memory Performance

Comments

@samskalicky
Copy link
Contributor

samskalicky commented Sep 6, 2019

Description

Subgraph API can create duplicated tensors for both inputs to a subgraph and outputs from a subgraph. This happens when there are multiple nodes in a subgraph that consume the same input, or when there are multiple nodes that consume a single subgraph output. This results in tensor duplication, and causes OOM due to excessive memory usage.

Minimum reproducible example

Consider the duplicate input problem first. Lets say we have the following graph:

A --> B
  \
    --> C

In this graph node A has a single output, and nodes B and C consume A's output. If nodes B and C are added to a subgraph, the graph is changed to the following:

A --> S

S [ A0 --> B
     A1 --> C]

Heres, nodes B and C are moved into the subgraph, and their dependency on node A (which is not in the subgraph) is broken, and an input node is created for each: A0, A1. The subgraph will have 2 inputs (A0, A1) and will both have dependencies to node A.

Now lets consider the the duplicate output problem. Consider the same graph as above:

A --> B
  \
    --> C

But this time lets only node A is added to the subgraph, and nodes B and C are left out. Then the graph is changed to the following:

S --> B
  \
    --> C

S [ A]

Here, B and C have dependencies on the output from A inside the subgraph. In this case, the subgraph will have 2 outputs, one for each consumer: B and C. But both outputs will be defined as coming from the same output of node A inside the subgraph.

@HahTK
Copy link

HahTK commented Sep 6, 2019

Two things to add :

  1. A temporary fix has been tested for fixing the output duplication. (code below)
    This fix fits things into the current function signatures and had to redo similar computation in 2 spots
  2. A better fix would be to introduce a map for output_entries and input_entries to allow 1-many and many-1 mappings between the subgraph and the maingraph. Conceptually, similar with the fix in (1)

Code :
src/operator/subgraph/partition_graph.cc:CreateSubgraphNode()

  nnvm::Symbol sym;
  nnvm::NodeEntryEqual node_equal;
  size_t idx = 0;
  sym.outputs.resize(output_entries.size());

  //only add unique output_entries to sym.outputs
  //relies on output_entries being pre-sorted
  for (size_t i = 0; i < output_entries.size(); ++i) {
    if (0 == i) {
      sym.outputs[idx] = *output_entries[i];
    } else {
      if (!node_equal(*output_entries[i-1], *output_entries[i])) {
        idx++;
        sym.outputs[idx] = *output_entries[i];
      } //else skip over dupe entry
    }
  }
  sym.outputs.resize(idx+1);

In src/operator/subgraph/subgraph_property.h : ConnectSubgraphOutputs()

    nnvm::NodeEntryEqual node_equal;
    nnvm::NodeEntry prevNodeEntry;
    uint32_t idx = 0;

    //increment NodeEntry index only if the output_entry is unique
    for (size_t i = 0; i < output_entries->size(); ++i) {
      if (0 != i ) {
        if (!node_equal(prevNodeEntry, *output_entries->at(i))) {
          idx++;
        }
      }
      prevNodeEntry = *output_entries->at(i); //need a copy
      *output_entries->at(i) = nnvm::NodeEntry{subgraph_node, idx, 0};
    }

@ZhennanQin
Copy link
Contributor

We found the same issue when implementing MKLDNN subgraph backend. To address this, we introduced 2 APIs: ConnectSubgraphOutputs and ConnectSubgraphInputs, which can reduce the duplicated inputs and outputs. Subgraph backend developer have to manually use those 2 APIs to handle this case by case, which isn't quite convenient. It's a good improvement if subgraph pass can handle this automatically.

@zachgk
Copy link
Contributor

zachgk commented Sep 16, 2019

@mxnet-label-bot add [Performance]

@leezu
Copy link
Contributor

leezu commented Sep 10, 2020

Do we still need the ConnectSubgraphOutputs and ConnectSubgraphInputs now?

@samskalicky
Copy link
Contributor Author

Do we still need the ConnectSubgraphOutputs and ConnectSubgraphInputs now?

good question. we'll need to go look at the various subgraph properties and see what they are doing in those functions.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Backend Issues related to the backend of MXNet Memory Performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants