[AMP][Pass][Typing] Add faster type inference #9735
Conversation
Discussed with @jroesch and @mbs-octoml; the main changes we want are to rename "Fast" --> "Local" and to better document pre-conditions.
Force-pushed from ac1ce9f to 5960c5c
This is now ready for review
Never looked at to_mixed_precision.cc before but boy do I see why this would help!
Just some nits, thanks, pretty sure this is going to get more use.
return mod->Lookup("main").as<FunctionNode>()->body->checked_type();
Type checked_type = expr->checked_type_;
if (checked_type.defined()) {
  return checked_type;
// The expression has not been changed AND its existing type
// is known to still be valid. (See the special handling for tuples etc.
// below, where we null out checked_type_ when we cannot be
// sure it is still valid.)
(though see my comment below)
Done
@@ -381,6 +381,18 @@ class MixedPrecisionPass : public MixedModeMutator {
    return Call(cur_op, new_args, pre_call_node->attrs, new_arg_types, pre_call_node->span);
  }

  Expr Rewrite_(const TupleGetItemNode* pre, const Expr& post) {
    // The old checked type in the expression may not be valid so clear it
    post->checked_type_ = Type(nullptr);
Am I missing something, or will checked_type_ be null iff some sub-expression of post has been rewritten and thus its type has changed?
I.e., checked_type_ is non-null only if pre == post.get()?
Hmm, you would think so, but it looks like the mutator does not by default invalidate the checked_type (and appears to reuse the reference, giving us this problem).
I can dig a little deeper, but if I remove this line for TupleGetItemNode the checked type will be wrong (fp32 instead of fp16).
https://github.com/apache/tvm/blob/main/src/relay/ir/expr_functor.cc#L248
Here is the behavior for generating post; there is some copy-on-write stuff whose full mechanics I don't quite understand, so 🤷
Ah! It's the COW, that makes sense. I think that means we should be clearing checked_type_ on COW, but let's not dig ourselves any deeper until we've thought about incremental type inference a bit more carefully.
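To illustrate the hazard being discussed, here is a minimal toy sketch in Python (not the TVM API; all names are invented): a node caches its inferred type, and a mutator that rewrites children while reusing the parent object leaves that cache stale unless it is explicitly cleared, which is exactly why the `post->checked_type_ = Type(nullptr)` line above is needed.

```python
# Toy model: a node caches the result of type inference in checked_type_.
class Node:
    def __init__(self, dtype, children=()):
        self.dtype = dtype              # the node's "true" dtype
        self.children = list(children)
        self.checked_type_ = None       # cached result of inference

def infer_type(node):
    """Populate checked_type_ caches bottom-up (type == dtype here)."""
    for c in node.children:
        infer_type(c)
    node.checked_type_ = node.dtype
    return node.checked_type_

def cast_leaves_to_fp16(node):
    """Mutate in place, like a COW mutator that reuses the same object."""
    if not node.children:
        node.dtype = "float16"
    for c in node.children:
        cast_leaves_to_fp16(c)
    # Without this line the cache would still claim "float32":
    node.checked_type_ = None

leaf = Node("float32")
root = Node("float32", [leaf])
infer_type(root)                 # caches "float32" everywhere
cast_leaves_to_fp16(root)        # rewrite + invalidate the stale cache
assert leaf.dtype == "float16" and leaf.checked_type_ is None
```

If the cache-clearing line is deleted, a later reader of `checked_type_` sees fp32 for a node that is now fp16, matching the bug described above.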
src/relay/transforms/type_infer.cc (outdated)
@@ -824,8 +824,107 @@ void AddGlobalTypes(IRModule mod) {
  }
}

class SameTypedSubgraphExtractor : public ExprMutator {
  /*
micro nit: move this to before the class, and use /*! etc.
nit: Returns the largest sub-graph whose inner nodes need types and whose leaves are vars standing in for already-typed sub-expressions.
Done
src/relay/transforms/type_infer.cc (outdated)
  }

 private:
  Expr get_analogous_expression(const Expr& expr) {
nit: GetAnalogousExpression
Done
src/relay/transforms/type_infer.cc (outdated)
    return VisitExpr(expr);
  }

  return Var("dummy_var", expr->checked_type(), expr->span);
// Since the expression already has a checked_type which we trust, we don't need
// full type inference to enter it. So stub it out with a dummy var of the same type.
Done
Was trying to play around with replacing some type inference in
Added (or rather replaced) some tests. PTAL @mbs-octoml
@AndrewZhaoLuo Sorry for the late reply. Does this help us solve the ADT problem in our MixedPrecision? Imagine we have one fn main():

let %1 = xxx;
let %2 = if (%1) {
  let %3: = @func___inference_a(%4, %5, %6)
} else {
  let %7: = @func___inference_b(%8, %9)
};

Then we have two subgraphs.
@FrozenGene I'm not sure I understand the concern 😅. Global var nodes are just used to reference function calls, right? And these functions have a known type ahead of time, right?
Squashed commits:
* reuse checked types
* analogous subgraph
* brr go fast
* clean up src logs
* clean up PR more
* more clean up
* more documenetation
* clean up
* formatting
* rename fast --> local
* more ocmments
* jostle ci
* type inference
* change comment for SameTypedSubgraphExtractor
* get_analogous_expression -> GetAnalogousExpression
* comment in GetAnaalogousExpression
* add comment
* replace infer tests
* jostle
@AndrewZhaoLuo Yes. In fact, when I saw your PR supports global var nodes, I thought you would leverage it to solve this TODO: https://github.com/apache/tvm/blob/main/src/relay/transforms/to_mixed_precision.cc#L297
@FrozenGene Ah yes, the type inference will work, but we need to think about how to handle it properly for AMP. When I initially wrote AMP I ignored constructs not usually found in most real-life models. It is on the list of TODOs here: #8296
This PR adds a faster type inference pass designed specifically for the Automatic Mixed Precision (AMP) pass. The issue is that the AMP pass uses the existing type inference infrastructure extensively, but that infrastructure was not designed for the AMP workload.

AMP works by going through the expression graph topologically, replacing nodes with casted versions, and relying on type inference to do so. However, in order to use type inference we must, for every subgraph, build an IRModule and run inference on it. The current type inference ignores previously populated type information and essentially repopulates the type fields of the entire subgraph under examination. With N nodes arranged in a linear fashion, the AMP pass examines N subgraphs, and for the i-th subgraph there are i nodes that IRModule construction and type inference must touch. This means we have at least O(N^2) runtime, which is bad. The key issues are therefore: 1) type inference discards the type information we already have, and 2) every query requires building and re-checking a full IRModule.
The solution I came up with is a bit of a hack that lets me avoid rewriting the type inference pass (which is essential infrastructure and would take a long time to change). Given an expression graph with partially populated type information, we can very easily construct, for any subgraph, an analogous graph which has the same type: we just replace each node with known type information with a constant or variable expression of that type. This means that if we are only interested in the type of a single node, we can extract a much smaller subgraph containing all the information needed to infer that type. We then build an IRModule and run standard type inference on this much smaller subgraph.
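The extraction trick can be sketched with a toy Python model (hypothetical names, not TVM's ExprMutator API): sub-expressions that already carry a checked type are swapped for a typed placeholder var, so only the untyped frontier survives into the graph handed to type inference.

```python
# Toy sketch of the "analogous subgraph" idea. A node with a known
# checked_type is replaced by a placeholder var of the same type, so the
# extracted graph is tiny no matter how deep the typed inputs are.
class Expr:
    def __init__(self, op, args=(), checked_type=None):
        self.op = op
        self.args = list(args)
        self.checked_type = checked_type  # None => type still unknown

def extract_analogous(expr):
    if expr.checked_type is not None:
        # Already typed: stand in a dummy var with the same type.
        return Expr("var", checked_type=expr.checked_type)
    return Expr(expr.op, [extract_analogous(a) for a in expr.args])

def count_nodes(e):
    return 1 + sum(count_nodes(a) for a in e.args)

# A fully-typed sub-expression feeding one new, untyped cast node:
deep = Expr("add", [Expr("x", checked_type="float32"),
                    Expr("y", checked_type="float32")],
            checked_type="float32")
new_cast = Expr("cast_fp16", [deep])     # type not yet inferred

small = extract_analogous(new_cast)
assert count_nodes(new_cast) == 4
assert count_nodes(small) == 2           # the cast + one typed placeholder
```

In the real pass the small graph is wrapped in an IRModule and run through the unchanged type inference, so the inference code itself never needs to learn about partial type information.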
This yields a ~100x reduction in AMP pass runtime. For example, arcfaceresnet100 on a 2020 M1 MacBook Pro went from 20s to 0.2s.
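A back-of-envelope cost model (my own simplification, not code from the PR) shows why the speedup is so large on a linear chain of N nodes: full re-inference at step i touches all i nodes seen so far, while local inference touches only the node and its direct inputs.

```python
# Cost model for type-checking work on a linear chain of n nodes.
def full_inference_cost(n):
    # Re-infer the whole i-node prefix at each of the n steps: O(n^2).
    return sum(i for i in range(1, n + 1))

def local_inference_cost(n, fanin=1):
    # Each query touches the node plus `fanin` typed placeholders: O(n).
    return n * (1 + fanin)

assert full_inference_cost(1000) == 500500   # quadratic blow-up
assert local_inference_cost(1000) == 2000    # linear in n
```

At n = 1000 the model already predicts a ~250x gap, which is the same order of magnitude as the measured 20s to 0.2s improvement.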