[RFC][Tracking Issue][AMP] Tracking Issue for Mixed Precision Pass #8296
Comments
cc @Lunderberg

cc @masahi
I've hit a nasty issue. On CPU targets, our sort-related ops are implemented in C++ (https://github.com/apache/tvm/blob/main/src/runtime/contrib/sort/sort.cc#L436), and they don't support fp16, so those ops break after conversion. Maybe we need to add a specialized CPU sort for fp16, or rewrite the CPU sort in TIR (the same issue would come up with int4, bfloat16, etc.). The former would not be hard, since we would just need to add a specialized comparison functor for fp16 like https://github.com/apache/tvm/blob/main/src/runtime/contrib/sort/sort.cc#L40-L43.
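Until the runtime is fixed, one workaround on the conversion side would be to mark sort-related ops so the pass leaves them in fp32. The sketch below is an untested illustration, not merged code: the op list, the `level=11` override, and the import paths are assumptions about the current registry API.

```python
# Sketch only: force sort-related ops to stay in fp32 so the C++ CPU sort
# kernels never see fp16 inputs. The op list and level=11 override are assumptions.
from tvm.relay.op import register_mixed_precision_conversion
from tvm.relay.transform.mixed_precision import MIXED_PRECISION_NEVER


def _keep_fp32(call_node, mixed_precision_type):
    # Never convert this op; keep accumulation and output dtypes in fp32.
    return [MIXED_PRECISION_NEVER, "float32", "float32"]


for op_name in ["argsort", "topk"]:  # illustrative list of sort-related ops
    register_mixed_precision_conversion(op_name, _keep_fp32, level=11)
```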
It looks like transformer-like models have many softmax ops, and the fact that softmax and the following cast to fp16 are not fused surprised me. This is because the op pattern for softmax is kOpaque (tvm/python/tvm/relay/op/nn/_nn.py, line 42 in 66ac470).
@yzhliu Is there a reason the softmax op pattern cannot be changed from kOpaque to a fusable pattern?
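For context, a hedged sketch of what the pattern change under discussion might look like; whether a fusable pattern is actually correct for softmax is exactly the open question, and overriding the built-in registration by re-registering at a higher level is an assumption.

```python
# Illustrative only: _nn.py currently registers softmax as OPAQUE, which blocks
# fusing the trailing cast. A hypothetical relaxation could look like this.
from tvm.relay.op import OpPattern, register_pattern

# Re-register at a higher level so it takes precedence over the built-in
# OPAQUE registration (override-by-level is assumed to work here).
register_pattern("nn.softmax", OpPattern.OUT_ELEMWISE_FUSABLE, level=11)
```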
@AndrewZhaoLuo What is our goal w.r.t. mixed-type accumulation? Assuming we do find cases where mixed accumulation is beneficial, how are we going to decide when to enable or disable it, given that currently we can only choose one or the other on a per-op basis? (tvm/python/tvm/relay/transform/mixed_precision.py, lines 167 to 168 in f4f525d)
Yeah, the issue with creating defaults is that we cannot create defaults that work best for every situation. This is especially true because whenever we want speed we trade accuracy, which can sometimes become a problem. For the defaults, I envision that most ops do not accumulate to FP32; for some ops, like the global pools and sums, we might turn it on. Really the best way to determine the criteria is to do a lot of the work you've been doing: trying out different models in different applications and seeing what needs to be turned on and off.

That being said, this is really designed to be a tool which sometimes requires the user to go back and modify the default values, to either get more speed if their model can afford it or more accuracy if they need it. It requires investigation, and I don't think we can hit all cases well. A tutorial here would help (which is on my long list of TODOs).

Finally, while things are done on a per-op basis, the mixed precision conversion function can look at parts of the Relay call, such as the node's attributes or the input tensor sizes. Therefore we can be smart about the conversion (e.g. for global pooling, only accumulate in fp32 if the input-to-output reduction is large enough). Again, a tutorial or example would help flesh this out.
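To make the "look at the call node" point concrete, here is a minimal sketch of that global pooling idea. It is not TVM's shipped default: the threshold, the assumed NCHW layout, the reliance on populated type information, and the level override are all assumptions.

```python
# Sketch of a per-call decision: accumulate nn.global_avg_pool2d in fp32 only
# when the spatial reduction is large enough. Threshold/layout/level are assumed.
from tvm.relay.op import register_mixed_precision_conversion
from tvm.relay.transform.mixed_precision import MIXED_PRECISION_ALWAYS


def _global_pool_conversion(call_node, mixed_precision_type):
    # Assumes type inference has run and shapes are static (NCHW layout).
    in_shape = call_node.args[0].checked_type.shape
    reduction = 1
    for dim in in_shape[2:]:  # spatial dimensions
        reduction *= int(dim)
    accum_dtype = "float32" if reduction >= 256 else mixed_precision_type
    return [MIXED_PRECISION_ALWAYS, accum_dtype, mixed_precision_type]


register_mixed_precision_conversion(
    "nn.global_avg_pool2d", _global_pool_conversion, level=11
)
```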
@AndrewZhaoLuo I briefly looked at bfloat16. While fp16 vs bf16 makes no difference for the conversion pass, it seems it is going to take a lot of effort to compile and run a bf16 model end to end, for at least two reasons:
Since Tensor Cores can natively run bf16 workloads at the same rate as fp16, and bf16 on x86 servers is becoming a thing, it would be nice to have good support for bf16 across the stack in the future.
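For what it's worth, invoking the conversion pass with bfloat16 is mechanically the same as with fp16. The sketch below uses a placeholder model with assumed shapes; it only exercises the Relay-level conversion, and compiling or running the result end to end is exactly the part the comment above says still needs work.

```python
# Sketch: run the mixed precision pass targeting bfloat16 on a toy conv2d.
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
out = relay.nn.conv2d(data, weight, kernel_size=(3, 3), padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

mod = relay.transform.InferType()(mod)
mod = relay.transform.ToMixedPrecision(mixed_precision_type="bfloat16")(mod)
print(mod)  # casts and bf16 ops appear; downstream codegen support is separate
```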
This issue tracks work on supporting mixed precision within TVM.
RFC: apache/tvm-rfcs#6
Edge case ops:
Registering accumulation dtypes (out_dtypes) for the following types of ops:
Other discussions:
Tasks which may help:
Benchmarking improvements from pass: https://docs.google.com/spreadsheets/d/12lgyfuHaRS-X4uG-1iQOV8oAuPpuVAbspcmkOSPRFHQ/edit?usp=sharing