[BYOC][TENSORRT] Add support for FP16 on TensorRT BYOC flow #10388

Merged: 7 commits merged into apache:main on Mar 11, 2022

Conversation

@mikepapadim (Contributor) commented on Feb 25, 2022

This PR enables support for FP16 types on the TensorRT BYOC flow.

Changes:

  • Replaces hardcoded FP32 types with types inferred from params in the op converters for tensors and weights
  • Adds FP16 tests for the TensorRT BYOC flow

@mbs-octoml @electriclilies @masahi

@comaniac (Contributor) left a comment:

IIUC, in addition to supporting FP16 for TRT, this PR also attempts to deprecate support for TRT < 7.0.0? Since we don't have a TRT runtime in CI, I have no idea how that affects existing use cases. If so, this would be a more significant change that needs to be discussed and documented.

@@ -150,19 +165,30 @@ void TensorRTBuilder::AddLayer(int nid, const JSONGraphNode& node) {
// Get outputs.
node_output_map_[nid] = {};
for (auto out : params.outputs) {
VLOG(1) << "Before forcing output tensor type: " << static_cast<int>(out->getType())
<< std::endl;
Contributor: No need for a newline in the log.

Comment on lines 186 to 187
// According to documentation this is required for single FP precision. Always on doesnt seem to
// prevent pure FP32 execution
Contributor: nit: Better to provide a link to the documentation.
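For context, the flag in question is presumably TensorRT's FP16 builder flag. A minimal sketch of the pattern, with config_ taken from the surrounding builder code and use_fp16 as an assumed option (not this PR's exact code):

    // Allow, but do not force, FP16 kernel selection. TensorRT may still pick FP32
    // implementations where no FP16 kernel exists or FP32 is faster, so enabling the
    // flag does not prevent pure FP32 execution.
    if (use_fp16) {
      config_->setFlag(nvinfer1::BuilderFlag::kFP16);
    }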

node_output_map_[nid].push_back(TensorRTOpInput(out));
VLOG(1) << "After forcing output tensor type: " << static_cast<int>(out->getType())
<< std::endl;
Contributor: Ditto.

Comment on lines 190 to 191
// Pass it explicitly
// config_->setFlag(nvinfer1::BuilderFlag::kDEBUG);
Contributor: Remove?

@@ -204,19 +227,30 @@ TensorRTEngineAndContext TensorRTBuilder::BuildEngine() {

nvinfer1::Weights TensorRTBuilder::GetDLTensorAsWeights(const DLTensor* dptr,
DLDeviceType src_device) {
VLOG(1) << "Device type for DLTensorAsWeight: " << dptr->device.device_type;
VLOG(1) << "DLType for DLTensorAsWeight: " << dptr->dtype;
VLOG(1) << "DLShape for DLTensorAsWeight: " << dptr->shape << std::endl;
Contributor: Ditto.

@@ -169,50 +189,54 @@ def compile_and_run(mod, params, i_data, mode="vm", use_trt=True):
mod, params, i_data, mode=mode, use_trt=use_trt
)

print(result_dict)
Contributor: Remove.


if run_module:
assert_result_dict_holds(result_dict)
print(result_dict)
Contributor: Remove.

Comment on lines 458 to 473
# run_and_verify_func(
# get_graph((1, 3, 16, 16), (1, 3, 1, 1), channels=1), run_module=run_module)
Contributor: Uncomment?

@@ -471,7 +502,8 @@ def get_graph(
f = relay.Function([x], out)
return f, {"x": x_shape}, []

run_and_verify_func(get_graph(), run_module=run_module)
# for tp in ["float32", "float16", "int8", "uint8"]:
Contributor: Remove, or uncomment?

Comment on lines 933 to 979
# run_and_verify_func(get_graph((1, 1000), axis=-1), run_module=run_module)
# run_and_verify_func(get_graph((1, 3, 4), axis=-2), run_module=run_module)
# run_and_verify_func(get_graph((1, 3, 4), axis=1), run_module=run_module)
Contributor: Uncomment.

@mikepapadim (Contributor, Author) replied:

Regarding deprecating TRT < 7.0.0: I reverted all of the versioning changes and kept this PR focused on the FP16 support. Thanks for the review, PTAL.

@mikepapadim (Contributor, Author):
@masahi

@comaniac (Contributor) left a comment:

LGTM. Thanks. Just nits.

src/runtime/contrib/tensorrt/tensorrt_builder.cc (outdated; resolved)
ICHECK(TypeMatch(dtypes[i], kDLFloat, 32)) << "Only FP32 inputs are supported.";
auto input_tensor = network_->addInput(name.c_str(), nvinfer1::DataType::kFLOAT, dims);
auto tensor_dtype =
(dtypes[i].bits == 16) ? nvinfer1::DataType::kHALF : nvinfer1::DataType::kFLOAT;
Contributor: I'd suggest an ICHECK that fails on unsupported types.
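A minimal sketch of the suggested guard, written against the AddInput code above (TypeMatch, dtypes, dims, and network_ come from that context); illustrative only, not the merged change:

    ICHECK(TypeMatch(dtypes[i], kDLFloat, 16) || TypeMatch(dtypes[i], kDLFloat, 32))
        << "Only FP16 and FP32 inputs are supported by the TensorRT BYOC flow.";
    auto tensor_dtype =
        (dtypes[i].bits == 16) ? nvinfer1::DataType::kHALF : nvinfer1::DataType::kFLOAT;
    auto input_tensor = network_->addInput(name.c_str(), tensor_dtype, dims);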

@@ -202,9 +211,6 @@ def _func_wrapper(expr):
# ops with dynamic shapes are offloaded to VM
if check_dynamism(args, op_name):
return False
if any([x.checked_type.dtype != "float32" for x in args]):
Contributor: I'm not seeing where the type check (which must now be generalized to float32/float16) has gone to. If we remove it altogether, then I think we'll either generate bad code or fail at TRT build time, which from the TVM user's point of view is runtime and too late. We also need the check in the predicate to prevent Collage from exploring invalid candidate kernels.


// Convert op to TRT.
converter->Convert(&params);

// Get outputs.
node_output_map_[nid] = {};
for (auto out : params.outputs) {
auto out_type = params.inputs.at(1).weight.type == params.inputs.at(0).tensor->getType()
Contributor: Can you explain this? It seems very specific, yet AddLayer is used for all of the supported ops.

@mbs-octoml (Contributor) commented on Mar 11, 2022: This is unfortunately causing a vector index exception for me. I believe we need to pick up the output type from the node's dtype vector.
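A hedged sketch of that direction: derive each output's element type from the JSON node's own dtype vector instead of inspecting a particular input. It assumes JSONGraphNode exposes its dtypes via GetOpDataType(), as in TVM's JSON runtime; the other names are taken from the AddLayer code above:

    const auto out_dtypes = node.GetOpDataType();  // assumed accessor for per-output dtypes
    node_output_map_[nid] = {};
    for (size_t i = 0; i < params.outputs.size(); ++i) {
      nvinfer1::ITensor* out = params.outputs[i];
      const bool is_fp16 = (out_dtypes[i].code == kDLFloat && out_dtypes[i].bits == 16);
      out->setType(is_fp16 ? nvinfer1::DataType::kHALF : nvinfer1::DataType::kFLOAT);
      node_output_map_[nid].push_back(TensorRTOpInput(out));
    }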

? nvinfer1::DataType::kFLOAT
: nvinfer1::DataType::kINT32;

const auto trt_dtype = (static_cast<int>(dptr->dtype.bits) == 16) ? nvinfer1::DataType::kHALF
Contributor: Another ICHECK would be in order to make sure we're not silently generating bad code.
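Something along these lines would do it, sketched against the GetDLTensorAsWeights snippet above (illustrative rather than the merged code):

    ICHECK(dptr->dtype.code == kDLFloat && (dptr->dtype.bits == 16 || dptr->dtype.bits == 32))
        << "Only FP16 and FP32 weights are supported.";
    const auto trt_dtype =
        (dptr->dtype.bits == 16) ? nvinfer1::DataType::kHALF : nvinfer1::DataType::kFLOAT;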

@@ -250,7 +253,7 @@ void TensorRTBuilder::CleanUp() {
#endif
builder_->destroy();
for (auto weight : trt_weights_) {
if (weight.type == nvinfer1::DataType::kFLOAT) {
if (static_cast<int>(weight.type) <= 1) {
Contributor: Can we avoid hard-coding the enum constants?
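For example, the same condition spelled with the named enum values that the hardcoded "<= 1" relies on (kFLOAT is 0 and kHALF is 1 in nvinfer1::DataType); a sketch of the condition only, with the body left as in the existing branch:

    for (auto weight : trt_weights_) {
      if (weight.type == nvinfer1::DataType::kFLOAT || weight.type == nvinfer1::DataType::kHALF) {
        // free the host-side weight buffer as the existing code already does
      }
    }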

@mikepapadim force-pushed the tensorrt_fp16 branch 3 times, most recently from e7405a9 to 2eb104b on March 9, 2022 at 16:14
@mikepapadim force-pushed the tensorrt_fp16 branch 4 times, most recently from 3a3e1e4 to 0741642 on March 10, 2022 at 16:16
@masahi merged commit 4e4f607 into apache:main on Mar 11, 2022
@@ -85,8 +85,13 @@ void TensorRTBuilder::AddInput(int nid, uint32_t entry_id, const JSONGraphNode&
shape.erase(shape.begin());
}
nvinfer1::Dims dims = VectorToTrtDims(shape);
ICHECK(TypeMatch(dtypes[i], kDLFloat, 32)) << "Only FP32 inputs are supported.";
auto input_tensor = network_->addInput(name.c_str(), nvinfer1::DataType::kFLOAT, dims);
ICHECK((dtypes[i].bits != 16 || dtypes[i].bits != 32))
Contributor: This is always true; I think you mean bits == 16 || bits == 32.

ret: bool
True if supported, False if not.
"""
if any([x.checked_type.dtype in supported_types for x in args]):
Contributor: This should use all(...): return True only if every argument dtype is supported; otherwise log an error and return False.

pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
…0388)

* FP16 support for TRT

* Cleanups on tests

* Fix for typing on output tensor

* Fix icheck

* Add TRT inference builder auto-convert precision flags as attrs in the config

* Address PR comments

* Fix bug on passing the new config attrs to codegen for tensorrt partition

Co-authored-by: Michalis Papapdimitriou <mpapapdimitriou@octoml.ai>