TensorRT compatible retinanet #4395

Closed
julienripoche opened this issue Sep 13, 2021 · 10 comments

@julienripoche
Contributor

julienripoche commented Sep 13, 2021

🚀 The feature

The possibility to compile the ONNX-exported RetinaNet model with TensorRT.

Motivation, pitch

I'm working with the torchvision RetinaNet implementation and have production constraints on inference time. I think it would be great if the ONNX export of RetinaNet could be further compiled with TensorRT.

Alternatives

No response

Additional context

Actually, I already managed to make it work.
I exported the RetinaNet model to ONNX with opset_version=11, then compiled it with TensorRT 8.0.1.
To do that I bypassed two preprocessing steps in the GeneralizedRCNNTransform call:

  • [resize](https://github.com/pytorch/vision/blob/c359d8d56242997e6209b71524d7a6199ea333b2/torchvision/models/detection/transform.py#L112), as it contains a Floor operator not compatible with TensorRT
[09/08/2021-13:14:04] [E] [TRT] ModelImporter.cpp:725: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Resize_43
[graph.cpp::computeInputExecutionUses::519] Error Code 9: Internal Error (Floor_30: IUnaryLayer cannot be used to compute a shape tensor)
  • [batch_images](https://github.com/pytorch/vision/blob/c359d8d56242997e6209b71524d7a6199ea333b2/torchvision/models/detection/transform.py#L118), as it contains a Pad operator not compatible with TensorRT
[09/08/2021-13:12:27] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:2984 In function importPad:
[8] Assertion failed: inputs.at(1).is_weights() && "The input pads is required to be an initializer."
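
For illustration, a minimal sketch of this kind of bypass, monkey-patching the two offending steps with identities (constructor arguments and method signatures may differ across torchvision versions; resizing then has to happen outside the model):

import torch
import torchvision

# Sketch: replace resize and batch_images with identities so the exported
# graph no longer contains the Floor/Pad operators TensorRT rejects.
model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
model.eval()

# resize is called as resize(image, target) and returns (image, target)
model.transform.resize = lambda image, target: (image, target)

# batch_images normally pads images to a common size; with equal-sized,
# already-resized inputs, stacking them is enough
model.transform.batch_images = lambda images, size_divisible=32: torch.stack(images)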

I also changed the dtype of the two torch.arange calls, in shifts_x and shifts_y of the AnchorGenerator, from torch.float32 to torch.int32, as the current version of TensorRT only supports INT32 for the Range operator with dynamic inputs:

[09/08/2021-14:58:35] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:3170 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"

And finally I bypassed the postprocessing of RetinaNet:

  • [postprocess_detections](https://github.com/pytorch/vision/blob/c359d8d56242997e6209b71524d7a6199ea333b2/torchvision/models/detection/retinanet.py#L550), as it contains a where operation (exported as NonZero) not compatible with TensorRT
[09/08/2021-15:14:12] [I] [TRT] No importer registered for op: NonZero. Attempting to import as plugin.
[09/08/2021-15:14:12] [I] [TRT] Searching for plugin: NonZero, plugin_version: 1, plugin_namespace: 
[09/08/2021-15:14:12] [E] [TRT] 3: getPluginCreator could not find plugin: NonZero version: 1
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:720: While parsing node number 729 [NonZero -> "2086"]:
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:722: input: "2085"
output: "2086"
name: "NonZero_729"
op_type: "NonZero"

[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:4643 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
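
The same identity trick covers the postprocessing (a minimal sketch, reusing the model from the sketch above; box decoding and NMS then have to run outside TensorRT, and the anchors can be returned alongside the head outputs if needed for that):

# Sketch: make postprocess_detections and transform.postprocess pass-throughs so
# the exported graph ends at the raw head outputs instead of the NonZero/where logic.
model.postprocess_detections = lambda head_outputs, anchors, image_sizes: head_outputs
model.transform.postprocess = lambda detections, image_sizes, original_sizes: detections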

In my case it is fine to do the preprocessing and postprocessing outside of the RetinaNet call.
So my request is really only about the AnchorGenerator, i.e. changing the dtype of the torch.arange operations from torch.float32 to torch.int32.
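
For illustration, the lines in question live in AnchorGenerator.grid_anchors (torchvision/models/detection/anchor_utils.py); the proposed change would look roughly like the sketch below, where grid_width, grid_height, stride_width, stride_height and device are locals of that method (not an exact diff):

# current: float32 arange exports as an ONNX Range over floats, rejected by TensorRT 8
shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
shifts_y = torch.arange(0, grid_height, dtype=torch.float32, device=device) * stride_height

# proposed: int32 arange keeps the Range node in INT32, which TensorRT accepts
shifts_x = torch.arange(0, grid_width, dtype=torch.int32, device=device) * stride_width
shifts_y = torch.arange(0, grid_height, dtype=torch.int32, device=device) * stride_height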

cc @datumbox

@datumbox
Contributor

datumbox commented Sep 14, 2021

@julienripoche Thanks for the proposal.

The preprocessing and post-processing steps are quite important and we can't bypass them in the general case. Nevertheless, since the proposed modification happens only in the AnchorGenerator and only internally, it might be possible, provided it does not have any other side effects that affect the accuracy or behaviour of the model. In other words, if the change from float32 to int32 does not break any tests and does not affect the accuracy of the pre-trained models, then I'm happy to review and discuss a PR that investigates changing the behaviour.

@julienripoche
Contributor Author

@datumbox Thanks for your consideration :)

I made the small modification to anchor_utils.py and submitted the PR.

@datumbox
Contributor

datumbox commented Sep 14, 2021

Thanks @julienripoche, I'll have a look.

Note that it might take a while to fully investigate its effects, as we will need to a) ensure that all existing pre-trained models maintain the same level of accuracy, b) verify that no internal FB code breaks, and c) confirm there are no other unintended consequences.

datumbox added a commit that referenced this issue Sep 16, 2021
…_utils.py (#4395) (#4409)

Co-authored-by: Julien RIPOCHE <ripoche@magic-lemp.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
facebook-github-bot pushed a commit that referenced this issue Sep 30, 2021
…in anchor_utils.py (#4395) (#4409)

Summary:

Reviewed By: datumbox

Differential Revision: D31268024

fbshipit-source-id: 0294ad05fc94bdf5a6d3eba50d85813d568e8fbe

Co-authored-by: Julien RIPOCHE <ripoche@magic-lemp.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
@montmejat

Hello @julienripoche, would you mind explaining how you bypassed these steps, if you remember? I'm very interested in knowing how you did it, as I'm trying to achieve the same goal 😄 With these bypasses, were you able to achieve some interesting performance gains?

@julienripoche
Contributor Author

Hi @aurelien-m, of course ;)

Basically, what I did was replace some parts of the code with the identity.
Here is the code I used to achieve that.

import torch

# Load retinanet
pth_path = "/path/to/retinanet.pth"
retinanet = torch.load(pth_path, map_location="cpu")
retinanet.eval()

# Image sizes
original_image_size = (677, 511)

# Normalize hack
normalize_tmp = retinanet.transform.normalize
retinanet_normalize = lambda x: normalize_tmp(x)
retinanet.transform.normalize = lambda x: x

# Resize hack
resize_tmp = retinanet.transform.resize
retinanet_resize = lambda x: resize_tmp(x, None)[0]
retinanet.transform.resize = lambda x, y: (x, y)

# Batch images hack
# /!\ torchvision version dependent ???
# retinanet.transform.batch_images = lambda x, size_divisible: x[0].unsqueeze(0)
retinanet.transform.batch_images = lambda x: x[0].unsqueeze(0)

# Generate dummy input
def preprocess_image(img):
    result = retinanet_resize(retinanet_normalize(img)[0]).unsqueeze(0)
    return result
dummy_input = torch.randn(1, 3, original_image_size[0], original_image_size[1])
dummy_input = preprocess_image(dummy_input)
image_size = tuple(dummy_input.shape[2:])
print(dummy_input.shape)

# Postprocess detections hack
postprocess_detections_tmp = retinanet.postprocess_detections
retinanet_postprocess_detections = lambda x: postprocess_detections_tmp(x["split_head_outputs"], x["split_anchors"], [image_size])
retinanet.postprocess_detections = lambda x, y, z: {"split_head_outputs": x, "split_anchors": y}

# Postprocess hack
postprocess_tmp = retinanet.transform.postprocess
retinanet_postprocess = lambda x: postprocess_tmp(x, [image_size], [original_image_size])
retinanet.transform.postprocess = lambda x, y, z: x

# ONNX export
onnx_path = "/path/to/retinanet.onnx"
torch.onnx.export(
    retinanet,
    dummy_input,
    onnx_path,
    verbose=False,
    opset_version=11,
    input_names = ["images"],
)

The resulting ONNX should contain almost only the network itself, plus some anchor handling.
This ONNX should be compilable by TensorRT.
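
For completeness, a minimal sketch of building an engine from that file with the TensorRT 8 Python API (paths are placeholders; trtexec with --onnx and --fp16 achieves the same from the command line):

import tensorrt as trt

# Sketch: parse the exported ONNX and build an (optionally float16) engine.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("/path/to/retinanet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX file")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # optional: float16 build

engine_bytes = builder.build_serialized_network(network, config)
with open("/path/to/retinanet.engine", "wb") as f:
    f.write(engine_bytes)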

That said, maybe a simpler way to achieve this would have been to simply replace the forward method with a "simpler" one.
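
As a rough sketch of that alternative, a hypothetical wrapper module could keep only the backbone and head, so normalization, anchors, decoding and NMS all live outside the exported graph:

import torch
from torch import nn

class RetinaNetCore(nn.Module):
    """Hypothetical wrapper: run only the backbone and detection head."""

    def __init__(self, retinanet):
        super().__init__()
        self.backbone = retinanet.backbone
        self.head = retinanet.head

    def forward(self, images):
        # the FPN backbone returns an OrderedDict of feature maps
        features = list(self.backbone(images).values())
        # the head returns {"cls_logits": ..., "bbox_regression": ...}
        outputs = self.head(features)
        return outputs["cls_logits"], outputs["bbox_regression"]

# usage sketch: export the wrapper instead of the patched model
# core = RetinaNetCore(retinanet)
# torch.onnx.export(core, dummy_input, "/path/to/retinanet_core.onnx", opset_version=11,
#                   input_names=["images"], output_names=["cls_logits", "bbox_regression"])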

About the performance gain, I don't remember exactly.
Re-running an old comparison, I can tell you that with the model compiled in float16, plus the external preprocess and postprocess, it is around 2 times faster than the original model (i.e. without the bypasses) exported to ONNX.

Hope it helps :)

@montmejat

@julienripoche Thanks man, it really helped me out!

However, if I understand correctly, this only works for a batch of size 1? Do you know if I can make it work for a larger batch size? I was able to convert it to TensorRT and run inference, but I'm getting slower inference than with the original PyTorch model.

@montmejat

Never mind, I found out how to run inference on a batch of multiple images. I used:

retinanet.transform.batch_images = lambda x, size_divisible: torch.stack(x)

It seems to work from what I can see; I'll do some more testing and look at the performance I get with a larger batch size. Thanks again!
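
In case it helps others, a sketch of exporting with a dynamic batch axis (reusing the retinanet and dummy_input from the script above): on the TensorRT side an optimization profile with min/opt/max shapes for the "images" input is then needed, and it is worth checking that the traced anchor logic really holds beyond the traced batch size.

# Sketch: mark the batch dimension of the input as dynamic in the ONNX export.
torch.onnx.export(
    retinanet,
    dummy_input,                      # shape (1, 3, H, W); batch axis made dynamic below
    "/path/to/retinanet_dynamic.onnx",
    opset_version=11,
    input_names=["images"],
    dynamic_axes={"images": {0: "batch"}},
)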

@Michelvl92

Is there any info on the improvement of the inference time/latency with exported TensorRT in comparison with ONNX or PyTorch?

@montmejat

Is there any info on the improvement of the inference time/latency with exported TensorRT in comparison with ONNX or PyTorch?

On my side, I was able to achieve a 2x to 3x speedup depending on the hardware, going from PyTorch to TensorRT (I don't have the exact numbers anymore, sorry!).

@ChaosAIVision

Hi, I used your code to convert a PyTorch RetinaNet to TensorRT and it was successful! But I have a problem: the output is not in the RetinaNet detection format. I need your help :(
Inference output: [-22.813814 -20.028933 -23.652468 ... -51.884644 -52.954933 -55.01 ]
