TensorRT compatible retinanet #4395
@julienripoche Thanks for the proposal. The preprocessing and post-processing steps are quite important and we can't bypass them in the general case. Nevertheless, since the proposed modification happens only in the …
Thanks @julienripoche, I'll have a look. Note that it might take a while to fully investigate its effects, as we will need to ensure that a) all existing pre-trained models maintain the same level of accuracy, b) no internal FB code breaks, and c) there are no other unintended consequences.
Hello @julienripoche, would you mind explaining how you bypassed these steps, if you remember? I'm very interested in knowing how you did it, as I'm trying to achieve the same goal 😄 With these bypasses, were you able to achieve some interesting performance gains?
Hi @aurelien-m, of course ;) Basically what I did was replace some parts of the code with the identity:

```python
import torch
# Load retinanet
pth_path = "/path/to/retinanet.pth"
retinanet = torch.load(pth_path, map_location="cpu")
retinanet.eval()
# Image sizes
original_image_size = (677, 511)
# Normalize hack
normalize_tmp = retinanet.transform.normalize
retinanet_normalize = lambda x: normalize_tmp(x)
retinanet.transform.normalize = lambda x: x
# Resize hack
resize_tmp = retinanet.transform.resize
retinanet_resize = lambda x: resize_tmp(x, None)[0]
retinanet.transform.resize = lambda x, y: (x, y)
# Batch images hack
# /!\ the required signature is torchvision-version dependent: some versions
# call batch_images(images, size_divisible=...)
# retinanet.transform.batch_images = lambda x, size_divisible: x[0].unsqueeze(0)
retinanet.transform.batch_images = lambda x: x[0].unsqueeze(0)
# Generate dummy input
def preprocess_image(img):
    result = retinanet_resize(retinanet_normalize(img)[0]).unsqueeze(0)
    return result
dummy_input = torch.randn(1, 3, original_image_size[0], original_image_size[1])
dummy_input = preprocess_image(dummy_input)
image_size = tuple(dummy_input.shape[2:])
print(dummy_input.shape)
# Postprocess detections hack
postprocess_detections_tmp = retinanet.postprocess_detections
retinanet_postprocess_detections = lambda x: postprocess_detections_tmp(x["split_head_outputs"], x["split_anchors"], [image_size])
retinanet.postprocess_detections = lambda x, y, z: {"split_head_outputs": x, "split_anchors": y}
# Postprocess hack
postprocess_tmp = retinanet.transform.postprocess
retinanet_postprocess = lambda x: postprocess_tmp(x, [image_size], [original_image_size])
retinanet.transform.postprocess = lambda x, y, z: x
# ONNX export
onnx_path = "/path/to/retinanet.onnx"
torch.onnx.export(
    retinanet,
    dummy_input,
    onnx_path,
    verbose=False,
    opset_version=11,
    input_names=["images"],
)
```

The resulting ONNX should contain almost nothing but the network itself, plus some anchor treatment. That said, maybe a simpler way to achieve this would have been to simply replace the `forward` method with a "simpler" one. About the performance gain, I don't remember exactly. Hope it helps :)
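For what it's worth, here is a minimal sketch of that simpler-forward alternative, reusing the `retinanet` instance loaded above. It assumes torchvision's `RetinaNet` attribute layout (`self.backbone` returns an OrderedDict of FPN feature maps, `self.head` returns a dict with `cls_logits` and `bbox_regression`); treat it as an illustration, not the code actually used in this thread:

```python
import types

# Replacement forward that traces only backbone + head, skipping the
# transform, anchor generation, and postprocessing entirely.
def simple_forward(self, images):
    features = list(self.backbone(images).values())
    head_outputs = self.head(features)
    return head_outputs["cls_logits"], head_outputs["bbox_regression"]

# Bind it to the loaded instance; torch.onnx.export will then trace this.
retinanet.forward = types.MethodType(simple_forward, retinanet)
```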
@julienripoche Thanks man, it really helped me out! However, if I understand correctly, this only works for a batch of size 1? Do you know if I can make it work for a larger batch size? I was able to convert it to TensorRT and run inference, but I'm getting slower inference than with the original PyTorch model.
Never mind, I found out how to infer a batch of multiple images. I used:

```python
retinanet.transform.batch_images = lambda x, size_divisible: torch.stack(x)
```

It seems to work from what I can see; I'll do some more testing and look at the performance I get with a larger batch size. Thanks again!
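Worth noting: `torch.stack` requires every image in the list to have the same shape, which holds here because resizing happens outside the model. To make the exported ONNX accept a variable batch size as well, `torch.onnx.export` can declare axis 0 dynamic. A sketch reusing the names from the export snippet above (the TensorRT build then needs an optimization profile covering the batch sizes you want):

```python
# Same export call as above, but with the batch axis declared dynamic.
torch.onnx.export(
    retinanet,
    dummy_input,
    onnx_path,
    opset_version=11,
    input_names=["images"],
    dynamic_axes={"images": {0: "batch"}},
)
```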
Is there any info on the improvement in inference time/latency with the exported TensorRT engine compared to ONNX or PyTorch?
On my side, I was able to achieve a 2x to 3x speedup depending on the hardware, going from PyTorch to TensorRT (I don't have the exact numbers anymore, sorry!)
Hi, I used your code to convert the PyTorch RetinaNet to TensorRT, and it was successful! But I have a problem: the output is not in the RetinaNet format. I need your help :(
🚀 The feature
The possibility to compile an ONNX-exported RetinaNet model with TensorRT.
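For context, a hedged sketch of what "compiling with TensorRT" can look like via the TensorRT Python API (TensorRT 8.x; the bundled `trtexec` command-line tool achieves the same). The file paths are placeholders:

```python
import tensorrt as trt

# Build a TensorRT engine from the ONNX file produced by torch.onnx.export.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("/path/to/retinanet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB of builder scratch space
serialized_engine = builder.build_serialized_network(network, config)

with open("/path/to/retinanet.engine", "wb") as f:
    f.write(serialized_engine)
```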
Motivation, pitch
I'm working with the torchvision RetinaNet implementation and have some production constraints regarding inference time. I think it would be great if the ONNX export of RetinaNet could be further compiled with TensorRT.
Alternatives
No response
Additional context
Actually, I already managed to make it work.

I exported the RetinaNet model to ONNX with `opset_version=11`, then compiled it with TensorRT 8.0.1. To do that, I bypassed two preprocessing steps in the `GeneralizedRCNNTransform` call (the normalize and resize hacks shown in the code above).

I also replaced the type of the two `torch.arange` calls, in `shifts_x` and `shifts_y` of the `AnchorGenerator` call, from `torch.float32` to `torch.int32`, as the current version of TensorRT does not support this.

And finally, I bypassed the postprocessing operations of `RetinaNet` (the `postprocess_detections` and `transform.postprocess` hacks above).

In my case it is fine to do the preprocessing and postprocessing outside of the `RetinaNet` call. So my request is actually only about the `AnchorGenerator`, i.e. changing the type of the `torch.arange` operations from `torch.float32` to `torch.int32`.
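For illustration, the requested change would look roughly like the following. This is a standalone sketch with made-up sizes, not a patch from this thread; in torchvision the relevant lines live in `AnchorGenerator.grid_anchors`:

```python
import torch

# Made-up example values; in torchvision these come from the feature map
# sizes and strides inside AnchorGenerator.grid_anchors.
grid_width, stride_width = 100, 8
device = torch.device("cpu")

# Before: float32 arange, which the TensorRT version discussed here
# fails to import from the exported ONNX.
shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width

# After: int32 arange, cast back to float32 so the downstream arithmetic
# is unchanged while the exported Range op operates on int32.
shifts_x = torch.arange(0, grid_width, dtype=torch.int32, device=device).to(torch.float32) * stride_width
```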
cc @datumbox