
CombinedNonMaxSuppression is not supported in ONNX #1337

Closed
jan-golda opened this issue Feb 12, 2021 · 14 comments
Labels: pending on user response

@jan-golda

Describe the bug
I am trying to convert MaskRCNN in TensorFlow 2 to ONNX, but it is failing because the CombinedNonMaxSuppression op is not supported in ONNX.

Urgency
It is blocking the use of MaskRCNN in ONNX.

System information

  • OS Platform and Distribution: Linux Ubuntu 20.04.1
  • Tensorflow Version: 2.3.1
  • ONNX version: 1.8.0
  • Python version: 3.8.5

To Reproduce
Try to convert any model that uses tf.image.combined_non_max_suppression
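For example, a minimal snippet along these lines reproduces it (shapes, thresholds, and the tf2onnx.convert.from_function call are illustrative, not my actual model code):

import tensorflow as tf
import tf2onnx

# Minimal function whose graph contains a CombinedNonMaxSuppression op.
@tf.function(input_signature=[
    tf.TensorSpec([1, 100, 1, 4], tf.float32),   # boxes  [batch, num_boxes, q, 4]
    tf.TensorSpec([1, 100, 90], tf.float32),     # scores [batch, num_boxes, num_classes]
])
def nms_fn(boxes, scores):
    return tf.image.combined_non_max_suppression(
        boxes, scores, max_output_size_per_class=10, max_total_size=20)

# Conversion fails here because the op has no ONNX mapping.
tf2onnx.convert.from_function(
    nms_fn, input_signature=nms_fn.input_signature,
    opset=12, output_path="nms.onnx")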

Expected behaviour
The model should be converted without failing on the CombinedNonMaxSuppression op.

Additional context
This was already reported in #847 for YOLO, and I have tried to apply the workaround from there: replacing CombinedNonMaxSuppression with NonMaxSuppression plus a set of ops that recreate the "Combined" part.

I have tried to get it to work for a few days, but in the case of MaskRCNN this seems to be more complicated than for YOLO. I had to apply NonMaxSuppression for each class in each sample of the batch separately, then pad the results, select the top results per class, then the top results per box, retrieve the corresponding classes and scores, pad again, and finally gather the results across the batch.

There is a reason this op was added to TF, as recreating it from scratch is quite complicated, so I would like to ask if you could add support for it in the ONNX conversion.

Moreover, when I was running the model with the partially applied changes, I observed a significant performance drop under automatic mixed precision. In short: replacing CombinedNonMaxSuppression has a noticeable impact on the original TF model as well, which is not ideal.

@TomWildenhain-Microsoft
Contributor

Hi @jan-golda, to convert this op we can either 1) compose it out of ONNX ops or 2) implement it as a custom op. From your experimenting in TF, it seems like you are finding a composition of ops to be expensive. We have previously built relatively complicated ops out of compositions with decent performance, but sometimes it can't be done. In this case, I suspect we can get an efficient composition. Are you using any loops in your implementation, or only batched tensor ops + Gather?
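To illustrate the distinction, a loop-free pattern would look roughly like this (illustrative shapes only):

import tensorflow as tf

# Loop-free top-k selection: compute indices once, then fetch the
# corresponding boxes with a single batched gather instead of map_fn.
scores = tf.random.uniform([4, 1000])         # [batch, num_boxes]
boxes = tf.random.uniform([4, 1000, 4])       # [batch, num_boxes, 4]

top_scores, top_idx = tf.math.top_k(scores, k=100)    # both [4, 100]
top_boxes = tf.gather(boxes, top_idx, batch_dims=1)   # [4, 100, 4]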

@jan-golda
Author

Hi @TomWildenhain-Microsoft
Well, I was trying to use NonMaxSuppression, but since it does not support batching or class-wise suppression, I had to iterate over both the batch and the classes.
I've implemented that using two nested map_fn calls accompanied by a lot of padding/stacking/reshaping. I expect I could just use a for loop instead, though I have no idea what impact that would have on performance.
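Roughly, my decomposition looked like the sketch below (heavily simplified, with illustrative shapes and padding; the real code also recovers the classes/scores and gathers results across the batch):

import tensorflow as tf

def per_class_nms(boxes, scores, max_out=100, iou_thresh=0.5):
    # boxes:  [batch, num_boxes, 4] (shared-box case for brevity;
    #         per-class boxes add one more transpose/index step)
    # scores: [batch, num_boxes, num_classes]
    def one_sample(args):
        sample_boxes, sample_scores = args

        def one_class(class_scores):  # [num_boxes]
            idx = tf.image.non_max_suppression(
                sample_boxes, class_scores,
                max_output_size=max_out, iou_threshold=iou_thresh)
            # Pad the variable-length result to a fixed size so map_fn can stack it.
            return tf.pad(idx, [[0, max_out - tf.shape(idx)[0]]], constant_values=-1)

        # Inner iteration: one NMS call per class.
        return tf.map_fn(one_class, tf.transpose(sample_scores),
                         fn_output_signature=tf.int32)

    # Outer iteration: one pass per sample in the batch.
    return tf.map_fn(one_sample, (boxes, scores), fn_output_signature=tf.int32)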

@guschmue
Contributor

Should be possible to add support for CombinedNonMaxSuppression since the ONNX NMS op supports batching.
We actually unsqueeze the input to get a batch size of 1.
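For reference, a minimal standalone graph showing the batched NonMaxSuppression contract in ONNX (the threshold values are illustrative):

import onnx
from onnx import helper, TensorProto

# ONNX NonMaxSuppression is batched out of the box:
#   boxes:  [num_batches, spatial_dimension, 4]
#   scores: [num_batches, num_classes, spatial_dimension]
# It returns selected_indices of shape [num_selected, 3],
# where each row is (batch_index, class_index, box_index).
nms = helper.make_node(
    "NonMaxSuppression",
    inputs=["boxes", "scores", "max_per_class", "iou_threshold"],
    outputs=["selected_indices"])

graph = helper.make_graph(
    [nms], "batched_nms",
    inputs=[
        helper.make_tensor_value_info("boxes", TensorProto.FLOAT, [None, None, 4]),
        helper.make_tensor_value_info("scores", TensorProto.FLOAT, [None, None, None]),
    ],
    outputs=[
        helper.make_tensor_value_info("selected_indices", TensorProto.INT64, [None, 3]),
    ],
    initializer=[
        helper.make_tensor("max_per_class", TensorProto.INT64, [1], [1000]),
        helper.make_tensor("iou_threshold", TensorProto.FLOAT, [1], [0.5]),
    ])

onnx.checker.check_model(helper.make_model(graph))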

@jan-golda
Author

@guschmue nice to hear that!

Do you think it would be possible to implement that in the near future?

@TomWildenhain-Microsoft
Contributor

Yep, I'm working on it! Hopefully by end of the week.

@TomWildenhain-Microsoft
Contributor

What is the dimension of the scores tensor for your model? TF has 2 different behaviors (are you sharing boxes across classes?)

@jan-golda
Author

Sorry for the late reply!

There are two separate places in the code where this op is used. Below you will find some example shapes for these two places:

place 1:
  boxes:  [4, 1000, 90, 4]
  scores: [4, 1000, 90]
place 2:
  boxes:  [4, 209664, 1, 4]
  scores: [4, 209664, 1]

So the answer is no - I am not sharing the boxes across classes, since the third dimension of boxes always equals the third dimension of scores.

@TomWildenhain-Microsoft
Contributor

Well... just finished implementing it for the other version: #1376. Non-sharing is a little harder since ONNX does share boxes across classes for NMS. I could just make the boxes shared and zero out the scores for all but one class, but that would be a lot of zeros (90 per box, so 90 * 4 * 1000 * 90 = 32.4 million score entries, which is probably too large). Place 2 should work fine with the current implementation since there is only 1 class. For place 1, what are the max_total_size and max_output_size_per_class values?
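To make the blow-up concrete, here is a rough numpy sketch of the score-masking idea (illustrative only, not the converter code):

import numpy as np

batch, num_boxes, num_classes = 4, 1000, 90

# Non-shared boxes: every class has its own box per anchor (place 1 shapes).
boxes = np.random.rand(batch, num_boxes, num_classes, 4).astype(np.float32)
scores = np.random.rand(batch, num_boxes, num_classes).astype(np.float32)

# Flatten the (box, class) pairs into one shared box list.
shared_boxes = boxes.reshape(batch, num_boxes * num_classes, 4)

# Each flattened box keeps its score only for the class it came from;
# everything else is zero. This tensor alone has
# 4 * 90 * 90000 = 32,400,000 entries.
masked_scores = np.zeros((batch, num_classes, num_boxes * num_classes), np.float32)
for c in range(num_classes):
    masked_scores[:, c, c::num_classes] = scores[:, :, c]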

@jan-golda
Author

max_output_size_per_class=1000,
max_total_size=100

@TomWildenhain-Microsoft
Contributor

For experimental purposes, can you try testing the performance of the CombinedNonMaxSuppression implementation I've done so far? Making boxes non-shared will be a decent bit harder but should have similar perf, so it would be nice to know whether the perf is sufficient. Just add a slice before your CombinedNonMaxSuppression to cut the class dim from 90 to 1 and see how the perf of the converted ONNX model compares to TF.

If the perf is not good, we may have to use a custom op or try a different implementation approach.
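Concretely, the experiment I have in mind is something like this (the variable names are placeholders for the tensors feeding your op):

import tensorflow as tf

# Stand-ins for the tensors currently feeding the op (place 1 shapes).
boxes = tf.random.uniform([4, 1000, 90, 4])
scores = tf.random.uniform([4, 1000, 90])

# Slice the class dim from 90 to 1 so the converted graph takes the
# shared-box path from #1376.
boxes_1class = boxes[:, :, :1, :]   # -> [4, 1000, 1, 4]
scores_1class = scores[:, :, :1]    # -> [4, 1000, 1]

nmsed = tf.image.combined_non_max_suppression(
    boxes_1class, scores_1class,
    max_output_size_per_class=1000, max_total_size=100)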

@TomWildenhain-Microsoft TomWildenhain-Microsoft added the pending on user response Waiting for more information or validation from user label Mar 5, 2021
@guschmue
Contributor

guschmue commented Apr 7, 2021

Assuming this is resolved.

@guschmue guschmue closed this as completed Apr 7, 2021
@PINTO0309

Pure ONNX Multi-Class NonMaximumSuppression, CombinedNonMaxSuppression.
https://github.com/PINTO0309/yolact_edge_onnx_tensorrt_myriad

@hwangdeyu
Contributor

Pure ONNX Multi-Class NonMaximumSuppression, CombinedNonMaxSuppression. https://github.com/PINTO0309/yolact_edge_onnx_tensorrt_myriad

What a cool job!☺

@Kimyuhwanpeter

Kimyuhwanpeter commented Mar 29, 2024

@hwangdeyu I used cv::dnn::NMSBoxes. I can share my work, but it is ONNX Runtime C++, not tflite.
For TensorFlow YOLOv8, I modified the code as below (for Python).
I didn't include NMS in the Python side, only in the ONNX Runtime C++ postprocessing.

Python:

import tensorflow as tf
import keras_cv
import keras

import model
import config as CONFIG
import loss_v2 as lo
import tf2onnx

from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
from keras_cv.src.backend import ops


if __name__ == "__main__":
    # h5 to SavedModel
    print(r"h5 -> SavedModel")
    # NMS layer (not wired into the exported graph; NMS happens later in C++)
    nms = keras_cv.layers.NonMaxSuppression(bounding_box_format=CONFIG.CONFIG.box_format,
                                    from_logits=False,
                                    iou_threshold=CONFIG.CONFIG.iou,
                                    confidence_threshold=CONFIG.CONFIG.conf,
                                    max_detections=CONFIG.CONFIG.max_detect)
    backbone = keras_cv.models.YOLOV8Backbone.from_preset(  # the backbone can be swapped freely here
        "yolo_v8_l_backbone_coco")
    yolo = model.yolov8_model(backbone, CONFIG.CONFIG.nc, CONFIG.CONFIG.box_format)

    yolo.load_weights(CONFIG.CONFIG.save_path + "/yolov8_halmet_color.h5")        
    preds = yolo.outputs[0]
    # DFL-style box decoding: softmax over each distribution of
    # BOX_REGRESSION_CHANNELS // 4 bins, then take the expectation via matmul.
    yolo.outputs[0] = tf.reshape(preds,
                            [-1, 4, CONFIG.CONFIG.BOX_REGRESSION_CHANNELS // 4])
    yolo.outputs[0] = tf.linalg.matmul(ops.nn.softmax(yolo.outputs[0], axis=-1),
                ops.arange(CONFIG.CONFIG.BOX_REGRESSION_CHANNELS // 4, dtype="float32")[..., None])
    yolo.outputs[0] = tf.squeeze(yolo.outputs[0], -1)

    anchor_points, stride_tensor = lo.get_anchors(image_shape=(CONFIG.CONFIG.img_size, CONFIG.CONFIG.img_size))
    stride_tensor = ops.expand_dims(stride_tensor, axis=-1)

    # Convert decoded distances to boxes and scale back to pixel coordinates.
    yolo.outputs[0] = lo.dist2bbox(yolo.outputs[0], anchor_points) * stride_tensor  # box shape is problem?!?!?!?!?


    yolo = tf.keras.Model(inputs=yolo.inputs, outputs=yolo.outputs)
    yolo.summary()
    tf2onnx.convert.from_keras(
        yolo,
        output_path="/yhkim/yhkim/yuhwan_project/CustomNew_tensor_detection/v1/checkpoint/SavedModel/model.onnx",
        opset=13)

    yolo.save(filepath=CONFIG.CONFIG.savedmodel_path, save_format='tf')

ONNX Runtime C++:

std::vector<Detection> LDetector::postprocessing(const cv::Size& resizedImageShape,
    const cv::Size& originalImageShape,
    std::vector<Ort::Value>& outputTensors,
    const float& confThreshold, const float& iouThreshold)
{
    // Get the output tensor data and shape
    auto* rawOutputBoxes = outputTensors[0].GetTensorData<float>(); 
    std::vector<int64_t> outputShapeBoxes = outputTensors[0].GetTensorTypeAndShapeInfo().GetShape();
    size_t countBoxes = outputTensors[0].GetTensorTypeAndShapeInfo().GetElementCount();

    auto* rawOutputcls = outputTensors[1].GetTensorData<float>();
    std::vector<int64_t> outputShapeClasses = outputTensors[1].GetTensorTypeAndShapeInfo().GetShape();
    size_t countClass = outputTensors[1].GetTensorTypeAndShapeInfo().GetElementCount();

    std::vector<float> outputs(rawOutputBoxes, rawOutputBoxes + countBoxes);
    std::vector<float> CalsOutputs(rawOutputcls, rawOutputcls + countClass);

    // reshape outputs
    std::vector<std::vector<std::vector<float>>> outputs_reshaped(1, std::vector<std::vector<float>>(8400, std::vector<float>(4, 0.0f)));
    std::vector<std::vector<std::vector<float>>> cles_reshaped(1, std::vector<std::vector<float>>(8400, std::vector<float>(2, 0.0f)));

    std::vector<BoundingBox> outputs_reshaped_new(8400);
    std::vector<int> predClassIds;
    std::vector<cv::Rect> predBoxes;
    std::vector<float> predConfidences;

    // Split the flat output buffers into per-box coordinates (4 values each)
    // and per-class scores (2 classes in this model).
    for (int i = 0; i < 8400; i++) {
        for (int j = 0; j < 4; j++) {
            outputs_reshaped[0][i][j] = outputs[i * 4 + j];
            if (j < 2)
                cles_reshaped[0][i][j] = CalsOutputs[i * 2 + j];
        }

        outputs_reshaped_new[i].x1 = outputs_reshaped[0][i][0];
        outputs_reshaped_new[i].y1 = outputs_reshaped[0][i][1];
        outputs_reshaped_new[i].x2 = outputs_reshaped[0][i][2];
        outputs_reshaped_new[i].y2 = outputs_reshaped[0][i][3];

        auto max_value_it = std::max_element(cles_reshaped[0][i].begin(), cles_reshaped[0][i].end());
        outputs_reshaped_new[i].score = *max_value_it;
        outputs_reshaped_new[i].classId = std::distance(cles_reshaped[0][i].begin(), max_value_it);

        float xmin = outputs_reshaped[0][i][0];
        float ymin = outputs_reshaped[0][i][1];
        float xmax = outputs_reshaped[0][i][2];
        float ymax = outputs_reshaped[0][i][3];
        float width = xmax - xmin;
        float height = ymax - ymin;

        // Clamp the box to the input image bounds. std::min/std::max need
        // matching operand types, so the integer dimensions are cast to float.
        float x = std::max(0.0f, std::min(xmin, static_cast<float>(this->inputImageShape.width - 1)));
        float y = std::max(0.0f, std::min(ymin, static_cast<float>(this->inputImageShape.height - 1)));
        width = std::max(0.0f, std::min(width, static_cast<float>(this->inputImageShape.width) - x));
        height = std::max(0.0f, std::min(height, static_cast<float>(this->inputImageShape.height) - y));
        predBoxes.emplace_back(x, y, width, height);
        predClassIds.push_back(std::distance(cles_reshaped[0][i].begin(), max_value_it));
        predConfidences.push_back(static_cast<float>(*max_value_it));

    }
    std::vector<Detection> detections;
    std::vector<int> indices;
    // Class-agnostic NMS via OpenCV: confThreshold filters low scores,
    // iouThreshold controls suppression, eta = 1.0f disables adaptive NMS.
    cv::dnn::NMSBoxes(predBoxes, predConfidences, confThreshold, iouThreshold, indices, 1.0f);
    for (int idx : indices) {
        if (predBoxes[idx].width > 0. && predBoxes[idx].height > 0.) {
            Detection det;
            det.box = cv::Rect(predBoxes[idx]);
            //utils::scaleCoords(resizedImageShape, det.box, originalImageShape);

            det.conf = predConfidences[idx];
            det.classId = predClassIds[idx];
            detections.emplace_back(det);
        }
    }

    return detections;
}

It works fine.
I hope it will be of help.
