
CombinedNonMaxSuppression is not supported in ONNX #1337

Closed
jan-golda opened this issue Feb 12, 2021 · 14 comments
Labels: pending on user response

@jan-golda

Describe the bug
I am trying to convert MaskRCNN in TensorFlow 2 to ONNX, but it is failing because the CombinedNonMaxSuppression op is not supported in ONNX.

Urgency
It is blocking the use of MaskRCNN in ONNX.

System information

  • OS Platform and Distribution: Linux Ubuntu 20.04.1
  • Tensorflow Version: 2.3.1
  • ONNX version: 1.8.0
  • Python version: 3.8.5

To Reproduce
Try to convert any model that uses tf.image.combined_non_max_suppression
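For example, a minimal snippet along these lines reproduces it (shapes, thresholds, and the tf2onnx.convert.from_function call are illustrative, not my actual model code):

import tensorflow as tf
import tf2onnx

# Minimal function whose graph contains a CombinedNonMaxSuppression op.
@tf.function(input_signature=[
    tf.TensorSpec([1, 100, 1, 4], tf.float32),   # boxes  [batch, num_boxes, q, 4]
    tf.TensorSpec([1, 100, 90], tf.float32),     # scores [batch, num_boxes, num_classes]
])
def nms_fn(boxes, scores):
    return tf.image.combined_non_max_suppression(
        boxes, scores, max_output_size_per_class=10, max_total_size=20)

# Conversion fails here because the op has no ONNX mapping.
tf2onnx.convert.from_function(
    nms_fn, input_signature=nms_fn.input_signature,
    opset=12, output_path="nms.onnx")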

Expected behaviour
The model should be converted without failing on the CombinedNonMaxSuppression op.

Additional context
This was already reported in #847 for YOLO, and I have tried to apply the workaround from there: replacing CombinedNonMaxSuppression with NonMaxSuppression plus a set of ops that recreate the "Combined" part.

I have tried to get it to work for a few days, but in the case of MaskRCNN this seems to be more complicated than for YOLO. I had to apply NonMaxSuppression for each class in each sample of the batch separately, then pad the results, select the top results per class, then the top results per box, retrieve the corresponding classes and scores, pad again, and finally gather the results across the batch.

There is a reason this op was added to TF, as recreating it from scratch is quite complicated, so I would like to ask if you could add support for it in the ONNX conversion.

Moreover, when I was running the model with the partially applied changes, I observed a significant performance drop under automatic mixed precision. In short: replacing CombinedNonMaxSuppression has a noticeable impact on the original TF model as well, which is not ideal.

@TomWildenhain-Microsoft
Contributor

Hi @jan-golda, to convert this op we can either 1) compose it out of ONNX ops or 2) implement it as a custom op. From your experimenting in TF, it seems like you are finding a composition of ops to be expensive. We have previously built relatively complicated ops out of compositions with decent performance, but sometimes it can't be done. In this case, I suspect we can get an efficient composition. Are you using any loops in your implementation, or only batched tensor ops + Gather?
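To illustrate the distinction, a loop-free pattern would look roughly like this (illustrative shapes only):

import tensorflow as tf

# Loop-free top-k selection: compute indices once, then fetch the
# corresponding boxes with a single batched gather instead of map_fn.
scores = tf.random.uniform([4, 1000])         # [batch, num_boxes]
boxes = tf.random.uniform([4, 1000, 4])       # [batch, num_boxes, 4]

top_scores, top_idx = tf.math.top_k(scores, k=100)    # both [4, 100]
top_boxes = tf.gather(boxes, top_idx, batch_dims=1)   # [4, 100, 4]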

@jan-golda
Author

Hi @TomWildenhain-Microsoft
Well, I was trying to use NonMaxSuppression, but since it does not support batching or class-wise suppression, I had to iterate over both the batch and the classes.
I've implemented that using two nested map_fn calls accompanied by a lot of padding/stacking/reshaping. I expect I could just use a for loop instead, though I have no idea what impact that would have on performance.
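Roughly, my decomposition looked like the sketch below (heavily simplified, with illustrative shapes and padding; the real code also recovers the classes/scores and gathers results across the batch):

import tensorflow as tf

def per_class_nms(boxes, scores, max_out=100, iou_thresh=0.5):
    # boxes:  [batch, num_boxes, 4] (shared-box case for brevity;
    #         per-class boxes add one more transpose/index step)
    # scores: [batch, num_boxes, num_classes]
    def one_sample(args):
        sample_boxes, sample_scores = args

        def one_class(class_scores):  # [num_boxes]
            idx = tf.image.non_max_suppression(
                sample_boxes, class_scores,
                max_output_size=max_out, iou_threshold=iou_thresh)
            # Pad the variable-length result to a fixed size so map_fn can stack it.
            return tf.pad(idx, [[0, max_out - tf.shape(idx)[0]]], constant_values=-1)

        # Inner iteration: one NMS call per class.
        return tf.map_fn(one_class, tf.transpose(sample_scores),
                         fn_output_signature=tf.int32)

    # Outer iteration: one pass per sample in the batch.
    return tf.map_fn(one_sample, (boxes, scores), fn_output_signature=tf.int32)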

@guschmue
Contributor

Should be possible to add support for CombinedNonMaxSuppression since the ONNX NMS op supports batching.
We actually unsqueeze the input to get a batch size of 1.
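For reference, a minimal standalone graph showing the batched NonMaxSuppression contract in ONNX (the threshold values are illustrative):

import onnx
from onnx import helper, TensorProto

# ONNX NonMaxSuppression is batched out of the box:
#   boxes:  [num_batches, spatial_dimension, 4]
#   scores: [num_batches, num_classes, spatial_dimension]
# It returns selected_indices of shape [num_selected, 3],
# where each row is (batch_index, class_index, box_index).
nms = helper.make_node(
    "NonMaxSuppression",
    inputs=["boxes", "scores", "max_per_class", "iou_threshold"],
    outputs=["selected_indices"])

graph = helper.make_graph(
    [nms], "batched_nms",
    inputs=[
        helper.make_tensor_value_info("boxes", TensorProto.FLOAT, [None, None, 4]),
        helper.make_tensor_value_info("scores", TensorProto.FLOAT, [None, None, None]),
    ],
    outputs=[
        helper.make_tensor_value_info("selected_indices", TensorProto.INT64, [None, 3]),
    ],
    initializer=[
        helper.make_tensor("max_per_class", TensorProto.INT64, [1], [1000]),
        helper.make_tensor("iou_threshold", TensorProto.FLOAT, [1], [0.5]),
    ])

onnx.checker.check_model(helper.make_model(graph))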

@jan-golda
Author

@guschmue nice to hear that!

Do you think it would be possible to implement that in the near future?

@TomWildenhain-Microsoft
Contributor

Yep, I'm working on it! Hopefully by end of the week.

@TomWildenhain-Microsoft
Contributor

What is the dimension of the scores tensor for your model? TF has 2 different behaviors (are you sharing boxes across classes?)

@jan-golda
Author

Sorry for the late reply!

There are two separate places in the code where this op is used. Below you will find some example shapes for these two places:

place 1:
  boxes:  [4, 1000, 90, 4]
  scores: [4, 1000, 90]
place 2:
  boxes:  [4, 209664, 1, 4]
  scores: [4, 209664, 1]

So the answer is no - I am not sharing the boxes across classes, since the third dimension of boxes always equals the third dimension of scores.

@TomWildenhain-Microsoft
Contributor

Well... just finished implementing it for the other version: #1376. Non-sharing is a little harder since ONNX does share boxes across classes for NMS. I could just make the boxes shared and zero out the scores for all but one class, but that would be a lot of zeros (90 per box, so 90 * 4 * 1000 * 90 = 32.4 million score entries, which is probably too large). Place 2 should work fine with the current implementation since there is only 1 class. For place 1, what are the max_total_size and max_output_size_per_class values?
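To make the blow-up concrete, here is a rough numpy sketch of the score-masking idea (illustrative only, not the converter code):

import numpy as np

batch, num_boxes, num_classes = 4, 1000, 90

# Non-shared boxes: every class has its own box per anchor (place 1 shapes).
boxes = np.random.rand(batch, num_boxes, num_classes, 4).astype(np.float32)
scores = np.random.rand(batch, num_boxes, num_classes).astype(np.float32)

# Flatten the (box, class) pairs into one shared box list.
shared_boxes = boxes.reshape(batch, num_boxes * num_classes, 4)

# Each flattened box keeps its score only for the class it came from;
# everything else is zero. This tensor alone has
# 4 * 90 * 90000 = 32,400,000 entries.
masked_scores = np.zeros((batch, num_classes, num_boxes * num_classes), np.float32)
for c in range(num_classes):
    masked_scores[:, c, c::num_classes] = scores[:, :, c]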

@jan-golda
Author

max_output_size_per_class=1000,
max_total_size=100

@TomWildenhain-Microsoft
Contributor

For experimental purposes, can you try testing the performance of the CombinedNonMaxSuppression implementation I've done so far? Making boxes non-shared will be a decent bit harder but should have similar perf, so it would be nice to know whether the perf is sufficient. Just add a slice before your CombinedNonMaxSuppression to cut the class dim from 90 to 1 and see how the perf of the converted ONNX model compares to TF.

If the perf is not good, we may have to use a custom op or try a different implementation approach.
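Concretely, the experiment I have in mind is something like this (the variable names are placeholders for the tensors feeding your op):

import tensorflow as tf

# Stand-ins for the tensors currently feeding the op (place 1 shapes).
boxes = tf.random.uniform([4, 1000, 90, 4])
scores = tf.random.uniform([4, 1000, 90])

# Slice the class dim from 90 to 1 so the converted graph takes the
# shared-box path from #1376.
boxes_1class = boxes[:, :, :1, :]   # -> [4, 1000, 1, 4]
scores_1class = scores[:, :, :1]    # -> [4, 1000, 1]

nmsed = tf.image.combined_non_max_suppression(
    boxes_1class, scores_1class,
    max_output_size_per_class=1000, max_total_size=100)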

@TomWildenhain-Microsoft TomWildenhain-Microsoft added the pending on user response Waiting for more information or validation from user label Mar 5, 2021
@guschmue
Contributor

guschmue commented Apr 7, 2021

Assuming this is resolved.

@guschmue guschmue closed this as completed Apr 7, 2021
@PINTO0309

Pure ONNX Multi-Class NonMaximumSuppression, CombinedNonMaxSuppression.
https://github.com/PINTO0309/yolact_edge_onnx_tensorrt_myriad

@hwangdeyu
Contributor

Pure ONNX Multi-Class NonMaximumSuppression, CombinedNonMaxSuppression. https://github.com/PINTO0309/yolact_edge_onnx_tensorrt_myriad

What a cool job!☺

@Kimyuhwanpeter

Kimyuhwanpeter commented Mar 29, 2024

@hwangdeyu I used cv::dnn::NMSBoxes. I can share my work, but it is ONNX Runtime C++, not tflite.
For TensorFlow YOLOv8, I modified the code as below (for Python).
I didn't include NMS in the Python side, only in the ONNX Runtime C++ postprocessing.

Python:

import tensorflow as tf
import keras_cv
import keras

import model
import config as CONFIG
import loss_v2 as lo
import tf2onnx

from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
from keras_cv.src.backend import ops


if __name__ == "__main__":
    # h5 to SavedModel
    print(r"h5 -> SavedModel")
    # NMS layer (not wired into the exported graph; NMS happens later in C++)
    nms = keras_cv.layers.NonMaxSuppression(bounding_box_format=CONFIG.CONFIG.box_format,
                                    from_logits=False,
                                    iou_threshold=CONFIG.CONFIG.iou,
                                    confidence_threshold=CONFIG.CONFIG.conf,
                                    max_detections=CONFIG.CONFIG.max_detect)
    backbone = keras_cv.models.YOLOV8Backbone.from_preset(  # the backbone can be swapped freely here
        "yolo_v8_l_backbone_coco")
    yolo = model.yolov8_model(backbone, CONFIG.CONFIG.nc, CONFIG.CONFIG.box_format)

    yolo.load_weights(CONFIG.CONFIG.save_path + "/yolov8_halmet_color.h5")        
    preds = yolo.outputs[0]
    # DFL-style box decoding: softmax over each distribution of
    # BOX_REGRESSION_CHANNELS // 4 bins, then take the expectation via matmul.
    yolo.outputs[0] = tf.reshape(preds,
                            [-1, 4, CONFIG.CONFIG.BOX_REGRESSION_CHANNELS // 4])
    yolo.outputs[0] = tf.linalg.matmul(ops.nn.softmax(yolo.outputs[0], axis=-1),
                ops.arange(CONFIG.CONFIG.BOX_REGRESSION_CHANNELS // 4, dtype="float32")[..., None])
    yolo.outputs[0] = tf.squeeze(yolo.outputs[0], -1)

    anchor_points, stride_tensor = lo.get_anchors(image_shape=(CONFIG.CONFIG.img_size, CONFIG.CONFIG.img_size))
    stride_tensor = ops.expand_dims(stride_tensor, axis=-1)

    # Convert decoded distances to boxes and scale back to pixel coordinates.
    yolo.outputs[0] = lo.dist2bbox(yolo.outputs[0], anchor_points) * stride_tensor  # box shape is problem?!?!?!?!?


    yolo = tf.keras.Model(inputs=yolo.inputs, outputs=yolo.outputs)
    yolo.summary()
    tf2onnx.convert.from_keras(
        yolo,
        output_path="/yhkim/yhkim/yuhwan_project/CustomNew_tensor_detection/v1/checkpoint/SavedModel/model.onnx",
        opset=13)

    yolo.save(filepath=CONFIG.CONFIG.savedmodel_path, save_format='tf')

ONNX Runtime C++:

std::vector<Detection> LDetector::postprocessing(const cv::Size& resizedImageShape,
    const cv::Size& originalImageShape,
    std::vector<Ort::Value>& outputTensors,
    const float& confThreshold, const float& iouThreshold)
{
    // Get the output tensor data and shape
    auto* rawOutputBoxes = outputTensors[0].GetTensorData<float>(); 
    std::vector<int64_t> outputShapeBoxes = outputTensors[0].GetTensorTypeAndShapeInfo().GetShape();
    size_t countBoxes = outputTensors[0].GetTensorTypeAndShapeInfo().GetElementCount();

    auto* rawOutputcls = outputTensors[1].GetTensorData<float>();
    std::vector<int64_t> outputShapeClasses = outputTensors[1].GetTensorTypeAndShapeInfo().GetShape();
    size_t countClass = outputTensors[1].GetTensorTypeAndShapeInfo().GetElementCount();

    std::vector<float> outputs(rawOutputBoxes, rawOutputBoxes + countBoxes);
    std::vector<float> CalsOutputs(rawOutputcls, rawOutputcls + countClass);

    // reshape outputs
    std::vector<std::vector<std::vector<float>>> outputs_reshaped(1, std::vector<std::vector<float>>(8400, std::vector<float>(4, 0.0f)));
    std::vector<std::vector<std::vector<float>>> cles_reshaped(1, std::vector<std::vector<float>>(8400, std::vector<float>(2, 0.0f)));

    std::vector<BoundingBox> outputs_reshaped_new(8400);
    std::vector<int> predClassIds;
    std::vector<cv::Rect> predBoxes;
    std::vector<float> predConfidences;

    // Split the flat output buffers into per-box coordinates (4 values each)
    // and per-class scores (2 classes in this model).
    for (int i = 0; i < 8400; i++) {
        for (int j = 0; j < 4; j++) {
            outputs_reshaped[0][i][j] = outputs[i * 4 + j];
            if (j < 2)
                cles_reshaped[0][i][j] = CalsOutputs[i * 2 + j];
        }

        outputs_reshaped_new[i].x1 = outputs_reshaped[0][i][0];
        outputs_reshaped_new[i].y1 = outputs_reshaped[0][i][1];
        outputs_reshaped_new[i].x2 = outputs_reshaped[0][i][2];
        outputs_reshaped_new[i].y2 = outputs_reshaped[0][i][3];

        auto max_value_it = std::max_element(cles_reshaped[0][i].begin(), cles_reshaped[0][i].end());
        outputs_reshaped_new[i].score = *max_value_it;
        outputs_reshaped_new[i].classId = std::distance(cles_reshaped[0][i].begin(), max_value_it);

        float xmin = outputs_reshaped[0][i][0];
        float ymin = outputs_reshaped[0][i][1];
        float xmax = outputs_reshaped[0][i][2];
        float ymax = outputs_reshaped[0][i][3];
        float width = xmax - xmin;
        float height = ymax - ymin;

        // Clamp the box to the input image bounds. std::min/std::max need
        // matching operand types, so the integer dimensions are cast to float.
        float x = std::max(0.0f, std::min(xmin, static_cast<float>(this->inputImageShape.width - 1)));
        float y = std::max(0.0f, std::min(ymin, static_cast<float>(this->inputImageShape.height - 1)));
        width = std::max(0.0f, std::min(width, static_cast<float>(this->inputImageShape.width) - x));
        height = std::max(0.0f, std::min(height, static_cast<float>(this->inputImageShape.height) - y));
        predBoxes.emplace_back(x, y, width, height);
        predClassIds.push_back(std::distance(cles_reshaped[0][i].begin(), max_value_it));
        predConfidences.push_back(static_cast<float>(*max_value_it));

    }
    std::vector<Detection> detections;
    std::vector<int> indices;
    // Class-agnostic NMS via OpenCV: confThreshold filters low scores,
    // iouThreshold controls suppression, eta = 1.0f disables adaptive NMS.
    cv::dnn::NMSBoxes(predBoxes, predConfidences, confThreshold, iouThreshold, indices, 1.0f);
    for (int idx : indices) {
        if (predBoxes[idx].width > 0. && predBoxes[idx].height > 0.) {
            Detection det;
            det.box = cv::Rect(predBoxes[idx]);
            //utils::scaleCoords(resizedImageShape, det.box, originalImageShape);

            det.conf = predConfidences[idx];
            det.classId = predClassIds[idx];
            detections.emplace_back(det);
        }
    }

    return detections;
}

It works fine.
I hope it will be of help.
