
[Dynamic batch / Dynamic shape] onnx model with dynamic input is converted to tflite with static input 1 #441

Closed
mikel-brostrom opened this issue Aug 4, 2023 · 17 comments
Labels
Dynamic batch / Dynamic shape, Environment, third party (Third-party tool issues)

Comments

@mikel-brostrom
Contributor

mikel-brostrom commented Aug 4, 2023

Issue Type

Others

OS

Linux

onnx2tf version number

1.15.8

onnx version number

1.13.0

onnxruntime version number

1.13.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.13.0

Download URL for ONNX

osnet_x0_25_msmt17.zip

Parameter Replacement JSON

NA

Description

Hi @PINTO0309!

I have the following issue:

ONNX input:

[screenshot: ONNX model input]

TFLite (FP32 model) input:
[screenshot: TFLite model input]

after conversion by: onnx2tf -i examples/weights/osnet_x0_25_msmt17.onnx -o /home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model -nuo --non_verbose

I went through the README but could not find any reason for this behavior. -b 10 works as expected, but my input varies depending on the image, so the input needs to be dynamic. The output size is also fixed to the static input value.

@PINTO0309 PINTO0309 added the third party label Aug 4, 2023
@PINTO0309
Owner

PINTO0309 commented Aug 4, 2023

There is no problem with the model conversion itself. That is a problem with Netron's graphical display feature. The evidence is presented below.

  • Step.1

    onnx2tf -i osnet_x0_25_msmt17.onnx -osd --non_verbose
    • tflite
      When viewing tflite in Netron, the batch size appears to be fixed at 1.
    • saved_model
      However, checking the structure of saved_model, the batch size is correctly set to -1.
      saved_model_cli show --dir saved_model/ --all
      
      MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
      
      signature_def['__saved_model_init_op']:
        The given SavedModel SignatureDef contains the following input(s):
        The given SavedModel SignatureDef contains the following output(s):
          outputs['__saved_model_init_op'] tensor_info:
              dtype: DT_INVALID
              shape: unknown_rank
              name: NoOp
        Method name is: 
      
      signature_def['serving_default']:
        The given SavedModel SignatureDef contains the following input(s):
          inputs['images'] tensor_info:
              dtype: DT_FLOAT
              shape: (-1, 256, 128, 3)
              name: serving_default_images:0
        The given SavedModel SignatureDef contains the following output(s):
          outputs['output'] tensor_info:
              dtype: DT_FLOAT
              shape: (-1, 512)
              name: PartitionedCall:0
        Method name is: tensorflow/serving/predict
  • Step.2
    To prove that the tflite structure has been converted correctly, I will convert the tflite to JSON and look at the structure.

    docker run --rm -it \
    -v `pwd`:/home/user/workdir \
    ghcr.io/pinto0309/tflite2json2tflite:latest
    
    ./flatc -t \
    --strict-json \
    --defaults-json \
    -o workdir \
    ./schema.fbs -- workdir/saved_model/osnet_x0_25_msmt17_float32.tflite
    
    ls -l workdir
    
    -rw-rw-r-- 1 user user   921564 Aug  4 10:24 osnet_x0_25_msmt17.onnx
    -rw-r--r-- 1 user user 10369524 Aug  4 10:30 osnet_x0_25_msmt17_float32.json
    drwxrwxr-x 4 user user     4096 Aug  4 10:26 saved_model


    • osnet_x0_25_msmt17_float32.json
      "shape_signature" is correctly set to -1. However, "shape" is set to 1. This could be a problem with TFLiteConverter, or it could be a problem with Netron's graphical display capabilities.

In other words, onnx2tf passes the model to TFLiteConverter as specified, with a batch size of -1 and without any extra model processing; only Netron's display is broken. This is a problem I have known about for quite some time. However, it does not cause any problem for inference itself. The strings and values ultimately written to the tflite file (FlatBuffers) cannot be controlled from onnx2tf.
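
For reference, the same shape / shape_signature difference can also be checked without Netron by reading the input details back through tf.lite.Interpreter (a minimal sketch, assuming the tflite file generated above):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
for detail in interpreter.get_input_details():
    # "shape" carries the placeholder 1, while "shape_signature" keeps the -1
    # that marks the batch dimension as dynamic.
    print(detail["name"], detail["shape"], detail["shape_signature"])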

@mikel-brostrom
Contributor Author

Thank you so much for your rapid reply and your time once again 😄

@mikel-brostrom
Contributor Author

mikel-brostrom commented Aug 4, 2023

Yup, I printed self.interpreter.get_input_details() and got:

[{'name': 'inputs_0', 'index': 0, 'shape': array([  1, 256, 128,   3], dtype=int32), 'shape_signature': array([ -1, 256, 128,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

I guess there is something weird going on in TFLiteConverter. I can run the ONNX model with dynamic inputs, but the TFLite one crashes...
[1, 256, 128, 3] is the dummy input I pass to torch.onnx.export
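
For reference, this is roughly how a dynamic batch axis is declared on the export side (a minimal sketch, not the exact code I use; the stand-in module, file name, and dummy shape are placeholders):

import torch
import torch.nn as nn

# Stand-in module; the real OSNet model is assumed and not shown here.
model = nn.Sequential(nn.Flatten(), nn.Linear(256 * 128 * 3, 8))
dummy = torch.ones(1, 256, 128, 3)  # static dummy input used for tracing
torch.onnx.export(
    model,
    dummy,
    "dynamic_batch_example.onnx",
    input_names=["images"],
    output_names=["output"],
    # Without dynamic_axes the exported ONNX keeps the batch fixed at 1;
    # marking dim 0 as dynamic is what produces the dynamic batch input.
    dynamic_axes={"images": {0: "batch"}, "output": {0: "batch"}},
)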

@PINTO0309
Owner

PINTO0309 commented Aug 4, 2023

If you want to run inference with variable batch sizes, you need to infer using the signature. In that case, the -coion option must be specified when converting the model. Note that I have identified a problem with quantization when the -coion option is used, which can corrupt tflite files. #429

'shape_signature': array([ -1, 256, 128, 3], dtype=int32)
interpreter.get_signature_runner()

https://github.com/PINTO0309/onnx2tf#4-match-tflite-inputoutput-names-and-inputoutput-order-to-onnx

  • convert
    onnx2tf -i osnet_x0_25_msmt17.onnx -osd -coion --non_verbose
  • test.py - Batch size: 5
    import numpy as np
    import tensorflow as tf
    from pprint import pprint
    
    interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
    tf_lite_model = interpreter.get_signature_runner()
    inputs = {
        'images': np.ones([5,256,128,3], dtype=np.float32),
    }
    tf_lite_output = tf_lite_model(**inputs)
    print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
    print(f"[TFLite] Model Predictions:")
    pprint(tf_lite_output)
  • results
    [TFLite] Model Predictions shape: (5, 512)
    [TFLite] Model Predictions:
    {'output': array([[0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00]], dtype=float32)}
    
  • test.py - Batch size: 3
    import numpy as np
    import tensorflow as tf
    from pprint import pprint
    
    interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
    tf_lite_model = interpreter.get_signature_runner()
    inputs = {
        'images': np.ones([3,256,128,3], dtype=np.float32),
    }
    tf_lite_output = tf_lite_model(**inputs)
    print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
    print(f"[TFLite] Model Predictions:")
    pprint(tf_lite_output)
  • results
    [TFLite] Model Predictions shape: (3, 512)
    [TFLite] Model Predictions:
    {'output': array([[0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00]], dtype=float32)}
    

@PINTO0309 PINTO0309 added the Dynamic batch / Dynamic shape label Aug 4, 2023
@PINTO0309 PINTO0309 changed the title onnx model with dynamic input is converted to tflite with static input 1 [Dynamic Batch / Dynamic Shape] onnx model with dynamic input is converted to tflite with static input 1 Aug 4, 2023
@PINTO0309 PINTO0309 changed the title [Dynamic Batch / Dynamic Shape] onnx model with dynamic input is converted to tflite with static input 1 [Dynamic batch / Dynamic shape] onnx model with dynamic input is converted to tflite with static input 1 Aug 4, 2023
@mikel-brostrom
Contributor Author

mikel-brostrom commented Aug 4, 2023

onnx2tf -i examples/weights/osnet_x0_25_msmt17.onnx -o /home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model -osd -coion --non_verbose

works, no problem. But when I run:

import numpy as np
import tensorflow as tf
from pprint import pprint

interpreter = tf.lite.Interpreter(model_path="/home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'images': np.ones([5,256,128,3], dtype=np.float32),
}
tf_lite_output = tf_lite_model(**inputs)
print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
print(f"[TFLite] Model Predictions:")
pprint(tf_lite_output)

I get:

lite/python/interpreter.py", line 853, in get_signature_runner
    raise ValueError(
ValueError: SignatureDef signature_key is None and model has 0 Signatures. None is only allowed when the model has 1 SignatureDef

Should this be added manually?
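
For what it's worth, a quick way to check whether the converted tflite actually contains a SignatureDef (a minimal sketch, using the same model path as above):

import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="/home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
# An empty dict here means no SignatureDef was written to the file,
# which matches the error above.
print(interpreter.get_signature_list())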

@PINTO0309
Owner

Are all the necessary packages installed, e.g. flatbuffers-compiler?
https://github.com/PINTO0309/onnx2tf#environment

If it doesn't work, try Docker.

docker run --rm -it \
-v `pwd`:/workdir \
-w /workdir \
docker.io/pinto0309/onnx2tf:1.15.8

@mikel-brostrom
Contributor Author

Yup, I installed all the packages mentioned in the README (flatbuffers-compiler included). Will try Docker.

@mikel-brostrom
Contributor Author

mikel-brostrom commented Aug 4, 2023

I get the same issue there:

user@69584e9dc119:/workdir$ python examples/weights/test.py 
Traceback (most recent call last):
  File "examples/weights/test.py", line 6, in <module>
    tf_lite_model = interpreter.get_signature_runner()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/lite/python/interpreter.py", line 853, in get_signature_runner
    raise ValueError(
ValueError: SignatureDef signature_key is None and model has 0 Signatures. None is only allowed when the model has 1 SignatureDef
user@69584e9dc119:/workdir$

test.py contains:

import numpy as np
import tensorflow as tf
from pprint import pprint

interpreter = tf.lite.Interpreter(model_path="/workdir/examples/weights/osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'images': np.ones([5,256,128,3], dtype=np.float32),
}
tf_lite_output = tf_lite_model(**inputs)
print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
print(f"[TFLite] Model Predictions:")
pprint(tf_lite_output)

@mikel-brostrom
Contributor Author

Have to catch a train, so I will have to continue looking at this later today 😄

@PINTO0309
Owner

PINTO0309 commented Aug 4, 2023

What is examples/weights/tttt.py?
Please describe the exact command you executed.

Once the conversion is performed in the Docker container, there should be no errors. Also, if you run test.py correctly, no error can occur. Be sure to check that the file path is correct. The problems you are having are unique to your environment.

docker run --rm -it \
-v `pwd`:/workdir \
-w /workdir \
docker.io/pinto0309/onnx2tf:1.15.8

onnx2tf \
-i osnet_x0_25_msmt17.onnx \
-o saved_model \
-osd \
-coion \
--non_verbose

@mikel-brostrom
Contributor Author

The onnx2tf command is not failing. What is failing is the inference, in test.py.

@PINTO0309
Owner

I have known that from the beginning.

I am concerned that your host PC environment was corrupted at the time you converted the model. Please redo everything in Docker.

@PINTO0309 PINTO0309 added the Environment label Aug 4, 2023
@mikel-brostrom
Contributor Author

mikel-brostrom commented Aug 4, 2023

Thanks for your patience. Will try Docker later today from scratch :)

@mikel-brostrom
Contributor Author

mikel-brostrom commented Aug 4, 2023

Everything that could go wrong went wrong 🤣. My bad with the environment. I have it working after your suggestions:

tflite model input torch.Size([1, 256, 128, 3])
tflite model output (1, 512)
0: 480x640 1 person, 9.1ms

tflite model input torch.Size([2, 256, 128, 3])
tflite model output (2, 512)
0: 480x640 1 person, 1 chair, 9.5ms

tflite model input torch.Size([2, 256, 128, 3])
tflite model output (2, 512)
0: 480x640 1 person, 1 chair, 15.5ms

Will use the provided Docker image from now on when doing onnx2tf stuff 😄

@PINTO0309
Owner

Glad to hear it went well.

@PINTO0309
Owner

I have added it to the README and will close it.

@mikel-brostrom
Contributor Author

Great tutorial for dynamic batch inference using TFLite models! It was much needed IMO.
