Error in tinyyolo conversion from CoreML to Caffe (MXNet): different way of padding
Model: Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242
Source: COREML
Destination: Caffe / Mxnet
Author: Jiahao
We tested the CoreML parser and the Caffe / MXNet emitters, using the same weights in every layer.
The output of the CoreML model has the shape 13x13x125.
However, the output of the Caffe / MXNet model has the shape 12x14x125, 14x14x125, or 14x12x125.
This is a serious error, even though the converted model runs without complaint and the generated code looks correct.
First, download the coreml tinyyolo model.
$ mmdownload -f coreml -n tinyyolo
Secondly, convert the coreml model to IR structure.
$ mmtoir -f coreml -d tinyyolo -n TinyYOLO.mlmodel --dstNodeName MMdnn_Output
You will get
IR network structure is saved as [tinyyolo.json].
IR network structure is saved as [tinyyolo.pb].
IR weights are saved as [tinyyolo.npy].
Finally, convert the IR to mxnet code.
$ mmtocode -f mxnet --IRModelPath tinyyolo.pb --IRWeightPath tinyyolo.npy --dstModelPath mx_tinyyolo.py --dstWeightPath mx_tinyyolo-0000.param
Then, the Mxnet network code snippet is saved as [mx_tinyyolo.py].
Line 33 of mx_tinyyolo.py is a max pooling layer with kernel size 2 and stride 1:
maxpooling2d_6 = mx.sym.Pooling(data = leakyrelu_6, global_pool = False, kernel=(2L, 2L), pool_type = 'max', stride=(1L, 1L), pad=(0L, 0L), name = 'maxpooling2d_6')
In the original CoreML model, the input of this layer has a shape of 13x13, and the output of this layer is supposed to be 13x13 as well. To be more specific, the padding of this layer is supposed to be padding_left=1, padding_right=0, padding_top=1, padding_bottom=0. However, in the Caffe and MXNet models, the padding has to be symmetric, which means padding_left has to be equal to padding_right and padding_top has to be equal to padding_bottom. Therefore, the output of this layer can never be 13x13, only 12x14, 14x12, or 14x14, which can be seen more clearly from the images below.
Different way of padding results in different shape after pooling.
Since the paddings in Caffe and MXNet are symmetric, the sizes after this pooling layer (kernel size = 2, stride = 1) are even numbers (12 or 14), not odd (13).
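The arithmetic behind this can be checked with a short sketch. The helper `pool_out_size` is a hypothetical name introduced here for illustration, not part of MMdnn, Caffe, or MXNet. It assumes the usual floor-based output size formula; with stride 1, floor and ceil rounding give the same result.

```python
def pool_out_size(in_size, kernel, stride, pad_before, pad_after):
    """Output length of a pooling layer along one axis (floor convention).

    Hypothetical helper for illustration only.
    """
    return (in_size + pad_before + pad_after - kernel) // stride + 1


# Values for this layer: input 13, kernel 2, stride 1.
in_size, kernel, stride = 13, 2, 1

# Asymmetric padding as in the CoreML model: 1 on one side, 0 on the other.
asymmetric = pool_out_size(in_size, kernel, stride, 1, 0)

# Symmetric padding: the same pad value on both sides of the axis.
symmetric = {p: pool_out_size(in_size, kernel, stride, p, p) for p in (0, 1)}

print(asymmetric)  # 13
print(symmetric)   # {0: 12, 1: 14}
```

With symmetric padding p the output size is 12 + 2p, which is always even, so 13 is unreachable on either axis.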
There are two possible solutions to this problem: add a padding layer before the pooling layer when converting to MXNet, or crop the result to match the expected output shape of the pooling layer.
- The MXNet problem was solved by adding a padding layer before the pooling layer. (2018.5.10)
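The idea of padding before pooling can be sketched in plain Python on a single-channel 13x13 feature map. This is an illustration of the technique only, not the code MMdnn emits. The helper names `pad_top_left` and `max_pool_2d` are hypothetical, and filling the padded cells with negative infinity (so they never win the max) is an assumption made here.

```python
NEG_INF = float("-inf")


def pad_top_left(fmap, top, left, fill=NEG_INF):
    """Pad a 2-D list of numbers on the top and left edges only."""
    width = len(fmap[0]) + left
    padded = [[fill] * width for _ in range(top)]
    padded += [[fill] * left + list(row) for row in fmap]
    return padded


def max_pool_2d(fmap, kernel, stride):
    """Max pooling over a 2-D list of numbers, with no padding of its own."""
    rows = (len(fmap) - kernel) // stride + 1
    cols = (len(fmap[0]) - kernel) // stride + 1
    return [
        [
            max(
                fmap[r * stride + i][c * stride + j]
                for i in range(kernel)
                for j in range(kernel)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A 13x13 feature map with distinct values, standing in for the layer input.
fmap = [[r * 13 + c for c in range(13)] for r in range(13)]

padded = pad_top_left(fmap, top=1, left=1)        # 14x14
pooled = max_pool_2d(padded, kernel=2, stride=1)  # 13x13

print(len(pooled), len(pooled[0]))
```

Because the explicit padding layer carries the asymmetry, the pooling layer itself can keep pad=(0, 0), which symmetric-only frameworks can express.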