This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mx2onnx error about batchnorm #15482

Closed
nopattern opened this issue Jul 7, 2019 · 1 comment

@nopattern
nopattern commented Jul 7, 2019

Description

I use mx2onnx (onnx_mxnet.export_model) to convert an MXNet symbol to ONNX, but the moving_mean and moving_var params of BatchNorm are not in the params, so the export fails with the KeyError shown in the error message below.

Environment info (Required)

----------Python Info----------
Version      : 3.6.8
Compiler     : GCC 5.4.0 20160609
Build        : ('default', 'May  7 2019 14:58:50')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /usr/local/lib/python3.6/dist-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/deep/workssd/mxnet/incubator-mxnet/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-148-generic-x86_64-with-Ubuntu-16.04-xenial
system       : Linux
node         : MS-7817
release      : 4.4.0-148-generic
version      : #174-Ubuntu SMP Tue May 7 12:20:14 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
Stepping:              3
CPU MHz:               3657.070
CPU max MHz:           3700.0000
CPU min MHz:           800.0000
BogoMIPS:              6600.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3


Package used (Python/R/Scala/Julia):
(I'm using Python)

Build info (Required if built from source)

Compiler (gcc):

MXNet commit hash:
(da4b2a82511df)

Build config:

ifndef CC
export CC = gcc
endif
ifndef CXX
export CXX = g++
endif
ifndef NVCC
export NVCC = nvcc
endif

whether compile with options for MXNet developer

DEV = 0

whether compile with debug

DEBUG = 0

whether to turn on segfault signal handler to log the stack trace

USE_SIGNAL_HANDLER =

the additional link flags you want to add

ADD_LDFLAGS =

the additional compile flags you want to add

ADD_CFLAGS =

#---------------------------------------------

matrix computation libraries for CPU/GPU

#---------------------------------------------

whether use CUDA during compile

USE_CUDA = 1

add the path to CUDA library to link and compile flag

if you have already add them to environment variable, leave it as NONE

USE_CUDA_PATH = /usr/local/cuda
#USE_CUDA_PATH = NONE

whether to enable CUDA runtime compilation

ENABLE_CUDA_RTC = 1

whether use CuDNN R3 library

USE_CUDNN = 1

whether to use NVTX when profiling

USE_NVTX = 0

#whether to use NCCL library
USE_NCCL = 0
#add the path to NCCL library
USE_NCCL_PATH = NONE

whether use opencv during compilation

you can disable it, however, you will not able to use

imbin iterator

USE_OPENCV = 1

Add OpenCV include path, in which the directory opencv2 exists

USE_OPENCV_INC_PATH = NONE

Add OpenCV shared library path, in which the shared library exists

USE_OPENCV_LIB_PATH = NONE

#whether use libjpeg-turbo for image decode without OpenCV wrapper
USE_LIBJPEG_TURBO = 0
#add the path to libjpeg-turbo library
USE_LIBJPEG_TURBO_PATH = NONE

use openmp for parallelization

USE_OPENMP = 1

whether use MKL-DNN library: 0 = disabled, 1 = enabled

if USE_MKLDNN is not defined, MKL-DNN will be enabled by default on x86 Linux.

you can disable it explicity with USE_MKLDNN = 0

USE_MKLDNN = 0

whether use NNPACK library

USE_NNPACK = 0

choose the version of blas you want to use

can be: mkl, blas, atlas, openblas

in default use atlas for linux while apple for osx

UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = atlas
endif

whether use lapack during compilation

only effective when compiled with blas versions openblas/apple/atlas/mkl

USE_LAPACK = 1

path to lapack library in case of a non-standard installation

USE_LAPACK_PATH =

add path to intel library, you may need it for MKL, if you did not add the path

to environment variable

USE_INTEL_PATH = NONE

If use MKL only for BLAS, choose static link automatically to allow python wrapper

ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
else
USE_STATIC_MKL = NONE
endif

#----------------------------

Settings for power and arm arch

#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
USE_SSE=0
USE_F16C=0
else
USE_SSE=1
endif

#----------------------------

F16C instruction support for faster arithmetic of fp16 on CPU

#----------------------------

For distributed training with fp16, this helps even if training on GPUs

If left empty, checks CPU support and turns it on.

For cross compilation, please check support for F16C on target device and turn off if necessary.

USE_F16C =

#----------------------------

distributed computing

#----------------------------

whether or not to enable multi-machine supporting

USE_DIST_KVSTORE = 0

whether or not allow to read and write HDFS directly. If yes, then hadoop is

required

USE_HDFS = 0

path to libjvm.so. required if USE_HDFS=1

LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

whether or not allow to read and write AWS S3 directly. If yes, then

libcurl4-openssl-dev is required, it can be installed on Ubuntu by

sudo apt-get install -y libcurl4-openssl-dev

USE_S3 = 0

#----------------------------

performance settings

#----------------------------

Use operator tuning

USE_OPERATOR_TUNING = 1

Use gperftools if found

Disable because of #8968

USE_GPERFTOOLS = 0

path to gperftools (tcmalloc) library in case of a non-standard installation

USE_GPERFTOOLS_PATH =

Link gperftools statically

USE_GPERFTOOLS_STATIC =

Use JEMalloc if found, and not using gperftools

USE_JEMALLOC = 1

path to jemalloc library in case of a non-standard installation

USE_JEMALLOC_PATH =

Link jemalloc statically

USE_JEMALLOC_STATIC =

#----------------------------

additional operators

#----------------------------

path to folders containing projects specific operators that you don't want to put in src/operators

EXTRA_OPERATORS =

#----------------------------

other features

#----------------------------

Create C++ interface package

USE_CPP_PACKAGE = 0

Use int64_t type to represent the total number of elements in a tensor

This will cause performance degradation reported in issue #14496

Set to 1 for large tensor with tensor size greater than INT32_MAX i.e. 2147483647

Note: the size of each dimension is still bounded by INT32_MAX

USE_INT64_TENSOR_SIZE = 0

Python executable. Needed for cython target

PYTHON = python

#----------------------------

plugins

#----------------------------

whether to use caffe integration. This requires installing caffe.

You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH

CAFFE_PATH = $(HOME)/caffe

MXNET_PLUGINS += plugin/caffe/caffe.mk

#WARPCTC_PATH = $(HOME)/warp-ctc
WARPCTC_PATH = /home/deep/warp-ctc
MXNET_PLUGINS += plugin/warpctc/warpctc.mk

whether to use sframe integration. This requires build sframe

git@github.com:dato-code/SFrame.git

SFRAME_PATH = $(HOME)/SFrame

MXNET_PLUGINS += plugin/sframe/plugin.mk

Error Message:

INFO:root:Converting idx: 0, op: null, name: data
INFO:root:Converting idx: 1, op: null, name: first-3x3-conv-conv2d_weight
INFO:root:Converting idx: 2, op: Convolution, name: first-3x3-conv-conv2d
INFO:root:Converting idx: 3, op: null, name: first-3x3-conv-batchnorm_gamma
INFO:root:Converting idx: 4, op: null, name: first-3x3-conv-batchnorm_beta
INFO:root:Converting idx: 5, op: null, name: first-3x3-conv-batchnorm_moving_mean
Traceback (most recent call last):
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 484, in <module>
tune_and_evaluate(tuning_option)
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 436, in tune_and_evaluate
net, params, input_shape, _ = get_network(network, batch_size=1)
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 93, in get_network
return get_network_lpr_mb2(name,batch_size)
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 143, in get_network_lpr_mb2
test_onnx()
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 135, in test_onnx
converted_model_path = onnx_mxnet.export_model(mx_sym, args, [input_shape], np.float32, onnx_file, True)
File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_model.py", line 87, in export_model
verbose=verbose)
File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 256, in create_onnx_graph_proto
idx=idx
File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 92, in convert_layer
return convert_func(node, **kwargs)
File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/_op_translations.py", line 170, in convert_weights_and_inputs
np_arr = weights[name]
KeyError: 'first-3x3-conv-batchnorm_moving_mean'
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 484, in <module>
tune_and_evaluate(tuning_option)
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 436, in tune_and_evaluate
net, params, input_shape, _ = get_network(network, batch_size=1)
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 93, in get_network
return get_network_lpr_mb2(name,batch_size)
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 143, in get_network_lpr_mb2
test_onnx()
File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 135, in test_onnx
converted_model_path = onnx_mxnet.export_model(mx_sym, args, [input_shape], np.float32, onnx_file, True)
File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_model.py", line 87, in export_model
verbose=verbose)
File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 256, in create_onnx_graph_proto
idx=idx
File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 92, in convert_layer
return convert_func(node, **kwargs)
File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/_op_translations.py", line 170, in convert_weights_and_inputs
np_arr = weights[name]
KeyError: 'first-3x3-conv-batchnorm_moving_mean'

Minimum reproducible example

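```python
import mxnet as mx
import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

# get_symbol is defined elsewhere; its import is not part of this snippet.

batch_size = 1
input_shape = (batch_size, 3, 512, 512)
output_shape = (batch_size, 65520, 14)

# args holds the learnable weights; auxs holds the BatchNorm moving_mean/moving_var.
mx_sym, args, auxs = mx.model.load_checkpoint('./model/ssd_mobilenetv2_512', 18)
mx_sym = get_symbol('mobilenetv2', 512, num_classes=1, nms_thresh=0.5, force_nms=True, nms_topk=400)

onnx_file = './mxnet_exported_resnet18.onnx'
# Only args is passed here, so the BatchNorm statistics are missing and the KeyError is raised.
converted_model_path = onnx_mxnet.export_model(mx_sym, args, [input_shape], np.float32, onnx_file, True)
```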

Steps to reproduce

(Paste the commands you ran that produced the error.)

1. python3 tran2onnx.py

What have you tried to solve it?

1. By debugging, I found that the moving_mean and moving_var of BatchNorm are not in params, so the converter treats them as graph inputs, which they are not.
2. There should be code to process the moving_mean and moving_var of BatchNorm independently.
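
The check described in point 1 can be sketched roughly as follows. This is only an illustration, not part of the mx2onnx converter: `find_missing`, `report_missing` and the `data_names` parameter are hypothetical names, and the sketch assumes the symbol exposes the standard `list_arguments()` and `list_auxiliary_states()` methods of an MXNet Symbol.

```python
def find_missing(required_names, params, data_names=("data",)):
    """Return names the graph needs that are neither real inputs nor in params."""
    return [
        name for name in required_names
        if name not in params and name not in data_names
    ]


def report_missing(mx_sym, params, data_names=("data",)):
    """List weights and auxiliary states of mx_sym that params does not provide.

    data_names is a hypothetical parameter naming the true graph inputs,
    which are expected to be absent from params.
    """
    # Arguments cover data and learnable weights; auxiliary states cover
    # the BatchNorm moving_mean / moving_var.
    required = mx_sym.list_arguments() + mx_sym.list_auxiliary_states()
    return find_missing(required, params, data_names)
```

Calling `report_missing(mx_sym, args)` before the export should then list the BatchNorm moving statistics whenever only `args` is passed.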

@nopattern
Author

Sorry, the moving_mean and moving_var are in auxs.
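
Since the moving statistics live in auxs, one plausible way to make the export see them is to merge args and auxs into a single dict before calling export_model. The following is a minimal sketch under that assumption; `merge_params` and `export_with_aux` are hypothetical helper names, not part of MXNet.

```python
def merge_params(arg_params, aux_params):
    """Return one dict holding both learnable weights and auxiliary states."""
    overlap = set(arg_params) & set(aux_params)
    if overlap:
        raise ValueError("names present in both dicts: %s" % sorted(overlap))
    merged = dict(arg_params)   # copy so the caller's dict is not modified
    merged.update(aux_params)   # adds e.g. *_moving_mean and *_moving_var
    return merged


def export_with_aux(mx_sym, arg_params, aux_params, input_shape, onnx_file):
    """Hypothetical wrapper: export with args and auxs combined."""
    # Imported lazily so merge_params can be used without MXNet installed.
    import numpy as np
    from mxnet.contrib import onnx as onnx_mxnet

    params = merge_params(arg_params, aux_params)
    return onnx_mxnet.export_model(
        mx_sym, params, [input_shape], np.float32, onnx_file, True
    )
```

With the names from the example above, the call would be `export_with_aux(mx_sym, args, auxs, input_shape, onnx_file)`.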
