This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

This issue was moved to a discussion.


ValueError: The argument structure of HybridBlock does not match the cached version. Stored format = [0], input format = [0, 0, 0] #19635

Closed
edsn60 opened this issue Dec 7, 2020 · 2 comments


edsn60 commented Dec 7, 2020

Description

I was trying to train Mask R-CNN with MXNet and GluonCV on CPU, using the script "train_mask_rcnn.py" provided by GluonCV (see https://cv.gluon.ai/build/examples_instance/train_mask_rcnn_coco.html). The training script runs without errors; however, when I try to load my trained model and test it on an image, I get the error below. I don't know what is causing it.

Error Message

Traceback (most recent call last):
  File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 88, in <module>
    ids, scores, bboxes, masks = [xx[0].asnumpy() for xx in net(x)]
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 683, in __call__
    out = self.forward(*args)
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1430, in forward
    return self._call_cached_op(x, *args)
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1022, in _call_cached_op
    raise ValueError("The argument structure of HybridBlock does not match"
ValueError: The argument structure of HybridBlock does not match the cached version. Stored format = [0], input format = [0, 0, 0]
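Each format list in the message roughly has one entry per top-level input, with 0 marking a single array, so [0] and [0, 0, 0] describe one input versus three. The pure-Python sketch below illustrates that comparison. Tensor and flatten_format are hypothetical names, and the logic is a simplified approximation of MXNet 1.x's internal check, not its actual code:

```python
# Simplified sketch of how a call's arguments can be reduced to a "format"
# list and compared with the structure a cached graph was built with.
# Loosely modelled on MXNet 1.x's internal check; this is NOT MXNet's code.


class Tensor:
    """Hypothetical stand-in for a single input array (e.g. an NDArray)."""


def flatten_format(args):
    """Describe the structure of args: 0 per single tensor, -1 for None,
    and a list for a tuple/list of arguments."""
    if isinstance(args, Tensor):
        return 0
    if args is None:
        return -1
    if isinstance(args, (list, tuple)):
        return [flatten_format(a) for a in args]
    raise TypeError(f"unsupported argument type: {type(args).__name__}")


# A graph declared with three data inputs (data0, data1, data2) ...
graph_format = flatten_format((Tensor(), Tensor(), Tensor()))
# ... called as net(x), i.e. with a single tensor.
call_format = flatten_format((Tensor(),))

print(graph_format)                 # [0, 0, 0]
print(call_format)                  # [0]
print(graph_format == call_format)  # False: the kind of mismatch the ValueError reports
```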

To Reproduce

This is the relevant part of "pre_mask_rcnn.py":
from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils
from mxnet import gluon
import mxnet as mx

# The model is loaded here; the ValueError above is raised later, when net(x) is called (line 88 in the traceback)
net = gluon.SymbolBlock.imports(
    './mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json',
    ['data0', 'data1', 'data2'],
    './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params',
    ctx=mx.cpu())

In train_mask_rcnn.py, I used the following two lines to save the parameters and export the model:
net.save_parameters('{:s}_{:04d}_{:.4f}.params'.format(prefix, epoch, current_map))
net.export('{:s}_{:04d}_{:.4f}'.format(prefix, epoch, current_map), epoch=0)
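The file names used in this issue (for example mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json) suggest that the format strings in these two calls separate the fields with underscores. Assuming that, and assuming MXNet 1.x's export naming convention ("<path>-symbol.json" plus "<path>-<epoch as 4 digits>.params"), this sketch lists which file each call writes; checkpoint_names is a hypothetical helper:

```python
def checkpoint_names(prefix, epoch, current_map, export_epoch=0):
    """Hypothetical helper: file names written by the two save calls,
    assuming MXNet 1.x's HybridBlock.export naming convention."""
    base = '{:s}_{:04d}_{:.4f}'.format(prefix, epoch, current_map)
    return {
        # net.save_parameters(...): parameters only, meant for load_parameters
        'save_parameters': base + '.params',
        # net.export(base, epoch=export_epoch): graph + parameters,
        # meant for gluon.SymbolBlock.imports
        'export_symbol': base + '-symbol.json',
        'export_params': '{}-{:04d}.params'.format(base, export_epoch),
    }


for kind, name in checkpoint_names('mask_rcnn_resnet50_v1b_coco', 0, 0.0).items():
    print(kind, '->', name)
# save_parameters -> mask_rcnn_resnet50_v1b_coco_0000_0.0000.params
# export_symbol -> mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json
# export_params -> mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params
```

The MXNet 1.x documentation pairs save_parameters with load_parameters and export with SymbolBlock.imports, so the two kinds of .params file are not meant to be used interchangeably.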

Steps to reproduce

Run "pre_mask_rcnn.py" in PyCharm.

What have you tried to solve it?

Load from model_zoo

I also tried creating the network from model_zoo and then loading my trained parameters into it:

param = "./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params"
net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=False)
net.initialize(ctx=mx.cpu())
net.reset_class(['insulator'])
net.load_parameters(param.strip())

but got a different error at "net.load_parameters(param.strip())":
Traceback (most recent call last):
  File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 35, in <module>
    net.load_parameters(param.strip())
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 530, in load_parameters
    self.collect_params().load(
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1022, in load
    self.load_dict(ndarray_load, ctx, allow_missing,
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1055, in load_dict
    assert name in arg_dict,
AssertionError: Parameter 'conv0_weight' is missing in file: ./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params, which contains parameters: 'maskrcnn0_resnetv1b_conv0_weight', 'maskrcnn0_resnetv1b_batchnorm0_gamma', 'maskrcnn0_resnetv1b_batchnorm0_beta', ..., 'maskrcnn0_maskrcnn0_mask0_conv0_weight', 'maskrcnn0_maskrcnn0_mask0_conv0_bias', 'maskrcnn0_maskrcnn0_mask0_conv1_weight', 'maskrcnn0_maskrcnn0_mask0_conv1_bias'. Please make sure source and target networks have the same prefix.For more info on naming, please see https://mxnet.io/api/python/docs/tutorials/packages/gluon/blocks/naming.html

It seems that the parameter names expected by the newly created model do not have the prefix "maskrcnn0_resnetv1b_", which appears in my saved parameters. I went back to "train_mask_rcnn.py" and found an argument "save_prefix", but it only affects the names of the ".params" and ".json" files, not the names of the parameters themselves.
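To check whether a "missing parameter" error is only a naming-prefix mismatch, one can compare the names the model expects with the names stored in the file. A minimal pure-Python sketch, using the names from the error message above; prefix_differences is a hypothetical helper, not part of MXNet or GluonCV:

```python
def prefix_differences(expected_names, saved_names):
    """Hypothetical diagnostic: for each expected parameter name that is
    missing from the file, list the prefixes that, prepended to it, give a
    name that *is* in the file."""
    saved = set(saved_names)
    result = {}
    for name in expected_names:
        if name in saved:
            continue  # name matches exactly; nothing to reconcile
        result[name] = sorted(
            s[: -len(name)] for s in saved
            if s.endswith(name) and len(s) > len(name)
        )
    return result


# Names taken from the AssertionError above.
saved = [
    'maskrcnn0_resnetv1b_conv0_weight',
    'maskrcnn0_resnetv1b_batchnorm0_gamma',
    'maskrcnn0_resnetv1b_batchnorm0_beta',
]
print(prefix_differences(['conv0_weight'], saved))
# {'conv0_weight': ['maskrcnn0_resnetv1b_']}
```

If every missing name maps to the same single prefix, the weights are probably all there under different names; an empty list for a name suggests the parameter is genuinely absent from the file.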

By the way, in "pre_mask_rcnn.py", if I change
net = gluon.SymbolBlock.imports('./mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json', ['data0', 'data1', 'data2'], './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params', ctx=mx.cpu())

into
net = gluon.SymbolBlock.imports('./mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json', ['data0'], './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params', ctx=mx.cpu())

then I got another error:
Traceback (most recent call last):
  File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 37, in <module>
    net = gluon.SymbolBlock.imports('./mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json', ['data0'], './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params', ctx=mx.cpu())
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1366, in imports
    ret.collect_params().load(param_file, ctx=ctx, cast_dtype=True, dtype_source='saved')
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1022, in load
    self.load_dict(ndarray_load, ctx, allow_missing,
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1055, in load_dict
    assert name in arg_dict,
AssertionError: Parameter 'data1' is missing in file: ./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params, which contains parameters: 'resnetv1b_conv0_weight', 'resnetv1b_batchnorm0_gamma', 'resnetv1b_batchnorm0_beta', ..., 'maskrcnn0_mask0_conv0_weight', 'maskrcnn0_mask0_conv0_bias', 'maskrcnn0_mask0_conv1_weight', 'maskrcnn0_mask0_conv1_bias'. Please make sure source and target networks have the same prefix.For more info on naming, please see https://mxnet.io/api/python/docs/tutorials/packages/gluon/blocks/naming.html
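In MXNet 1.x, SymbolBlock treats every graph variable not listed in the input names as a parameter, which is consistent with 'data1' being looked up in the .params file here. A pure-Python sketch for listing the variables an exported graph expects as data inputs, assuming MXNet 1.x's symbol JSON layout (variables are nodes with "op": "null") and that export() stores parameter names with "arg:"/"aux:" prefixes. data_input_names and the toy graph are hypothetical; the toy graph is not a real Mask R-CNN export:

```python
import json


def data_input_names(symbol_json, param_names):
    """Hypothetical helper: names of graph variables that are NOT stored in
    the .params file, i.e. the inputs SymbolBlock.imports must be given.
    In practice, symbol_json could be the text of the -symbol.json file and
    param_names the keys of mx.nd.load(params_file)."""
    graph = json.loads(symbol_json)
    # export() prefixes stored names with "arg:" or "aux:"; strip them
    stored = {n.split(':', 1)[1] if n.startswith(('arg:', 'aux:')) else n
              for n in param_names}
    return [node['name'] for node in graph['nodes']
            if node['op'] == 'null' and node['name'] not in stored]


# Toy graph with three data variables and one weight (illustration only).
toy_symbol = json.dumps({
    'nodes': [
        {'op': 'null', 'name': 'data0', 'inputs': []},
        {'op': 'null', 'name': 'data1', 'inputs': []},
        {'op': 'null', 'name': 'data2', 'inputs': []},
        {'op': 'null', 'name': 'conv0_weight', 'inputs': []},
        {'op': 'Convolution', 'name': 'conv0', 'inputs': [[0, 0, 0], [3, 0, 0]]},
    ],
    'arg_nodes': [0, 1, 2, 3],
    'heads': [[4, 0, 0]],
})
print(data_input_names(toy_symbol, ['arg:conv0_weight']))
# ['data0', 'data1', 'data2']
```

If a real export lists three data inputs, passing only ['data0'] to imports leaves the other two to be looked up as parameters, and declaring all three but calling net(x) with one array gives an argument-structure mismatch.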

Environment

----------Python Info----------
Version : 3.8.5
Compiler : GCC 7.3.0
Build : ('default', 'Sep 4 2020 07:30:14')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 20.2.4
Directory : /home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version : 1.7.0
Directory : /home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet
Commit Hash : 64f737c
Library : ['/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/libmxnet.so']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform : Linux-5.4.0-56-generic-x86_64-with-glibc2.10
system : Linux
node : s-ubuntu
release : 5.4.0-56-generic
version : #62~18.04.1-Ubuntu SMP Tue Nov 24 10:07:50 UTC 2020
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Stepping: 13
CPU MHz: 906.093
CPU max MHz: 4700.0000
CPU min MHz: 800.0000
BogoMIPS: 6000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0181 sec, LOAD: 16.4462 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.8030 sec, LOAD: 3.1562 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)>, DNS finished in 0.39185047149658203 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.8720 sec, LOAD: 2.1558 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0062 sec, LOAD: 6.1062 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.5848102569580078 sec.
----------Environment----------


github-actions bot commented Dec 7, 2020

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

szha (Member) commented Dec 7, 2020

To use the imports interface, you need to use block.export to save both the graph and the parameters.

@szha szha removed the needs triage label Feb 8, 2021
@szha szha closed this as completed Feb 8, 2021
@apache apache locked and limited conversation to collaborators Feb 8, 2021

