This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mxnet profiler.dump error: MXNetError(py_str(_LIB.MXGetLastError())) #15556

Closed
avivna opened this issue Jul 16, 2019 · 4 comments
Labels: Bug, Profiler (MXNet profiling issues)

Comments

@avivna

avivna commented Jul 16, 2019

Description

When mxnet.profiler.dump() is called more than once in the same run, the following error is raised (by the mx.profiler.set_config() call that follows the first dump):
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [06:35:15] /work/mxnet/3rdparty/dmlc-core/include/dmlc/thread_group.h:225: Check failed: auto_remove == false (1 vs. 0)

For convenience, I recreated the error in a short script and ran it in an mxnet/python:1.4.1_cpu_py2 Docker container.

Environment info (Required)

I pulled the following Docker image from the mxnet/python repository: mxnet/python:1.4.1_cpu_py2

Results of diagnose.py script:

----------Python Info----------
('Version :', '2.7.12')
('Compiler :', 'GCC 5.4.0 20160609')
('Build :', ('default', 'Nov 12 2018 14:36:49'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version :', '19.1.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
('Version :', '1.4.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/mxnet')
Hashtag not found. Not installed from pre-built package.
----------System Info----------
('Platform :', 'Linux-4.9.125-linuxkit-x86_64-with-Ubuntu-16.04-xenial')
('system :', 'Linux')
('node :', '9178f021bc9a')
('release :', '4.9.125-linuxkit')
('version :', '#1 SMP Fri Sep 7 08:20:28 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz
Stepping: 9
CPU MHz: 2500.000
BogoMIPS: 4933.47
Hypervisor vendor: vertical
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0912 sec, LOAD: 1.1176 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.1000 sec, LOAD: 1.7238 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1142 sec, LOAD: 1.3938 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.1011 sec, LOAD: 0.3863 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1453 sec, LOAD: 1.5163 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1299 sec, LOAD: 1.0787 sec.

Package used (Python/R/Scala/Julia):
I'm using the Python package.

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):
GCC 5.4.0 20160609

MXNet version:
('Version :', '1.4.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/mxnet')

Error Message:

Traceback (most recent call last):
  File "mxnet_profiler_dump_example.py", line 32, in <module>
    iterative_mxnet_profiling(profiler_dump_folder=profiler_dump_folder, num_iterations_to_profile=3)
  File "mxnet_profiler_dump_example.py", line 18, in iterative_mxnet_profiling
    mx.profiler.set_config(profile_all=True, aggregate_stats=True, filename=os.path.join(profiler_dump_folder,"profile_iteration_{}.json").format(profiling_iteration))
  File "/usr/local/lib/python2.7/dist-packages/mxnet/profiler.py", line 67, in set_config
    profiler_kvstore_handle))
  File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [06:46:49] /work/mxnet/3rdparty/dmlc-core/include/dmlc/thread_group.h:225: Check failed: auto_remove == false (1 vs. 0)

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x24562a) [0x7f1b8e4c562a]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x245c91) [0x7f1b8e4c5c91]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x32acac9) [0x7f1b9152cac9]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x32b0f23) [0x7f1b91530f23]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x32b1a35) [0x7f1b91531a35]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(MXSetProcessProfilerConfig+0x4ae) [0x7f1b90dc547e]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f1b9a266e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f1b9a2668ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7f1b9a4763df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7f1b9a47ad82]

Minimum reproducible example:

Follow the steps in the next section to reproduce the error:

Steps to reproduce


  1. Pull the Docker image mxnet/python:1.4.1_cpu_py2 from the repository.
  2. Start a container using the following command:
    docker run -e PYTHONUNBUFFERED=0 -it mxnet/python:1.4.1_cpu_py2 /bin/bash
  3. Download the attached script 'mxnet_profiler_dump_error_example.py.zip' and unzip it.
  4. Open a new shell and copy the script from the previous step into the container:
    docker cp {download_folder}/mxnet_profiler_dump_error_example.py {CONTAINER_ID}:/mxnet_profiler_dump_error_example.py
  5. Return to the running container and run the script: python /mxnet_profiler_dump_error_example.py

What have you tried to solve it?

  1. Using mxnet.profiler.dumps() instead of mxnet.profiler.dump(). However, the MXNet documentation (https://mxnet.incubator.apache.org/versions/master/tutorials/python/profiler.html) states that profiler.dump should be used in order to view the results in the chrome://tracing tool.

Example script:
mxnet_profiler_dump_error_example.py.zip
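The attached script is not reproduced in this thread. As an illustration only, the loop below is reconstructed from the traceback (the set_config call on line 18 of the script). The `profiler` and `run_step` parameters and the `ToyProfiler` class are hypothetical: `ToyProfiler` merely mimics the reported failure (set_config() fails once dump() has run with its default arguments) so the sketch runs without MXNet installed. With MXNet, `profiler` would be `mx.profiler`.

```python
import os


class ToyProfiler(object):
    """Hypothetical stand-in for mx.profiler. It models only the failure
    reported in this issue: once dump() has run with its default
    finished=True, a later set_config() call fails."""

    def __init__(self):
        self.finished = False
        self.dumped_files = []
        self._filename = None

    def set_config(self, **kwargs):
        if self.finished:
            # MXNet 1.4.1 reports this as mxnet.base.MXNetError
            raise RuntimeError("Check failed: auto_remove == false (1 vs. 0)")
        self._filename = kwargs.get("filename")

    def set_state(self, state):
        pass  # run/stop transitions are not modelled

    def dump(self, finished=True):
        self.dumped_files.append(self._filename)
        self.finished = finished


def iterative_mxnet_profiling(profiler, run_step, profiler_dump_folder,
                              num_iterations_to_profile):
    """Profile each iteration into its own JSON file.

    With the default dump(), the second set_config() call fails."""
    for profiling_iteration in range(num_iterations_to_profile):
        profiler.set_config(
            profile_all=True,
            aggregate_stats=True,
            filename=os.path.join(
                profiler_dump_folder,
                "profile_iteration_{}.json".format(profiling_iteration)))
        profiler.set_state("run")
        run_step()  # one iteration of the workload being profiled
        profiler.set_state("stop")
        profiler.dump()  # default finished=True: no further set_config() allowed


profiler = ToyProfiler()
try:
    iterative_mxnet_profiling(profiler, lambda: None, "profiles", 3)
except RuntimeError as err:
    print("set_config failed after {} dump(s): {}".format(
        len(profiler.dumped_files), err))
```

As in the report, the first iteration completes and the failure surfaces in the second set_config() call, not in dump() itself.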

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@vrakesh
Contributor

vrakesh commented Jul 16, 2019

@avivna Thanks for reporting this issue

@mxnet-label-bot add [Profiler, Bug]

@marcoabreu marcoabreu added the Bug and Profiler (MXNet profiling issues) labels Jul 16, 2019
@Zha0q1
Contributor

Zha0q1 commented Jul 17, 2019

Hi @avivna, calling dump() with no arguments (finished defaults to True) marks that no more writes will be performed, so any subsequent set_config() call errors out. Instead, set the finished parameter to False, i.e. mx.profiler.dump(finished=False) (see https://mxnet.incubator.apache.org/api/python/profiler/profiler.html#mxnet.profiler.dump).
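A minimal sketch of the suggested fix, assuming the per-iteration loop shown in the traceback: each iteration reconfigures the profiler, runs one step, and calls dump(finished=False) so the next set_config() still works. Passing finished=True only on the last iteration is a design choice, not something stated above; it seems safe because in the report the first dump() with finished=True completed and only a later set_config() failed. `run_step` and `RecordingProfiler` are hypothetical; with MXNet you would pass `mx.profiler` as `profiler`.

```python
import os


def iterative_mxnet_profiling(profiler, run_step, profiler_dump_folder,
                              num_iterations_to_profile):
    """Profile each iteration into its own JSON file.

    `profiler` is expected to be mx.profiler; `run_step` is a hypothetical
    callable that executes one iteration of the workload."""
    for profiling_iteration in range(num_iterations_to_profile):
        profiler.set_config(
            profile_all=True,
            aggregate_stats=True,
            filename=os.path.join(
                profiler_dump_folder,
                "profile_iteration_{}.json".format(profiling_iteration)))
        profiler.set_state("run")
        run_step()
        profiler.set_state("stop")
        # finished=False keeps the profiler usable for the next set_config();
        # only the final dump marks profiling as finished.
        is_last = profiling_iteration == num_iterations_to_profile - 1
        profiler.dump(finished=is_last)


class RecordingProfiler(object):
    """Hypothetical stand-in for mx.profiler: records calls and rejects
    set_config() after a finished dump, as described in the comment above."""

    def __init__(self):
        self.calls = []
        self.finished = False

    def set_config(self, **kwargs):
        if self.finished:
            raise RuntimeError("set_config() after dump(finished=True)")
        self.calls.append(("set_config", kwargs["filename"]))

    def set_state(self, state):
        self.calls.append(("set_state", state))

    def dump(self, finished=True):
        self.calls.append(("dump", finished))
        self.finished = finished


recorder = RecordingProfiler()
iterative_mxnet_profiling(recorder, lambda: None, "profiles", 3)
dump_calls = [call for call in recorder.calls if call[0] == "dump"]
# dump_calls == [("dump", False), ("dump", False), ("dump", True)]
```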

@Zha0q1
Contributor

Zha0q1 commented Jul 24, 2019

@sandeep-krishnamurthy this issue seems resolved
