You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
I would like to activate the mxnet profiler in order to profile a deep learning training job. Yet, when mxnet.profiler.dumps is activated more than once during the training job (each time dumping the output into a different file) the output json files have invalid json format.
Assuming, for example, that 3 files are created during the run using the profiler.dumps command:
[1] The first file will miss the following suffix:
pull docker image mxnet/python:1.4.1_cpu_py2 from repository
Activate docker using the following command:
docker run -e PYTHONUNBUFFERED=0 -it mxnet/python:1.4.1_cpu_py2 /bin/bash
download the attached script: 'mxnet_profiler_dumps_invalid_json_example.py.py.zip' and unzip it.
open a new shell, and copy the script from previous step into the docker container:
docker cp {download_folder}/mxnet_profiler_dumps_invalid_json_example.py.py {CONTAINER_ID}:/mxnet_profiler_dumps_invalid_json_example.py
Go back to the activated docker container, and activate the script using: python mxnet_profiler_dumps_invalid_json_example.py
install vim on docker:
apt-get update
apt-get install vim
use vim to view missing prefix/suffix in the following files:
/tmp/profiler_dump/profile_iteration_0.json
/tmp/profiler_dump/profile_iteration_1.json
/tmp/profiler_dump/profile_iteration_2.json
What have you tried to solve it?
When only a single file is dumped, profiler yields valid json format files.
Hi @avivna , thank you for creating this issue. The designed usage is that you can only dump to a single file in you script because we are incrementally writing to the file. Changing the file path will result in incomplete outputs. With that said, set_config() should be called only once in your model and that is before you model runs
Description
I would like to activate the mxnet profiler in order to profile a deep learning training job. Yet, when mxnet.profiler.dumps is activated more than once during the training job (each time dumping the output into a different file) the output json files have invalid json format.
Assuming, for example, that 3 files are created during the run using the profiler.dumps command:
[1] The first file will miss the following suffix:
],
"displayTimeUnit": "ms"
}
[2] The last file will miss the following suffix:
{
"traceEvents": [
{
"ph": "M",
"args": {
"name": "cpu/0"
},
"pid": 0,
"name": "process_name"
},
{
"ph": "M",
"args": {
"name": "cpu/1"
},
"pid": 1,
"name": "process_name"
},
{
"ph": "M",
"args": {
"name": "cpu pinned/"
},
"pid": 2,
"name": "process_name"
},
{
"ph": "M",
"args": {
"name": "cpu shared/"
},
"pid": 3,
"name": "process_name"
},
[3] The second file will miss both suffix and prefix described in [1] and [2]
Environment info (Required)
I have downloaded from the mxnet/python repository the following docker image: mxnet/python:1.4.1_cpu_py2
Results of diagnose.py script:
('Version :', '2.7.12')
('Compiler :', 'GCC 5.4.0 20160609')
('Build :', ('default', 'Nov 12 2018 14:36:49'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version :', '19.1.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
('Version :', '1.4.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/mxnet')
Hashtag not found. Not installed from pre-built package.
----------System Info----------
('Platform :', 'Linux-4.9.125-linuxkit-x86_64-with-Ubuntu-16.04-xenial')
('system :', 'Linux')
('node :', '9178f021bc9a')
('release :', '4.9.125-linuxkit')
('version :', '#1 SMP Fri Sep 7 08:20:28 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz
Stepping: 9
CPU MHz: 2500.000
BogoMIPS: 4933.47
Hypervisor vendor: vertical
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0900 sec, LOAD: 0.8678 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0894 sec, LOAD: 1.7616 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1251 sec, LOAD: 1.4927 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0941 sec, LOAD: 0.3832 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0892 sec, LOAD: 1.3198 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1272 sec, LOAD: 1.1022 sec.
Package used (Python/R/Scala/Julia):
I'm using Python package
Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio):
GCC 5.4.0 20160609
MXNet version:
('Version :', '1.4.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/mxnet')
Build config:
(Paste the content of config.mk, or the build command.)
Minimum reproducible example
Download the following script and follow the steps described in the next section in order to reproduce this bug:
mxnet_profiler_dumps_invalid_json_example.py.zip
Steps to reproduce
docker run -e PYTHONUNBUFFERED=0 -it mxnet/python:1.4.1_cpu_py2 /bin/bash
docker cp {download_folder}/mxnet_profiler_dumps_invalid_json_example.py.py {CONTAINER_ID}:/mxnet_profiler_dumps_invalid_json_example.py
apt-get update
apt-get install vim
/tmp/profiler_dump/profile_iteration_0.json
/tmp/profiler_dump/profile_iteration_1.json
/tmp/profiler_dump/profile_iteration_2.json
What have you tried to solve it?
mxnet profiler.dump error: MXNetError(py_str(_LIB.MXGetLastError())) #15556
The text was updated successfully, but these errors were encountered: