This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mxnet.profiler.dump creates invalid json files #15557

Closed
avivna opened this issue Jul 16, 2019 · 4 comments
Labels
Bug Profiler MXNet profiling issues

Comments

@avivna

avivna commented Jul 16, 2019

Description

I would like to use the MXNet profiler to profile a deep learning training job. However, when mxnet.profiler.dumps is called more than once during the job, each time writing its output to a different file, the resulting files are not valid JSON.

Assuming, for example, that 3 files are created during the run using the profiler.dumps command:

[1] The first file is missing the following suffix:

],
"displayTimeUnit": "ms"
}

[2] The last file is missing the following prefix:

{
"traceEvents": [
{
"ph": "M",
"args": {
"name": "cpu/0"
},
"pid": 0,
"name": "process_name"
},
{
"ph": "M",
"args": {
"name": "cpu/1"
},
"pid": 1,
"name": "process_name"
},
{
"ph": "M",
"args": {
"name": "cpu pinned/"
},
"pid": 2,
"name": "process_name"
},
{
"ph": "M",
"args": {
"name": "cpu shared/"
},
"pid": 3,
"name": "process_name"
},

[3] The second file is missing both the suffix and the prefix described in [1] and [2].
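
The pattern above suggests that the dumped files are consecutive fragments of one trace. A minimal sketch for testing that hypothesis, using only the Python standard library, could look like the following. The helper names are hypothetical, and whether real fragments join cleanly at their boundaries (for example, without a missing or extra comma) is an assumption that has not been checked against MXNet's actual output.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as a single JSON document."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def fragments_form_valid_json(paths):
    """Concatenate the files in the given order and check whether the result parses.

    Assumption: the fragments are meant to be joined with nothing in between.
    """
    parts = []
    for path in paths:
        with open(path) as handle:
            parts.append(handle.read())
    return is_valid_json("".join(parts))
```

If `fragments_form_valid_json` returns True for the dumped files in iteration order while each file fails `is_valid_json` on its own, that would be consistent with the profiler writing a single trace incrementally.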

Environment info (Required)

I pulled the following Docker image from the mxnet/python repository: mxnet/python:1.4.1_cpu_py2

Results of diagnose.py script:

('Version :', '2.7.12')
('Compiler :', 'GCC 5.4.0 20160609')
('Build :', ('default', 'Nov 12 2018 14:36:49'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version :', '19.1.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
('Version :', '1.4.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/mxnet')
Hashtag not found. Not installed from pre-built package.
----------System Info----------
('Platform :', 'Linux-4.9.125-linuxkit-x86_64-with-Ubuntu-16.04-xenial')
('system :', 'Linux')
('node :', '9178f021bc9a')
('release :', '4.9.125-linuxkit')
('version :', '#1 SMP Fri Sep 7 08:20:28 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz
Stepping: 9
CPU MHz: 2500.000
BogoMIPS: 4933.47
Hypervisor vendor: vertical
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0900 sec, LOAD: 0.8678 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0894 sec, LOAD: 1.7616 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1251 sec, LOAD: 1.4927 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0941 sec, LOAD: 0.3832 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0892 sec, LOAD: 1.3198 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1272 sec, LOAD: 1.1022 sec.

Package used (Python/R/Scala/Julia):
I'm using Python package

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):
GCC 5.4.0 20160609

MXNet version:
('Version :', '1.4.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/mxnet')

Build config:
(Paste the content of config.mk, or the build command.)

Minimum reproducible example

Download the following script and follow the steps described in the next section in order to reproduce this bug:

mxnet_profiler_dumps_invalid_json_example.py.zip

Steps to reproduce

  1. Pull the Docker image mxnet/python:1.4.1_cpu_py2 from the repository.
  2. Start a container with the following command:
    docker run -e PYTHONUNBUFFERED=0 -it mxnet/python:1.4.1_cpu_py2 /bin/bash
  3. Download the attached script 'mxnet_profiler_dumps_invalid_json_example.py.py.zip' and unzip it.
  4. Open a new shell and copy the script from the previous step into the container:
    docker cp {download_folder}/mxnet_profiler_dumps_invalid_json_example.py.py {CONTAINER_ID}:/mxnet_profiler_dumps_invalid_json_example.py
  5. Go back to the running container and run the script: python mxnet_profiler_dumps_invalid_json_example.py
  6. Install vim in the container:
    apt-get update
    apt-get install vim
  7. Use vim to inspect the missing prefix/suffix in the following files:
    /tmp/profiler_dump/profile_iteration_0.json
    /tmp/profiler_dump/profile_iteration_1.json
    /tmp/profiler_dump/profile_iteration_2.json
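
As an alternative to inspecting the files by eye, the check can be scripted. The following is a minimal sketch using only the standard library; the function name `report_json_validity` is hypothetical and not part of MXNet or of the attached script.

```python
import json


def report_json_validity(paths):
    """Map each path to None if the file parses as JSON, else to the parser's error message."""
    results = {}
    for path in paths:
        try:
            with open(path) as handle:
                json.load(handle)
            results[path] = None
        except ValueError as err:  # raised by the json module on malformed input
            results[path] = str(err)
    return results
```

Calling it with the three paths listed in step 7 should return an error message for every file affected by the problem described above; the message usually includes the position at which parsing failed.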

What have you tried to solve it?

  1. When only a single file is dumped, the profiler produces a valid JSON file.
  2. I tried using mxnet.profiler.dump() instead, but this triggered the bug described in:
    mxnet profiler.dump error: MXNetError(py_str(_LIB.MXGetLastError())) #15556
@vrakesh
Contributor

vrakesh commented Jul 16, 2019

@mxnet-label-bot add [Profiler]

marcoabreu added the Profiler (MXNet profiling issues) label Jul 16, 2019
@Zha0q1
Contributor

Zha0q1 commented Jul 17, 2019

Hi @avivna, thank you for creating this issue. The designed usage is that you can only dump to a single file in your script, because we write to the file incrementally. Changing the file path will result in incomplete outputs. Accordingly, set_config() should be called only once, before your model runs.
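
A minimal sketch of the usage described above: configure once with one fixed output path, run the workload, then dump once. The helper name `profile_whole_run` and its parameters are hypothetical. `profiler` is assumed to be the `mxnet.profiler` module, passed in as an argument so that the sketch itself does not import MXNet; the `profile_all` and `filename` arguments are assumed to be accepted by `set_config` as in MXNet 1.4.

```python
def profile_whole_run(profiler, workload, filename="profile.json"):
    """Profile `workload` once and write a single trace file.

    `profiler` is expected to be the `mxnet.profiler` module (an assumption of
    this sketch); `workload` is any zero-argument callable.
    """
    # Configure exactly once, before the model runs, with one fixed output path.
    profiler.set_config(profile_all=True, filename=filename)
    profiler.set_state('run')
    try:
        # The workload should block until queued operations have finished,
        # for example by calling mx.nd.waitall() at its end.
        workload()
    finally:
        profiler.set_state('stop')
    # Dump once, after profiling has stopped, to the path set above.
    profiler.dump()
    return filename
```

With MXNet installed, this would be called as `profile_whole_run(mx.profiler, train)` after `import mxnet as mx`, where `train` stands for the user's own training function.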

@Zha0q1
Contributor

Zha0q1 commented Jul 17, 2019

We are also adding a new section to the profiler tutorial to explicitly point out those rules.

@Zha0q1
Contributor

Zha0q1 commented Jul 24, 2019

@sandeep-krishnamurthy this issue seems to be resolved.
