This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mxnet::cpp::NDArray::WaitAll() take about 160ms on gtx1080ti #13245

Open
bigtiger90 opened this issue Nov 13, 2018 · 5 comments
Labels
C++ Related to C++ Performance

Comments

@bigtiger90

Here is the code:
// tValRestart / tValDuration: timer macros defined elsewhere (definitions not shown).
tValRestart;
// Run the forward pass in inference mode (is_train = false).
net271_executor->Forward(false);
std::cout << "Forward use " << tValDuration << " ms" << std::endl;
tValRestart;
// Copy the six network outputs to the CPU context.
auto targetx = net271_executor->outputs[0].Copy(global_cpu_ctx);
auto targety = net271_executor->outputs[1].Copy(global_cpu_ctx);
auto targetw = net271_executor->outputs[2].Copy(global_cpu_ctx);
auto targeth = net271_executor->outputs[3].Copy(global_cpu_ctx);
auto softmax_score = net271_executor->outputs[4].Copy(global_cpu_ctx);
auto penalty_t = net271_executor->outputs[5].Copy(global_cpu_ctx);
std::cout << "copy cpu use " << tValDuration << " ms" << std::endl;
tValRestart;
// Block until all pending operations have finished; this is the call that takes about 160 ms.
mxnet::cpp::NDArray::WaitAll();
std::cout << "waitall use " << tValDuration << " ms" << std::endl;

Is that normal? All 6 outputs are float data.
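
A possible reading of these timings, not confirmed anywhere in this thread: MXNet schedules operations asynchronously, so `Forward()` and `Copy()` may return as soon as the work is queued, and `WaitAll()` then absorbs the time of everything still pending. The sketch below illustrates only that timing effect with a toy deferred-work queue. `ToyEngine`, `MeasureDeferredWork`, `Timings`, and `ElapsedMs` are hypothetical names made up for this illustration; none of this is MXNet code, and the sleep stands in for GPU work.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Milliseconds elapsed since `start`.
double ElapsedMs(Clock::time_point start) {
  return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}

// Hypothetical stand-in for a deferred-execution engine (NOT MXNet's engine):
// Push() only records the operation, WaitAll() actually runs everything queued.
class ToyEngine {
 public:
  void Push(std::function<void()> op) { pending_.push_back(std::move(op)); }

  void WaitAll() {
    for (auto& op : pending_) op();
    pending_.clear();
  }

 private:
  std::vector<std::function<void()>> pending_;
};

struct Timings {
  double enqueue_ms;  // time spent queuing the operations
  double wait_ms;     // time spent inside WaitAll()
};

// Queues `num_ops` operations that each cost `op_cost`, then times the
// queuing phase and the wait phase separately.
Timings MeasureDeferredWork(int num_ops, std::chrono::milliseconds op_cost) {
  ToyEngine engine;

  auto t0 = Clock::now();
  for (int i = 0; i < num_ops; ++i) {
    // The sleep simulates the real cost of one operation.
    engine.Push([op_cost] { std::this_thread::sleep_for(op_cost); });
  }
  double enqueue_ms = ElapsedMs(t0);  // close to zero: nothing has run yet

  auto t1 = Clock::now();
  engine.WaitAll();  // all of the deferred cost lands here
  double wait_ms = ElapsedMs(t1);

  return Timings{enqueue_ms, wait_ms};
}
```

If this reading applies to the code above, calling `mxnet::cpp::NDArray::WaitAll()` (or `WaitToRead()` on each output) right after `Forward(false)` and before restarting the timer should move most of the 160 ms into the "Forward use" measurement.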
@bigtiger90 bigtiger90 changed the title mxnet::cpp::NDArray::WaitAll() take about 160ms mxnet::cpp::NDArray::WaitAll() take about 160ms on gtx1080ti Nov 13, 2018
@zachgk
Contributor

zachgk commented Nov 13, 2018

@mxnet-label-bot add [C++, Performance, Pending Requester Info]

Hi @aaron900813, can you help provide some info about your environment?

Environment info (Required)

What to do:
1. Download the diagnosis script from https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
2. Run the script using `python diagnose.py` and paste its output here.

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):

MXNet commit hash:
(Paste the output of git rev-parse HEAD here.)

Build config:
(Paste the content of config.mk, or the build command.)

@bigtiger90
Author

bigtiger90 commented Nov 14, 2018

Hi @marcoabreu,
Here are the diagnose output, the build info and configuration, and the nvidia-smi output. By the way, the CUDA version is 8.0.

----------Python Info----------
('Version :', '2.7.6')
('Compiler :', 'GCC 4.8.4')
('Build :', ('default', 'Nov 23 2017 15:49:48'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version :', '1.5.4')
('Directory :', '/usr/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
('Platform :', 'Linux-4.4.0-138-generic-x86_64-with-Ubuntu-14.04-trusty')
('system :', 'Linux')
('node :', 'user-ubuntu')
('release :', '4.4.0-138-generic')
('version :', '#164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Stepping: 9
CPU MHz: 2426.484
BogoMIPS: 8400.75
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0001 sec, LOAD: 1.7462 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0267 sec, LOAD: 4.3220 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0273 sec, LOAD: 1.2523 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0310 sec, LOAD: 0.8786 sec.
Error open Gluon Tutorial(en): http://gluon.mxnet.io, <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>, DNS finished in 0.0259628295898 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>, DNS finished in 3.87871289253 sec.

git rev-parse HEAD
9ccd647

CMakeLists.txt configuration
mxnet_option(USE_CUDA "Build with CUDA support" ON)
mxnet_option(USE_OLDCMAKECUDA "Build with old cmake cuda" OFF)
mxnet_option(USE_NCCL "Use NVidia NCCL with CUDA" OFF)
mxnet_option(USE_OPENCV "Build with OpenCV support" ON)
mxnet_option(USE_OPENMP "Build with Openmp support" ON)
mxnet_option(USE_CUDNN "Build with cudnn support" ON) # one could set CUDNN_ROOT for search path
mxnet_option(USE_SSE "Build with x86 SSE instruction support" ON)
mxnet_option(USE_LAPACK "Build with lapack support" ON IF NOT MSVC)
mxnet_option(USE_MKL_IF_AVAILABLE "Use MKL if found" ON)
mxnet_option(USE_MKLML_MKL "Use MKLDNN variant of MKL (if MKL found)" ON IF USE_MKL_IF_AVAILABLE AND UNIX AND (NOT APPLE))
mxnet_option(USE_MKLDNN "Use MKLDNN variant of MKL (if MKL found)" ON IF USE_MKL_IF_AVAILABLE AND UNIX AND (NOT APPLE))
mxnet_option(USE_OPERATOR_TUNING "Enable auto-tuning of operators" ON IF NOT MSVC)
mxnet_option(USE_GPERFTOOLS "Build with GPerfTools support (if found)" ON)
mxnet_option(USE_JEMALLOC "Build with Jemalloc support" ON)
mxnet_option(USE_PROFILER "Build with Profiler support" ON)
mxnet_option(USE_DIST_KVSTORE "Build with DIST_KVSTORE support" OFF)
mxnet_option(USE_PLUGINS_WARPCTC "Use WARPCTC Plugins" OFF)
mxnet_option(USE_PLUGIN_CAFFE "Use Caffe Plugin" OFF)
mxnet_option(USE_CPP_PACKAGE "Build C++ Package" ON)
mxnet_option(USE_MXNET_LIB_NAMING "Use MXNet library naming conventions." ON)
mxnet_option(USE_GPROF "Compile with gprof (profiling) flag" OFF)
mxnet_option(USE_CXX14_IF_AVAILABLE "Build with C++14 if the compiler supports it" OFF)
mxnet_option(USE_VTUNE "Enable use of Intel Amplifier XE (VTune)" OFF) # one could set VTUNE_ROOT for search path
mxnet_option(ENABLE_CUDA_RTC "Build with CUDA runtime compilation support" ON)
mxnet_option(BUILD_CPP_EXAMPLES "Build cpp examples" ON)
mxnet_option(INSTALL_EXAMPLES "Install the example source files." OFF)
mxnet_option(USE_SIGNAL_HANDLER "Print stack traces on segfaults." OFF)

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87 Driver Version: 390.87 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:01:00.0 On | N/A |
| 25% 46C P0 59W / 250W | 129MiB / 11177MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:02:00.0 Off | N/A |
| 29% 28C P8 8W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1482 G /usr/lib/xorg/Xorg 127MiB |
+-----------------------------------------------------------------------------+

Update:
If I build MXNet with CMake, the program blocks in WaitAll() for a long time. This does not happen when I build with the Makefile.

cmake:
cd incubator-mxnet
mkdir build
cd build
cmake ..
make -j10
make install

Is there anything wrong with the CMake steps above?
make:
make -j16 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 USE_CPP_PACKAGE=1
cp lib/libmxnet.so /usr/local/lib/

@chinakook
Contributor

chinakook commented Nov 16, 2018

Yes, building with CMake and building with the Makefile are different. The former is sometimes slow and can get stuck on Linux.

@zachgk
Contributor

zachgk commented Nov 16, 2018

@mxnet-label-bot remove [Pending Requester Info]

@bigtiger90
Author

> Yes, building with CMake and building with the Makefile are different. The former is sometimes slow and can get stuck on Linux.

That is really strange; it should not behave like this. Do you know how to fix it?
