This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Seg. Fault installing cpp-package #11543

Closed
golo314 opened this issue Jul 3, 2018 · 6 comments

golo314 commented Jul 3, 2018

Description

I am trying to build MXNet with cpp-package enabled. The build reaches 95%, and then the step that runs OpWrapperGenerator.py fails with a segmentation fault.

Environment info (Required)

Fedora 28 on a virtual machine

----------Python Info----------
('Version :', '2.7.15')
('Compiler :', 'GCC 8.1.1 20180502 (Red Hat 8.1.1-1)')
('Build :', ('default', 'May 16 2018 17:50:09'))
('Arch :', ('64bit', ''))
------------Pip Info-----------
('Version :', '10.0.1')
('Directory :', '/usr/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
('Version :', '1.2.0')
('Directory :', '/usr/lib/python2.7/site-packages/mxnet')
('Commit Hash :', '297c64fd2ee404612aa3ecc880b940fb2538039c')
----------System Info----------
('Platform :', 'Linux-4.17.3-200.fc28.x86_64-x86_64-with-fedora-28-Twenty_Eight')
('system :', 'Linux')
('node :', 'localhost.localdomain')
('release :', '4.17.3-200.fc28.x86_64')
('version :', '#1 SMP Tue Jun 26 14:17:07 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Stepping: 3
CPU MHz: 4007.996
BogoMIPS: 8015.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase avx2 invpcid rdseed clflushopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0607 sec, LOAD: 0.7921 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0548 sec, LOAD: 0.7807 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.2148 sec, LOAD: 0.9456 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0584 sec, LOAD: 0.3941 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.3320 sec, LOAD: 0.8331 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.4205 sec, LOAD: 1.5194 sec.

Build info

Most recent code from github

Compiler (gcc/clang/mingw/visual studio):
GCC 7.3.0

MXNet commit hash:
552c715

Build config:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#-------------------------------------------------------------------------------
# Template configuration for compiling mxnet
#
# If you want to change the configuration, please use the following
# steps. Assume you are on the root directory of mxnet. First copy the this
# file so that any local changes will be ignored by git
#
# $ cp make/config.mk .
#
# Next modify the according entries, and then compile by
#
# $ make
#
# or build in parallel with 8 threads
#
# $ make -j8
#-------------------------------------------------------------------------------

#---------------------
# choice of compiler
#--------------------
ifndef CC
export CC = gcc
endif
ifndef CXX
export CXX = g++
endif
ifndef NVCC
export NVCC = nvcc
endif

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 0

# whether to turn on segfault signal handler to log the stack trace
USE_SIGNAL_HANDLER =

# the additional link flags you want to add
ADD_LDFLAGS =

# the additional compile flags you want to add
ADD_CFLAGS =

#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------

# whether use CUDA during compile
USE_CUDA = 0

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
# USE_CUDA_PATH = /usr/local/cuda
USE_CUDA_PATH = NONE

# whether to enable CUDA runtime compilation
ENABLE_CUDA_RTC = 1

# whether use CuDNN R3 library
USE_CUDNN = 0

#whether to use NCCL library
USE_NCCL = 0
#add the path to NCCL library
USE_NCCL_PATH = NONE

# whether use opencv during compilation
# you can disable it, however, you will not able to use
# imbin iterator
USE_OPENCV = 1

#whether use libjpeg-turbo for image decode without OpenCV wrapper
USE_LIBJPEG_TURBO = 0
#add the path to libjpeg-turbo library
USE_LIBJPEG_TURBO_PATH = NONE

# use openmp for parallelization
USE_OPENMP = 1

# whether use MKL-DNN library
USE_MKLDNN = 0

# whether use NNPACK library
USE_NNPACK = 0

# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = atlas
endif

# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 1

# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =

# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE

# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
else
USE_STATIC_MKL = NONE
endif

#----------------------------
# Settings for power and arm arch
#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
USE_SSE=0
USE_F16C=0
else
USE_SSE=1
endif

#----------------------------
# F16C instruction support for faster arithmetic of fp16 on CPU
#----------------------------
# For distributed training with fp16, this helps even if training on GPUs
# If left empty, checks CPU support and turns it on.
# For cross compilation, please check support for F16C on target device and turn off if necessary.
USE_F16C =

#----------------------------
# distributed computing
#----------------------------

# whether or not to enable multi-machine supporting
USE_DIST_KVSTORE = 0

# whether or not allow to read and write HDFS directly. If yes, then hadoop is
# required
USE_HDFS = 0

# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0

#----------------------------
# performance settings
#----------------------------
# Use operator tuning
USE_OPERATOR_TUNING = 1

# Use gperftools if found
USE_GPERFTOOLS = 1

# Use JEMalloc if found, and not using gperftools
USE_JEMALLOC = 1

#----------------------------
# additional operators
#----------------------------

# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =

#----------------------------
# other features
#----------------------------

# Create C++ interface package
USE_CPP_PACKAGE = 0

#----------------------------
# plugins
#----------------------------

# whether to use caffe integration. This requires installing caffe.
# You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH
# CAFFE_PATH = $(HOME)/caffe
# MXNET_PLUGINS += plugin/caffe/caffe.mk

# WARPCTC_PATH = $(HOME)/warp-ctc
# MXNET_PLUGINS += plugin/warpctc/warpctc.mk

# whether to use sframe integration. This requires build sframe
# git@github.com:dato-code/SFrame.git
# SFRAME_PATH = $(HOME)/SFrame
# MXNET_PLUGINS += plugin/sframe/plugin.mk

Error Message:

make[2]: *** [cpp-package/CMakeFiles/cpp_package_op_h.dir/build.make:59: cpp-package/CMakeFiles/cpp_package_op_h] Segmentation fault (core dumped)
make[1]: *** [CMakeFiles/Makefile2:1594: cpp-package/CMakeFiles/cpp_package_op_h.dir/all] Error 2
make: *** [Makefile:141: all] Error 2

Steps to reproduce

  1. ccmake . // I turned on USE_CPP_PACKAGE and turned off USE_CUDA, USE_CUDNN, USE_LAPACK, MKL_VERBOSE
  2. make -j$(nproc)
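
For anyone who wants to repeat step 1 without the interactive ccmake screen, here is a minimal sketch that builds an equivalent `cmake` command line. It assumes the option names listed in step 1 are accepted as `-D` cache entries by this checkout; `cmake_configure_command` is a hypothetical helper name, and the function only constructs the argument list without running anything.

```python
def cmake_configure_command(source_dir="."):
    """Build a non-interactive cmake command mirroring the ccmake choices in step 1."""
    # Option names are copied from the issue text; whether each one exists in a
    # given MXNet checkout is an assumption.
    options = [
        ("USE_CPP_PACKAGE", "ON"),
        ("USE_CUDA", "OFF"),
        ("USE_CUDNN", "OFF"),
        ("USE_LAPACK", "OFF"),
        ("MKL_VERBOSE", "OFF"),
    ]
    cmd = ["cmake"]
    for name, value in options:
        # -D<var>=<value> sets a CMake cache entry from the command line.
        cmd.append("-D%s=%s" % (name, value))
    cmd.append(source_dir)
    return cmd


print(" ".join(cmake_configure_command()))
```

The printed line can be pasted into a shell in the build directory, followed by the `make` invocation from step 2.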

What have you tried to solve it?

  1. I have tried experimenting with OpWrapperGenerator.py to figure out what the problem is.

golo314 commented Jul 6, 2018

The error happens inside libmxnet.so.
I ran it under the debugger; here is the top of the backtrace where it goes wrong:
#0 0x00007fffe65391eb in alloc_mmap () from /home/alex_golo/mxnet/build/libmxnet.so
#1 0x00007fffe6539506 in blas_memory_alloc () from /home/alex_golo/mxnet/build/libmxnet.so
#2 0x00007fffe65398f7 in gotoblas_memory_init () from /home/alex_golo/mxnet/build/libmxnet.so
#3 0x00007fffe6539966 in gotoblas_init () from /home/alex_golo/mxnet/build/libmxnet.so
#4 0x00007ffff7de551a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#5 0x00007ffff7de5616 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#6 0x00007ffff7de97bf in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#7 0x00007ffff6d40baf in _dl_catch_exception () from /lib64/libc.so.6
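
The frames above show the crash in `gotoblas_init`, reached from `_dl_init` under `dl_open_worker`, which means it happens while the dynamic loader is running the library's initializers, before any MXNet function is called. A minimal sketch for checking this outside the build follows. It assumes Python 3 (the environment above uses 2.7), and `describe_exit` and `probe_load` are hypothetical helper names, not part of MXNet. It loads a shared library in a child process, so a crash does not take down the caller, and reports how the child ended.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode


def probe_load(lib_path):
    """Load lib_path with ctypes in a child process and report how it ended."""
    code = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"
    proc = subprocess.run([sys.executable, "-c", code, lib_path])
    return describe_exit(proc.returncode)
```

Calling `probe_load` with the path to a freshly built libmxnet.so (for example the one under the build directory shown in the backtrace) would be expected to report a signal if the library crashes on load, and a status of 0 if it loads cleanly.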


golo314 commented Jul 6, 2018

I also noticed that I get multiple CMake warnings, most of which concern the cpp-package examples. Here is one of several:

CMake Warning at cpp-package/example/CMakeLists.txt:54 (add_executable):
Cannot generate a safe runtime search path for target mlp because there is
a cycle in the constraint graph:

dir 0 is [/home/al/mxnet/build/3rdparty/mkldnn/src]
dir 1 is [/usr/local/lib]
  dir 0 must precede it due to runtime library [libmkldnn.so.0]
  dir 2 must precede it due to runtime library [libomp.so]
  dir 3 must precede it due to runtime library [libiomp5.so]
dir 2 is [/home/al/mxnet/build/3rdparty/openmp/runtime/src]
  dir 3 must precede it due to runtime library [libiomp5.so]
dir 3 is [/home/al/mxnet/build/mklml/mklml_lnx_2018.0.3.20180406/lib]

Some of these libraries may not be found correctly.

@andrewfayres
Contributor

Thank you for submitting the issue! @sandeep-krishnamurthy requesting this be labeled as c++ and installation


golo314 commented Jul 9, 2018

The warning about not being able to generate a safe runtime search path goes away when I turn off OpenMP.


golo314 commented Jul 10, 2018

It might be worth mentioning that when USE_CPP_PACKAGE is turned off, the code compiles without a problem.

golo314 closed this as completed Jul 11, 2018

golo314 commented Jul 11, 2018

The problem was with OpenBLAS. I installed ATLAS and was able to build.
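
Since the backtrace earlier in the thread resolved OpenBLAS function names to libmxnet.so itself, one rough way to see whether a given build still carries that code is to scan the file for those names. The sketch below is a heuristic only, in Python 3: `find_openblas_markers` is a hypothetical helper, the marker list is just two function names taken from the backtrace, and a stripped library could contain OpenBLAS without any of these strings.

```python
# Function names taken from the backtrace earlier in this thread.
OPENBLAS_MARKERS = (b"gotoblas_init", b"blas_memory_alloc")


def find_openblas_markers(lib_path, chunk_size=1 << 20):
    """Return the marker names that appear anywhere in the file's bytes."""
    found = set()
    # Carry over enough bytes so a marker split across two chunks is still seen.
    overlap = max(len(m) for m in OPENBLAS_MARKERS) - 1
    tail = b""
    with open(lib_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            for marker in OPENBLAS_MARKERS:
                if marker in window:
                    found.add(marker.decode())
            tail = window[-overlap:]
    return sorted(found)
```

An empty result on a library rebuilt against ATLAS would be consistent with the fix described above, though it does not prove OpenBLAS is absent.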
