Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

C++-Package examples broken #17514

Closed
olk opened this issue Feb 4, 2020 · 15 comments
Closed

C++-Package examples broken #17514

olk opened this issue Feb 4, 2020 · 15 comments
Assignees
Labels
Bug C++ Related to C++

Comments

@olk
Copy link
Contributor

olk commented Feb 4, 2020

Description

Compiling clean checkout from github causes multiple undefined reference linker errors.
This kind of errors usually happen if compilation units are linked together that have been compiled with different C++ standards.

Error Message

/usr/bin/ld: libmxnet.a(c_api.cc.o): in function MXNDArrayReshape': c_api.cc:(.text+0x56a7): undefined reference to std::__cxx11::basic_ostringstream<

To Reproduce

git clone --recursive https://github.com/olk/incubator-mxnet/
cd incubator-mxnet/
mkdir build && cd build
cmake -DUSE_NCCL=1 -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja -v -j 32

What have you tried to solve it?

The issue was introduced by commit:
230ceee Switch to modern CMake CUDA handling (#17031)

The sources can be successfully compiled and linked with ffeebe4 fix int8 add ut (#17166) as HEAD (== the commit before 230ceee).

Environment

----------Python Info----------
Version      : 3.8.1
Compiler     : GCC 9.2.0
Build        : ('default', 'Jan 22 2020 06:38:00')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.3
Directory    : /usr/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.1
Directory    : /home/graemer/Projekte/MXNet/apache-mxnet-src-1.5.1-incubating/python/mxnet
Num GPUs     : 2
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-5.5.1-arch1-1-x86_64-with-glibc2.2.5
system       : Linux
node         : e5lx
release      : 5.5.1-arch1-1
version      : #1 SMP PREEMPT Sat, 01 Feb 2020 16:38:40 +0000
----------Hardware Info----------
machine      : x86_64
processor    : 
Architektur:                     x86_64
CPU Operationsmodus:             32-bit, 64-bit
Byte-Reihenfolge:                Little Endian
Adressgrößen:                    46 bits physical, 48 bits virtual
CPU(s):                          32
Liste der Online-CPU(s):         0-31
Thread(s) pro Kern:              2
Kern(e) pro Sockel:              8
Sockel:                          2
NUMA-Knoten:                     2
Anbieterkennung:                 GenuineIntel
Prozessorfamilie:                6
Modell:                          79
Modellname:                      Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:                        1
CPU MHz:                         2260.712
Maximale Taktfrequenz der CPU:   3000,0000
Minimale Taktfrequenz der CPU:   1200,0000
BogoMIPS:                        4192.56
Virtualisierung:                 VT-x
L1d Cache:                       512 KiB
L1i Cache:                       512 KiB
L2 Cache:                        4 MiB
L3 Cache:                        40 MiB
NUMA-Knoten0 CPU(s):             0-7,16-23
NUMA-Knoten1 CPU(s):             8-15,24-31
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
Markierungen:                    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe sysca
                                 ll nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
                                  dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_
                                 timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ib
                                 rs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm r
                                 dt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flu
                                 sh_l1d
@olk olk added the Bug label Feb 4, 2020
@olk olk changed the title comping from sources: ABI mismatch - linker error comping from sources: C++ ABI mismatch - linker error Feb 4, 2020
@olk olk changed the title comping from sources: C++ ABI mismatch - linker error comping from sources: C++ ABI mismatch (Switch to modern CMake CUDA handling) Feb 4, 2020
@leezu

This comment has been minimized.

@leezu
Copy link
Contributor

leezu commented Feb 4, 2020

I can't reproduce your error, but I do observe

../tests/../cpp-package/include/mxnet-cpp/optimizer.hpp:37:10: fatal error: mxnet-cpp/op.h: No such file or directory
 #include "mxnet-cpp/op.h"
          ^~~~~~~~~~~~~~~~
compilation terminated.

@olk

This comment has been minimized.

@leezu
Copy link
Contributor

leezu commented Feb 4, 2020

I ran bisect, and actually b1e4911 is at fault here:

@anirudh2290 note that to reproduce this issue, you need a fresh checkout of the repository.
To reproduce:

git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet; cd mxnet; git checkout b1e491182fa9c15da89f1b701778de3a1421811b; rm -rf *; git checkout .; git submodule update --init --recursive; mkdir build; cd build; cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=Release -GNinja -DUSE_CUDA=OFF ..; ninja

@leezu leezu changed the title comping from sources: C++ ABI mismatch (Switch to modern CMake CUDA handling) C++-Package examples broken Feb 4, 2020
@leezu leezu added the C++ Related to C++ label Feb 4, 2020
@leezu
Copy link
Contributor

leezu commented Feb 4, 2020

@olk please try again with a fresh checkout of a726c40. This works on my machine.

@leezu
Copy link
Contributor

leezu commented Feb 4, 2020

Also need to figure out why CI didn't detect this issue before closing this issue

@larroy
Copy link
Contributor

larroy commented Feb 4, 2020

This also fails on my machine:

../tests/../cpp-package/include/mxnet-cpp/optimizer.hpp:37:10: fatal error: mxnet-cpp/op.h: No such file or directory

@larroy
Copy link
Contributor

larroy commented Feb 4, 2020

[753/790] /usr/bin/ccache /usr/bin/c++  -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_USE_CXX11=1 -DMSHADOW_FORCE_STREAM -DMSHADOW_INT64_TENSOR_SIZE=0 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=1 -DMSHADOW_USE_CUDNN=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE=1 -DMXNET_ENABLE_CUDA_RTC=1 -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_CPP_PACKAGE=1 -DMXNET_USE_CUDA=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_MKLDNN=1 -DMXNET_USE_NCCL=0 -DMXNET_USE_NVTX=1 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DNDEBUG=1 -DUSE_CUD
NN -I../3rdparty/mkldnn/include -I3rdparty/mkldnn/include -I../include -I../src -I../3rdparty/mshadow -I../3rdparty/nvidia_cub -I../3rdparty/tvm/nnvm/include -I../3rdparty/tvm/include -I../3rdparty
/dmlc-core/include -I../3rdparty/dlpack/include -I../tests/cpp/include -I../tests/../cpp-package/include -I../3rdparty/mkldnn/src/../include -I3rdparty/dmlc-core/include -isystem /usr/include/openc
v -isystem /usr/local/cuda/include -isystem ../3rdparty/googletest/googletest/include -isystem ../3rdparty/googletest/googletest -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -std=
c++11 -fopenmp -O3 -DNDEBUG -fPIE -MD -MT tests/CMakeFiles/mxnet_unit_tests.dir/cpp/thread_safety/thread_safety_test.cc.o -MF tests/CMakeFiles/mxnet_unit_tests.dir/cpp/thread_safety/thread_safety_t
est.cc.o.d -o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/thread_safety/thread_safety_test.cc.o -c ../tests/cpp/thread_safety/thread_safety_test.cc
FAILED: tests/CMakeFiles/mxnet_unit_tests.dir/cpp/thread_safety/thread_safety_test.cc.o
/usr/bin/ccache /usr/bin/c++  -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_USE_CXX11=1 -DMSHADOW_FORCE_STREAM -DMSHADOW_INT64_TENSOR_SIZE=0 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=1 -DMSHADOW_USE_CUDNN=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE=1 -DMXNET_ENABLE_CUDA_RTC=1 -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_CPP_PACKAGE=1 -DMXNET_USE_CUDA=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_MKLDNN=1 -DMXNET_USE_NCCL=0 -DMXNET_USE_NVTX=1 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DNDEBUG=1 -DUSE_CUDNN -I../3rdparty/mkldnn/include -I3rdparty/mkldnn/include -I../include -I../src -I../3rdparty/mshadow -I../3rdparty/nvidia_cub -I../3rdparty/tvm/nnvm/include -I../3rdparty/tvm/include -I../3rdparty/dmlc-core/include -I../3rdparty/dlpack/include -I../tests/cpp/include -I../tests/../cpp-package/include -I../3rdparty/mkldnn/src/../include -I3rdparty/dmlc-core/include -isystem /usr/include/opencv -isystem /usr/local/cuda/include -isystem ../3rdparty/googletest/googletest/include -isystem ../3rdparty/googletest/googletest -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -std=c++11 -fopenmp -O3 -DNDEBUG -fPIE -MD -MT tests/CMakeFiles/mxnet_unit_tests.dir/cpp/thread_safety/thread_safety_test.cc.o -MF tests/CMakeFiles/mxnet_unit_tests.dir/cpp/thread_safety/thread_safety_test.cc.o.d -o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/thread_safety/thread_safety_test.cc.o -c ../tests/cpp/thread_safety/thread_safety_test.cc
In file included from ../tests/../cpp-package/include/mxnet-cpp/MxNetCpp.h:35:0,
                 from ../tests/cpp/thread_safety/thread_safety_test.cc:36:
../tests/../cpp-package/include/mxnet-cpp/optimizer.hpp:37:10: fatal error: mxnet-cpp/op.h: No such file or directory

@olk
Copy link
Contributor Author

olk commented Feb 4, 2020

@leezu My HEAD points to 71a5c9e (a726c40 is already in the commit history)

@leezu
Copy link
Contributor

leezu commented Feb 4, 2020

@olk, my point is that it's broken after a726c40. Try checking out a726c40 in a clean checkout and please rerun the build

@olk
Copy link
Contributor Author

olk commented Feb 4, 2020

The problem is that my dev machine runs Arch Linux(rolling releases).
At the moment Arch Linux uses GCC-9.2 as default compiler + CUDA-10.2.
But CUDA-10.2 supports only GCC-8.3 and the Arch Linux CUDA package is linked statically against GCC-8.
After downgrading my system to the last version that uses the GCC-8.3 as the default (and CUDA-10.1 -> 2019/05/24) I could compile and link mxnet from github checkout.

But I'm still wondering why compiling and linking with GCC-9.2/CUDA-10.2 + commit ffeebe4 as HEAD works but fails with 230ceee?

@leezu
Copy link
Contributor

leezu commented Feb 4, 2020

But I'm still wondering why compiling and linking with GCC-9.2/CUDA-10.2 + commit ffeebe4 as HEAD works but fails with 230ceee?

You can see that 230ceee cleans up a lot of code, which probably affects the order in which libraries are linked. As you say, GCC 9.2 is unsupported by Cuda, so it's probably undefined behaviour and worked for some lucky reason with the old build script.

Another reason could be that there is a bug in upstream GCC9.2 / linker.

Feel free to investigate this further. If there's any bug in our build script, we're happy to update it.

Also sorry to take over your issue to track the bug introduced by #16654

@eric-haibin-lin
Copy link
Member

is this resolved?

@leezu
Copy link
Contributor

leezu commented Feb 6, 2020

C++ package problem is fixed by #17520.

@sl1pkn07
Copy link
Contributor

try with

    -DCUDA_HOST_COMPILER=/usr/bin/cc-8 \
    -DCMAKE_C_COMPILER=/usr/bin/cc-8 \
    -DCMAKE_C_COMPILER_AR=/usr/bin/gcc-ar-8 \
    -DCMAKE_C_COMPILER_RANLIB=/usr/bin/gcc-ranlib-8 \
    -DCMAKE_CXX_COMPILER=/usr/bin/c++-8 \
    -DCMAKE_CXX_COMPILER_AR=/usr/bin/gcc-ar-8 \
    -DCMAKE_CXX_COMPILER_RANLIB=/usr/bin/gcc-ranlib-8

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug C++ Related to C++
Projects
None yet
Development

No branches or pull requests

6 participants