Problems with include_paths when building TensorFlow 2.0 with Bazel 0.29.1 #10085

Closed
indranaut opened this issue Oct 22, 2019 · 19 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-CPP Issues for C++ rules

Comments

@indranaut

I have been trying to build TensorFlow 2.0 from the master branch on a Red Hat Enterprise Linux cluster.

The default GCC available is 4.8.5, and it cannot compile TensorFlow because the build passes an explicit -std=c++14 flag, which is not available in GCC 4.8.5.

I therefore switched to gcc/8.3.0, loaded through a modulefile that is configured as follows:

(tensorflow2.0-master) -bash-4.2$ module show gcc/8.3.0/gcc-4.8.5
  -------------------------------------------------------------------
  /gpfslocalsup/pub/modules-idris/modulefiles/linux-rhel7-x86_64/gcc/8.3.0/gcc-4.8.5:
  
  
  module-whatis The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Ada, and Go, as well as libraries for these languages.
  conflict gcc
  prepend-path PATH /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin
  prepend-path MANPATH /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/share/man
  prepend-path LD_LIBRARY_PATH /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib
  prepend-path LIBRARY_PATH /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib
  prepend-path LD_LIBRARY_PATH /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib64
  prepend-path LIBRARY_PATH /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib64
  prepend-path CPATH /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/include
  prepend-path CMAKE_PREFIX_PATH /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/
  setenv CC /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/gcc
  setenv CXX /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/g++
  setenv FC /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/gfortran
  setenv F77 /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/gfortran
  setenv F90 /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/gfortran
  -------------------------------------------------------------------

To compile TensorFlow, I first compiled bazel-0.29.1, and for that I modified the tools/cpp/cc_toolchain_config.bzl file as follows:

I replaced all occurrences of /usr/bin/gcc, /usr/bin/gcov, /usr/bin/nm, /usr/bin/ar, /usr/bin/cpp with the corresponding binaries from the PATH entry shown above.
I added the include path shown above to the list of cxx_builtin_include_directory entries present in the file.

After that I proceeded to build TensorFlow.

The command used was :

CC=/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/gcc bazel --output_user_root=/tmp/ujjwal-builds build --config=opt --config=cuda --config=mkl --config=numa //tensorflow/tools/pip_package:build_pip_package --verbose_failures
This produced the following error:

INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (0 packages loaded, 0 targets configured).
 INFO: Found 1 target...
 ERROR: /tmp/ujjwal-builds/7d993f307acf01aa765c32a6dcabd368/external/gif/BUILD.bazel:8:1: undeclared inclusion(s) in rule '@gif//:gif':
 this rule is missing dependency declarations for the following files included by 'external/gif/gif_err.c':
   '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stddef.h'
   '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stdarg.h'
   '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stdbool.h'
   '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stdint.h'
 Target //tensorflow/tools/pip_package:build_pip_package failed to build
 INFO: Elapsed time: 2.152s, Critical Path: 1.63s
 INFO: 7 processes: 7 local.
 FAILED: Build did NOT complete successfully

I have searched online but found no satisfactory solution. Can anyone please help me understand what is going on here? This build is important for me.

If it helps, I have attached the output of gcc and g++ include paths below :

    gcc -E -xc++ - -v
    Reading specs from /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/specs
    COLLECT_GCC=gcc
    Target: x86_64-pc-linux-gnu
    Configured with: /gpfs7kw/linkhome/idris/softmgr/softmgr01/spack/var/spack/stage/gcc-8.3.0-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/spack-src/configure --prefix=/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3 --disable-multilib --enable-languages=c,c++,fortran --with-mpfr=/gpfslocalsup/spack_soft/mpfr/3.1.6/gcc-4.8.5-vwx7snyrzymeg5n6f7dg5tbpgk35do3k --with-gmp=/gpfslocalsup/spack_soft/gmp/6.1.2/gcc-4.8.5-5odxtlxihbfjtj4dxo52oz5f7r6ir6jk --enable-lto --with-quad --with-system-zlib --with-mpc=/gpfslocalsup/spack_soft/mpc/1.1.0/gcc-4.8.5-pogagquauxex67doa7v2mkas2gcs5xut --with-isl=/gpfslocalsup/spack_soft/isl/0.18/gcc-4.8.5-3wslknueis6r2nx3tasaizgda2ianxfa
    Thread model: posix
    gcc version 8.3.0 (GCC)
    COLLECT_GCC_OPTIONS='-E' '-v' '-mtune=generic' '-march=x86-64'
     /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../libexec/gcc/x86_64-pc-linux-gnu/8.3.0/cc1plus -E -quiet -v -iprefix /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/ -D_GNU_SOURCE - -mtune=generic -march=x86-64
    ignoring nonexistent directory "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../x86_64-pc-linux-gnu/include"
    ignoring duplicate directory "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0"
    ignoring duplicate directory "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu"
    ignoring duplicate directory "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward"
    ignoring duplicate directory "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/8.3.0/include"
    ignoring duplicate directory "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed"
    ignoring nonexistent directory "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/../../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../x86_64-pc-linux-gnu/include"
    ignoring duplicate directory "/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/include"
      as it is a non-system directory that duplicates a system directory
    #include "..." search starts here:
    #include <...> search starts here:
     /gpfslocalsys/intel/parallel_studio_xe_2019_update5_cluster_edition/compilers_and_libraries_2019.5.281/linux/mkl/include
     /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0
     /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu
     /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward
     /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/include
     /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed
     /usr/local/include
     /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/../lib/gcc/../../include
     /usr/include
    End of search list.
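
To compare these directories against what the toolchain declares, the search list can be pulled out of the verbose output. The following is a minimal sketch, assuming the output format shown above; list_include_dirs is a hypothetical helper name and is not part of gcc or Bazel.

```shell
# Hypothetical helper: read `gcc -v` output on stdin and print only the
# directories listed under '#include <...> search starts here:'.
list_include_dirs() {
  awk '
    /#include <\.\.\.> search starts here:/ { in_list = 1; next }
    /End of search list\./                  { in_list = 0 }
    in_list { sub(/^[ \t]+/, ""); print }
  '
}

# Possible usage (gcc writes the search list to stderr):
#   gcc -E -xc++ - -v < /dev/null 2>&1 | list_include_dirs
```

The printed paths still contain ".." components and may go through symlinks, so they are not guaranteed to match the strings that end up in cxx_builtin_include_directories.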

@irengrig irengrig added team-Rules-CPP Issues for C++ rules untriaged labels Oct 23, 2019
@irengrig
Contributor

/cc @hlopko @meteorcloudy

@meteorcloudy
Member

Also /cc @oquenchil

@indranaut
Author

I also exported GCC_HOST_COMPILER and GCC_HOST_COMPILER_PREFIX and it did not make a difference. I learnt from #4365 that TF does not use the default toolchain of bazel. So, probably modifying cc_configure.bzl was not required.

However, even without modifying cc_configure.bzl, I continue to get the same error. I have verified that the environment variables are correctly exported and that gcc on its own is working fine. Therefore, I really need some assistance here on this matter.

@indranaut
Author

I have also tried with other versions of bazel such as 0.28.1 and the problem persists.

@meteorcloudy
Member

meteorcloudy commented Oct 24, 2019

@indranaut I see you are building TensorFlow with CUDA support, so the toolchain is generated by https://github.com/tensorflow/tensorflow/blob/master/third_party/gpus/cuda_configure.bzl#L1160. Maybe you can try debugging this file to see why your include path was not added to %{cxx_builtin_include_directories}.

@indranaut
Author

@meteorcloudy
I think that the include_paths are being added. I get the following output on print("PATHS ARE {}".format(cuda_defines["%{cxx_builtin_include_directories}"]))

DEBUG: /tmp/ujjwal-builds/tensorflow/third_party/gpus/cuda_configure.bzl:1168:9: PATHS ARE "/gpfslocalsys/intel/parallel_studio_xe_2019_update5_cluster_edition/compilers_and_libraries_2019.5.281/linux/mkl/include", "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/include/c++/8.3.0", "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/include/c++/8.3.0/x86_64-pc-linux-gnu", "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/include/c++/8.3.0/backward", "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include", "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed", "/usr/local/include", "/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/include", "/usr/include", "/gpfslocalsys/cuda/10.1.1/targets/x86_64-linux/include", "/gpfslocalsys/cuda/10.1.1/include", "/gpfslocalsys/cuda/10.1.1/extras/CUPTI/include", "/gpfslocalsup/pub/cudnn/10.1-v7.5.1.10/include"

@indranaut
Author

I suspected a cache issue, but cleaning the cache with bazel clean --expunge and providing an output_user_root made no difference.

@meteorcloudy
Member

The include paths don't look exactly the same.
Bazel complains about

This rule is missing dependency declarations for the following files included by 'external/gif/gif_err.c':
   '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stddef.h'
...

But the cxx_builtin_include_directories contains

/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include

What is the /gpfs7kro prefix?

@indranaut
Author

@meteorcloudy

It is due to a symbolic link. gpfslocalsup actually lies inside /gpfs7kro but there is a symbolic link to it present in /.
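
The mismatch can be reproduced without Bazel. The sketch below assumes, as the error message suggests, that included files are matched against the declared directories as plain strings, so a symlinked path and its resolved target count as different locations. It is an illustration only, not Bazel's actual implementation; is_under_declared_dir and real_dir are hypothetical helper names.

```shell
# Return 0 if file path $1 lies under one of the remaining directory
# arguments. Plain string comparison: symlinks are deliberately NOT resolved.
is_under_declared_dir() {
  file=$1
  shift
  for dir in "$@"; do
    case "$file" in
      "$dir"/*) return 0 ;;
    esac
  done
  return 1
}

# Print the physical (symlink-free) path of an existing directory.
real_dir() { (cd "$1" && pwd -P); }

# Demo layout: "link" is a symlink to "real", like /gpfslocalsup in this thread.
demo=$(mktemp -d)
mkdir -p "$demo/real/include"
ln -s "$demo/real" "$demo/link"

declared="$demo/real/include"            # what the toolchain lists
reported="$demo/link/include/stddef.h"   # what the compiler reports

is_under_declared_dir "$reported" "$declared" ||
  echo "mismatch: $reported"
is_under_declared_dir "$(real_dir "$demo/link/include")/stddef.h" "$(real_dir "$declared")" &&
  echo "match after resolving symlinks"

rm -rf "$demo"
```

Both spellings name the same directory on disk; only the comparison of the strings fails, which is consistent with the /gpfs7kro prefix appearing on one side only.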

@meteorcloudy
Member

I believe /gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include should also be in cxx_builtin_include_directories, can you try that?

@indranaut
Author

Well, ideally they should be there. The module load command already adds them to $CPATH:

(tensorflow2.0-master) [unm95ab@jean-zay1: tensorflow]$ echo $CPATH
/gpfslocalsys/intel/parallel_studio_xe_2019_update5_cluster_edition/compilers_and_libraries_2019.5.281/linux/mkl/include:/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/include

I think what is happening is that Bazel, instead of using the symlink path, is using the resolved path that the symlink points to.

I do not know if this is an intended behavior or if there is an option to override this behavior.

@meteorcloudy
Member

Can you rerun the build with -s so that Bazel prints the complete commands it runs? Then we can see exactly which include paths are used.

@indranaut
Author

Here is a small part of the output

SUBCOMMAND: # @llvm//:support [action 'Compiling external/llvm/lib/Support/regexec.c']
(cd /tmp/ujjwal-builds/ujjwal_bazel/dea06f755c9fae050fe508dad9fa2776/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/gpfslocalsys/cuda/10.1.1 \
    GCC_HOST_COMPILER_PATH=/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/gcc \
    LD_LIBRARY_PATH=/gpfslocalsys/intel/parallel_studio_xe_2019_update5_cluster_edition/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin:/gpfslocalsys/intel/parallel_studio_xe_2019_update5_cluster_edition/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/gpfslocalsup/pub/nccl/nccl_2.4.2-1+cuda10.1_x86_64/lib:/gpfslocalsup/pub/cudnn/10.1-v7.5.1.10/lib64:/gpfslocalsys/cuda/10.1.1/nvvm/lib64:/gpfslocalsys/cuda/10.1.1/lib64:/gpfslocalsys/cuda/10.1.1/samples/common/lib/linux/x86_64:/gpfslocalsys/cuda/10.1.1/targets/x86_64-linux/lib:/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib64:/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib:/linkhome/rech/gensta01/unm95ab/utils/lib64:/linkhome/rech/gensta01/unm95ab/utils/lib::/gpfslocalsys/slurm/current/lib/slurm:/gpfslocalsys/slurm/current/lib \
    PATH=/tmp/ujjwal-builds/bazel-0.28.1/output:/tmp/ujjwal-builds/bazel-0.29.1/output:/gpfslocalsys/cuda/10.1.1/samples:/gpfslocalsys/cuda/10.1.1/nvvm/bin:/gpfslocalsys/cuda/10.1.1/bin:/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin:/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/envs/tensorflow2.0-master/bin:/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/condabin:/linkhome/rech/gensta01/unm95ab/utils/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/c3/bin:/usr/lpp/mmfs/bin:/sbin:/bin:/gpfslocalsup/bin:/gpfslocalsys/bin:/gpfslocalsys/idrzap/current/bin:/gpfslocalsys/slurm/current/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/envs/tensorflow2.0-master/bin/python \
    PYTHON_LIB_PATH=/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/envs/tensorflow2.0-master/lib/python3.6/site-packages \
    TF_CONFIGURE_IOS=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=7.0 \
    TF_CUDA_PATHS=/gpfslocalsup/pub/cudnn/10.1-v7.5.1.10,/gpfslocalsys/cuda/10.1.1,/gpfslocalsup/pub/nccl/nccl_2.4.2-1+cuda10.1_x86_64,/linkhome/rech/gensta01/unm95ab/utils \
    TF_CUDA_VERSION=10 \
    TF_CUDNN_VERSION=7 \
    TF_NCCL_VERSION=2.4 \
    TF_NEED_CUDA=1 \
    TF_NEED_TENSORRT=1 \
    TF_TENSORRT_VERSION=5 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-py2-opt/bin/external/llvm/_objs/support/regexec.d '-frandom-seed=bazel-out/k8-py2-opt/bin/external/llvm/_objs/support/regexec.o' -DLLVM_ENABLE_STATS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DLLVM_BUILD_GLOBAL_ISEL -iquote external/llvm -iquote bazel-out/k8-py2-opt/bin/external/llvm -iquote external/zlib_archive -iquote bazel-out/k8-py2-opt/bin/external/zlib_archive -isystem external/llvm/include -isystem bazel-out/k8-py2-opt/bin/external/llvm/include -isystem external/zlib_archive -isystem bazel-out/k8-py2-opt/bin/external/zlib_archive -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections '-march=native' -Wno-sign-compare -c external/llvm/lib/Support/regexec.c -o bazel-out/k8-py2-opt/bin/external/llvm/_objs/support/regexec.o)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /tmp/ujjwal-builds/tensorflow/tensorflow/lite/toco/python/BUILD:79:1 undeclared inclusion(s) in rule '@curl//:curl':
this rule is missing dependency declarations for the following files included by 'external/curl/lib/curl_get_line.c':
  '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stddef.h'
  '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stdint.h'
  '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stdarg.h'
  '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed/limits.h'
  '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed/syslimits.h'
  '/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/stdbool.h'
INFO: Elapsed time: 16.523s, Critical Path: 2.28s
INFO: 42 processes: 42 local.
FAILED: Build did NOT complete successfully

@meteorcloudy
Member

What if you set CC=/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/gcc ?

@meteorcloudy
Member

Basically, I think CC should match what's in INCLUDE

@indranaut
Author

indranaut commented Oct 25, 2019

I think that your hunch is right. It begins to compile but later stops with the following error :

SUBCOMMAND: # //tensorflow/python:framework/fast_tensor_util.so [action 'Linking tensorflow/python/framework/fast_tensor_util.so']
(cd /tmp/ujjwal-builds/ujjwal_bazel/dea06f755c9fae050fe508dad9fa2776/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/gpfslocalsys/cuda/10.1.1 \
    GCC_HOST_COMPILER_PATH=/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin/gcc \
    GCC_HOST_COMPILER_PREFIX=/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3 \
    LD_LIBRARY_PATH=/gpfslocalsys/intel/parallel_studio_xe_2019_update5_cluster_edition/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin:/gpfslocalsys/intel/parallel_studio_xe_2019_update5_cluster_edition/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/gpfslocalsup/pub/nccl/nccl_2.4.2-1+cuda10.1_x86_64/lib:/gpfslocalsup/pub/cudnn/10.1-v7.5.1.10/lib64:/gpfslocalsys/cuda/10.1.1/nvvm/lib64:/gpfslocalsys/cuda/10.1.1/lib64:/gpfslocalsys/cuda/10.1.1/samples/common/lib/linux/x86_64:/gpfslocalsys/cuda/10.1.1/targets/x86_64-linux/lib:/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib64:/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/lib:/linkhome/rech/gensta01/unm95ab/utils/lib64:/linkhome/rech/gensta01/unm95ab/utils/lib::/gpfslocalsys/slurm/current/lib/slurm:/gpfslocalsys/slurm/current/lib \
    PATH=/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin:/tmp/ujjwal-builds/bazel-0.28.1/output:/tmp/ujjwal-builds/bazel-0.29.1/output:/gpfslocalsys/cuda/10.1.1/samples:/gpfslocalsys/cuda/10.1.1/nvvm/bin:/gpfslocalsys/cuda/10.1.1/bin:/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin:/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/envs/tensorflow2.0-master/bin:/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/condabin:/linkhome/rech/gensta01/unm95ab/utils/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/c3/bin:/usr/lpp/mmfs/bin:/sbin:/bin:/gpfslocalsup/bin:/gpfslocalsys/bin:/gpfslocalsys/idrzap/current/bin:/gpfslocalsys/slurm/current/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/envs/tensorflow2.0-master/bin/python \
    PYTHON_LIB_PATH=/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/envs/tensorflow2.0-master/lib/python3.6/site-packages \
    TF_CONFIGURE_IOS=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=7.0 \
    TF_CUDA_PATHS=/gpfslocalsup/pub/cudnn/10.1-v7.5.1.10,/gpfslocalsys/cuda/10.1.1,/gpfslocalsup/pub/nccl/nccl_2.4.2-1+cuda10.1_x86_64,/linkhome/rech/gensta01/unm95ab/utils \
    TF_CUDA_VERSION=10 \
    TF_CUDNN_VERSION=7 \
    TF_NCCL_VERSION=2.4 \
    TF_NEED_CUDA=1 \
    TF_NEED_TENSORRT=1 \
    TF_TENSORRT_VERSION=5 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-py2-opt/bin/tensorflow/python/framework/fast_tensor_util.so-2.params)
ERROR: /tmp/ujjwal-builds/ujjwal_bazel/dea06f755c9fae050fe508dad9fa2776/external/com_google_absl/absl/debugging/BUILD.bazel:30:1: Linking of rule '@com_google_absl//absl/debugging:stacktrace' failed (Exit 1)
src/main/tools/process-wrapper-legacy.cc:58: "execvp(/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/ar, ...)": No such file or directory
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /tmp/ujjwal-builds/tensorflow/tensorflow/tools/pip_package/BUILD:49:1 Linking of rule '@com_google_absl//absl/debugging:stacktrace' failed (Exit 1)
INFO: Elapsed time: 183.125s, Critical Path: 79.48s
INFO: 7711 processes: 7711 local.
FAILED: Build did NOT complete successfully

It is trying to locate ar in a folder where it does not exist. It looks for ar in /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3, while the archiver is in /gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin in the form of gcc-ar. I have also found that bazel insists on using ar rather than gcc-ar (#3760).

I have already checked that the above path is not in $PATH

echo $PATH
/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin:/tmp/ujjwal-builds/bazel-0.28.1/output:/tmp/ujjwal-builds/bazel-0.29.1/output:/gpfslocalsys/cuda/10.1.1/samples:/gpfslocalsys/cuda/10.1.1/nvvm/bin:/gpfslocalsys/cuda/10.1.1/bin:/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin:/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/envs/tensorflow2.0-master/bin:/gpfsscratch/rech/qdh/unm95ab/my-installations/anaconda3/condabin:/linkhome/rech/gensta01/unm95ab/utils/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/c3/bin:/usr/lpp/mmfs/bin:/sbin:/bin:/gpfslocalsup/bin:/gpfslocalsys/bin:/gpfslocalsys/idrzap/current/bin:/gpfslocalsys/slurm/current/bin

@meteorcloudy
Member

I think the ar path is affected by these two places:
https://github.com/tensorflow/tensorflow/blob/f20387f272a1d825f1266c53b9a09b478b1f552c/third_party/gpus/cuda_configure.bzl#L1119
This is for GCC_HOST_COMPILER_PREFIX,
you should set GCC_HOST_COMPILER_PREFIX=/gpfs7kro/gpfslocalsup/spack_soft/gcc/8.3.0/gcc-4.8.5-opnwtdjumg2hxo4ljvnx77ugb6afmvj3/bin
and here:
https://github.com/tensorflow/tensorflow/blob/f20387f272a1d825f1266c53b9a09b478b1f552c/third_party/gpus/crosstool/cc_toolchain_config.bzl.tpl#L1409
you can change the ar name.
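
Taken together, the suggestions above amount to pointing CC, GCC_HOST_COMPILER_PATH, and GCC_HOST_COMPILER_PREFIX at the same resolved location, with the prefix being the bin directory. The following is a minimal sketch under that assumption; toolchain_env is a hypothetical helper name, and whether this is sufficient on other systems is not confirmed by this thread.

```shell
# Hypothetical helper: given a compiler path in $1, print export lines that
# use the physical (symlink-free) bin directory for all three variables.
# Only directory symlinks are resolved; the binary name itself is kept.
toolchain_env() {
  bin_dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
  cc="$bin_dir/$(basename "$1")"
  printf 'export CC="%s"\n' "$cc"
  printf 'export GCC_HOST_COMPILER_PATH="%s"\n' "$cc"
  printf 'export GCC_HOST_COMPILER_PREFIX="%s"\n' "$bin_dir"
}

# Possible usage before running configure and the build:
#   eval "$(toolchain_env "$(command -v gcc)")"
```

Deriving all three values from one resolved path avoids the situation seen earlier, where the compiler was given through the symlink while the include directories were recorded under /gpfs7kro.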

@hlopko hlopko added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Oct 25, 2019
@indranaut
Author

@meteorcloudy Thank you. This worked. I wrapped gcc-ar in a script taking parameters from the parameter file and passing them directly to gcc-ar. It compiled successfully.
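
The wrapper script itself was not posted. The following is a minimal sketch of what such a wrapper could look like, assuming Bazel passes the archiver arguments through a parameter file given as @path, with one argument per line and no quoting. The names ar_wrapper and REAL_AR are hypothetical.

```shell
# Hypothetical wrapper: expand @file arguments, then call the real archiver.
# REAL_AR can override the archiver binary; it defaults to gcc-ar.
ar_wrapper() {
  n=$#
  while [ "$n" -gt 0 ]; do
    arg=$1
    shift
    n=$((n - 1))
    case "$arg" in
      @*)
        # Assumption: one argument per line, no shell quoting in the file.
        while IFS= read -r line || [ -n "$line" ]; do
          if [ -n "$line" ]; then
            set -- "$@" "$line"
          fi
        done < "${arg#@}"
        ;;
      *)
        set -- "$@" "$arg"
        ;;
    esac
  done
  "${REAL_AR:-gcc-ar}" "$@"
}

# In a standalone wrapper script, the last line would be:
#   ar_wrapper "$@"
```

The loop rotates through the original arguments once, appending each one (or the lines of its parameter file) to the end of the list, so the original argument order is preserved.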

@Char-Aznable

I hit the same problem. Instead of a compiler from Spack, I have gcc from Linuxbrew. The same error shows up despite setting GCC_HOST_COMPILER_PREFIX to $LINUXBREWHOME/bin/, which contains both the gcc compiler and ar. This looks like an obvious bug on the Bazel config side of setting up the build environment. The whole GCC_HOST_COMPILER_PREFIX mechanism looks very buggy, and even its name is not meaningful: what if the user is trying to use clang instead of gcc?
