This repository has been archived by the owner on Jun 8, 2023. It is now read-only.

[Bug] Unable to build docker image for v1.8.0 #286

Closed
kfunaoka opened this issue Sep 6, 2018 · 11 comments

@kfunaoka

kfunaoka commented Sep 6, 2018

Bug

Expected Behavior

Built successfully

Actual Behavior

The build fails with the CMake error shown in the log below.

Steps to Reproduce the Problem

  1. cd Autoware/docker/generic
  2. ./build.sh kinetic (or indigo)
-- +++ processing catkin package: 'xsens_driver'
-- ==> add_subdirectory(sensing/drivers/imu/packages/xsens/src/xsens_driver)
-- Using these message generators: gencpp;geneus;genlisp;genpy
-- +++ processing catkin package: 'ymc'
-- ==> add_subdirectory(actuation/vehicles/packages/ymc)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_CUDA_LIBRARY (ADVANCED)
    linked by target "libdpm_ttic" in directory /home/autoware/Autoware/ros/src/computing/perception/detection/vision_detector/libs/dpm_ttic

-- Configuring incomplete, errors occurred!
See also "/home/autoware/Autoware/ros/build/CMakeFiles/CMakeOutput.log".
See also "/home/autoware/Autoware/ros/build/CMakeFiles/CMakeError.log".
####
#### Running command: "cmake /home/autoware/Autoware/ros/src -DCATKIN_DEVEL_PREFIX=/home/autoware/Autoware/ros/devel -DCMAKE_INSTALL_PREFIX=/home/autoware/Autoware/ros/install -G Unix Makefiles" in "/home/autoware/Autoware/ros/build"
####
Invoking "cmake" failed
The command '/bin/sh -c /bin/bash -c 'source /opt/ros/indigo/setup.bash; cd /home/$USERNAME/Autoware/ros/src; git submodule update --init --recursive; catkin_init_workspace; cd ../; ./catkin_make_release'' returned a non-zero code: 1

Specifications

  • Ubuntu version: 16.04
  • ROS version: kinetic
  • Autoware branch: v1.8.0 (88a1f08)
@kfunaoka kfunaoka added the bug Something isn't working label Sep 6, 2018
@kfunaoka kfunaoka self-assigned this Sep 6, 2018
@kfunaoka
Author

kfunaoka commented Sep 7, 2018

I'm trying to find out why the error occurs.

Autoware also fails to build inside the autoware/autoware:1.7.0-kinetic image:

# In local machine
$ cd Autoware/docker/generic
$ ./run kinetic       # into autoware/autoware:1.7.0-kinetic

# In autoware/autoware:1.7.0-kinetic
$ cd Autoware/ros
$ ./catkin_make_release
...
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libcuda.so: file not recognized: File truncated
collect2: error: ld returned 1 exit status
computing/perception/detection/vision_detector/libs/dpm_ttic/CMakeFiles/libdpm_ttic.dir/build.make:747: recipe for target '/home/autoware/Autoware/ros/devel/lib/liblibdpm_ttic.so' failed
make[2]: *** [/home/autoware/Autoware/ros/devel/lib/liblibdpm_ttic.so] Error 1
CMakeFiles/Makefile2:20562: recipe for target 'computing/perception/detection/vision_detector/libs/dpm_ttic/CMakeFiles/libdpm_ttic.dir/all' failed
make[1]: *** [computing/perception/detection/vision_detector/libs/dpm_ttic/CMakeFiles/libdpm_ttic.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

In autoware/autoware:1.7.0-kinetic, the size of libcuda.so.390.30 is zero, which would explain the linker's "File truncated" error.

$ ll /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libcuda.so.390.30 
-rw-r--r-- 1 root root 0 May 17 03:31 /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libcuda.so.390.30
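
A quick way to check whether another container has the same problem is to list any zero-byte libcuda.so* files under the library directory. This is a minimal sketch: the helper name find_empty_libcuda is made up for illustration, and the default directory is simply the one from the listing above.

```shell
# List zero-byte libcuda.so* files under a directory (hypothetical helper).
# $1: directory to search; defaults to the library path from the listing above.
find_empty_libcuda() {
    dir="${1:-/usr/lib/x86_64-linux-gnu}"
    # -type f skips symlinks and directories; -size 0 matches only empty files.
    find "$dir" -name 'libcuda.so*' -type f -size 0 2>/dev/null
}
```

Calling find_empty_libcuda with no arguments inside the container should print the empty library if it is present, and print nothing otherwise.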

@kfunaoka
Author

kfunaoka commented Sep 7, 2018

@yn-mrse Does it work in your environment?

@kfunaoka
Author

kfunaoka commented Sep 7, 2018

@yn-mrse, who built the v1.7.0 Docker image, can reproduce this error.

@amc-nu
Member

amc-nu commented Sep 9, 2018

@kfunaoka the error is caused by Autoware requiring the CUDA runtime to be available at compile time. However, this feature is not available in nvidia-docker yet.

The Dockerfile hasn't been updated since around version 1.6:

  1. All the dependencies can now be resolved using rosdep, just as in the CI code or the wiki
  2. There is no need to pull YOLO code since it is already integrated in the perception module
  3. It is using nvidia-docker v1 (this needs to be updated)
  4. It is using an old CUDA version (8.0)

In any case, as a workaround for the CUDA error at image build time, please:

  1. update your nvidia-docker to v2
  2. apply the following changes to the Dockerfile
diff --git a/docker/generic/Dockerfile.kinetic b/docker/generic/Dockerfile.kinetic
index 41dad0a..d5e2dd8 100644
--- a/docker/generic/Dockerfile.kinetic
+++ b/docker/generic/Dockerfile.kinetic
@@ -1,6 +1,7 @@
-FROM nvidia/cuda:8.0-devel-ubuntu16.04
+FROM nvidia/cuda:9.1-devel-ubuntu16.04
 MAINTAINER Yuki Iida <yuki.iida@tier4.jp>
-
+ENV NVIDIA_VISIBLE_DEVICES all
+ENV NVIDIA_DRIVER_CAPABILITIES all
 # Develop
 RUN apt-get update && apt-get install -y \
         software-properties-common \
@@ -60,13 +61,10 @@ RUN sudo rosdep init \
         && echo "source /opt/ros/kinetic/setup.bash" >> ~/.bashrc
 
 # YOLO_V2
-RUN cd && git clone https://github.com/pjreddie/darknet.git
-RUN cd ~/darknet && git checkout 56d69e73aba37283ea7b9726b81afd2f79cd1134
-RUN cd ~/darknet/data && wget https://pjreddie.com/media/files/yolo.weights
 
 # Install Autoware
 RUN cd && git clone https://github.com/CPFL/Autoware.git /home/$USERNAME/Autoware
-RUN /bin/bash -c 'source /opt/ros/kinetic/setup.bash; cd /home/$USERNAME/Autoware/ros/src; git submodule update --init --recursive; catkin_init_workspace; cd ../; ./catkin_make_release'
+RUN /bin/bash -c 'source /opt/ros/kinetic/setup.bash; cd /home/$USERNAME/Autoware/ros/src; git submodule update --init --recursive; catkin_init_workspace; cd ../;catkin_make -DCMAKE_LIBRARY_PATH=/usr/local/cuda/lib64/stubs clean; source devel/setup.bash;catkin_make -DCMAKE_LIBRARY_PATH=/usr/local/cuda/lib64/stubs'
 RUN echo "source /home/$USERNAME/Autoware/ros/devel/setup.bash" >> /home/$USERNAME/.bashrc
 
 # Setting
diff --git a/docker/generic/build.sh b/docker/generic/build.sh
index ba27632..9b292e4 100755
--- a/docker/generic/build.sh
+++ b/docker/generic/build.sh
@@ -4,7 +4,7 @@
 if [ "$1" = "kinetic" ] || [ "$1" = "indigo" ]
 then
     echo "Use $1"
-    nvidia-docker build -t autoware-$1 -f Dockerfile.$1 . --no-cache
+    docker build -t autoware-$1 -f Dockerfile.$1 . 
 else
     echo "Select distribution, kinetic|indigo"
 fi
diff --git a/docker/generic/run.sh b/docker/generic/run.sh
index 9d83abc..b3c139a 100755
--- a/docker/generic/run.sh
+++ b/docker/generic/run.sh
@@ -22,7 +22,8 @@ else
 fi
 echo "Shared directory: ${HOST_DIR}"
 
-nvidia-docker run \
+docker run \
+    --runtime=nvidia \
     -it --rm \
     --volume=$XSOCK:$XSOCK:rw \
     --volume=$XAUTH:$XAUTH:rw \

This is a quick fix; please add the correct fix for YOLO (only download the weights files to the path described in the YOLO readme).
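
Before switching run.sh to --runtime=nvidia, it may help to confirm that Docker actually has the nvidia runtime registered. A sketch that scans the text of `docker info`: the helper name has_nvidia_runtime is hypothetical, and the layout of the "Runtimes:" line is assumed from typical `docker info` output rather than taken from this thread.

```shell
# Succeeds if the text on stdin (expected: output of `docker info`)
# lists "nvidia" on its "Runtimes:" line. Hypothetical helper.
has_nvidia_runtime() {
    grep -q 'Runtimes:.*nvidia'
}
```

A possible use is `docker info | has_nvidia_runtime && echo "nvidia runtime available"`.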

@amc-nu
Member

amc-nu commented Sep 9, 2018

@kfunaoka @cirpue49 @yn-mrse also, it would be nice if you could add a simple prefix to differentiate Docker image versions.
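
One way to tell image versions apart is to put the Autoware version into the image name passed to `docker build -t`. A sketch only: the naming scheme and the helper name autoware_image_name are assumptions for illustration, not something the repository defines.

```shell
# Compose an image name such as "autoware-kinetic:1.8.0".
# The naming scheme is an assumption for illustration.
autoware_image_name() {
    distro="$1"
    version="$2"
    printf 'autoware-%s:%s\n' "$distro" "$version"
}
```

build.sh could then call `docker build -t "$(autoware_image_name kinetic 1.8.0)" -f Dockerfile.kinetic .` so that images built from different releases do not overwrite each other.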

@kfunaoka
Author

@amc-nu Thank you for the advice! @esteve is trying to update nvidia-docker to v2 in autowarefoundation/autoware#1416. Everything seems to work well after the update!

@BillWSY

BillWSY commented Sep 18, 2018

Depending on the CUDA runtime during compilation does not seem right to me. Our build environment does not have an NVIDIA GPU, but we transfer images to computers with GPUs.

Here is what I did:

ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/libcuda.so
./catkin_make_release
rm -f /usr/local/cuda/lib64/libcuda.so
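
The same idea can be wrapped so that the temporary link is removed even when the build fails. A sketch, assuming the stub and link paths used above; with_cuda_stub is a hypothetical helper name, and the build command is passed in as arguments.

```shell
# Run a command with a temporary libcuda.so symlink pointing at the CUDA stub.
# Usage: with_cuda_stub STUB LINK COMMAND [ARGS...]   (hypothetical helper)
with_cuda_stub() {
    stub="$1"
    link="$2"
    shift 2
    # Refuse to overwrite an existing library or link.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "refusing to replace existing $link" >&2
        return 1
    fi
    ln -s "$stub" "$link" || return 1
    # Run the build and remember its exit status.
    build_rc=0
    "$@" || build_rc=$?
    # Always remove the temporary link, whether or not the build succeeded.
    rm -f "$link"
    return "$build_rc"
}
```

For the commands above, this would be called as `with_cuda_stub /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/libcuda.so ./catkin_make_release`.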

@sgermanserrano

PR autowarefoundation/autoware#1536 should fix this bug

@yuteno

yuteno commented Sep 28, 2018

With nvidia-docker2, Rviz cannot be run on images built with the above Dockerfile because OpenGL applications are not supported.
NVIDIA/nvidia-docker#534

I solved this problem by using the cudagl image as the base image.
https://hub.docker.com/r/nvidia/cudagl/

Please change
FROM nvidia/cuda:9.1-devel-ubuntu16.04
in the Dockerfile to
FROM nvidia/cudagl:9.1-devel-ubuntu16.04
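
The base-image change can also be scripted. A sketch using sed, assuming the Dockerfile's FROM line starts with `FROM nvidia/cuda:` as in the patch earlier in this thread; use_cudagl_base is an illustrative name, not an existing script.

```shell
# Rewrite "FROM nvidia/cuda:<tag>" to "FROM nvidia/cudagl:<tag>" in a Dockerfile.
# $1: path to the Dockerfile (hypothetical helper; edits the file in place).
use_cudagl_base() {
    dockerfile="$1"
    tmpfile="$dockerfile.tmp"
    # Write to a temporary file first so a sed failure leaves the original intact.
    sed 's|^FROM nvidia/cuda:|FROM nvidia/cudagl:|' "$dockerfile" > "$tmpfile" &&
        mv "$tmpfile" "$dockerfile"
}
```

Running `use_cudagl_base Dockerfile.kinetic` from docker/generic would apply the change; the image tag after the colon is left untouched.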

@kyesh

kyesh commented Oct 8, 2018

In the interim, anyone who has this issue but doesn't want to use the development branch is welcome to use my Dockerfile. It clones my repo, which contains only the changes from PR autowarefoundation/autoware#1536: https://github.com/kyesh/Autoware/blob/master/docker/generic/Dockerfile.kinetic

@kfunaoka
Author

Sorry. This is an old issue that was fixed in v1.9.0. I'll close it.

@mitsudome-r mitsudome-r transferred this issue from autowarefoundation/autoware Mar 14, 2023