This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Errors displaying GPU accelerated graphics when using --net host #327

Closed
v-lopez opened this issue Mar 3, 2017 · 6 comments

Comments

@v-lopez

v-lopez commented Mar 3, 2017

I'm facing issues using nvidia-docker with --net host.

I created a Docker image using the following Dockerfile:

FROM ubuntu:14.04
RUN apt-get update && apt-get install -y mesa-utils  x11-apps
LABEL com.nvidia.volumes.needed="nvidia_driver" 
ENV PATH /usr/local/nvidia/bin:${PATH} 
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}
CMD ["bash"]

I build it with:

sudo docker build . -t host-gpu

And run it with:

sudo nvidia-docker run -it --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
--rm  host-gpu glxgears

You'll see glxgears running pretty fast.

If you run it with --net host added, which changes the network configuration:

sudo nvidia-docker run -it --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
 --net host  --rm  host-gpu glxgears

It displays a black screen.

If I run it without nvidia-docker, it works, but without HW acceleration in both cases.

@3XX0
Member

3XX0 commented Mar 5, 2017

Unfortunately, OpenGL is not supported for now (see #11). It is on the roadmap for 2.0, though.

That being said, your issue is really strange: the network namespace shouldn't change anything here unless your DISPLAY is using TCP. Maybe try using docker instead of nvidia-docker.
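The distinction @3XX0 is drawing can be checked from the value of DISPLAY itself. A minimal sketch (not from the thread): a DISPLAY starting with ":" uses the local unix socket under /tmp/.X11-unix, which the --volume bind-mount covers regardless of network namespace; a DISPLAY with a hostname before the colon goes over TCP, which --net host would affect.

```shell
# Classify $DISPLAY: local unix socket vs. TCP connection.
# ":0" or ":1.0"        -> unix socket in /tmp/.X11-unix (namespace-independent)
# "localhost:10.0" etc. -> TCP, affected by the container's network namespace
display="${DISPLAY:-:0}"
case "$display" in
  :*) echo "unix socket (network namespace should not matter)" ;;
  *)  echo "tcp (affected by --net host)" ;;
esac
```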

@3XX0 3XX0 closed this as completed Mar 5, 2017
@kernle32dll

kernle32dll commented Apr 6, 2017

I ran into the same issue today. For a bit more context on what @v-lopez is doing: he is forwarding the host's X session into the Docker container. I can confirm that without --net host everything works as expected, with or without OpenGL support. However, when that net option is set, all hell breaks loose.

On my local Ubuntu machine I don't get any render output at all (not even glClear does anything). On our CentOS 7 machines the application just segfaults in some NVIDIA lib.

In my case I'm not using nvidia-docker, but plain docker, as suggested by @3XX0.

Edit: some information on the segfault, albeit probably not very useful:

[cluster03pc01:00022] *** Process received signal ***
[cluster03pc01:00022] Signal: Segmentation fault (11)
[cluster03pc01:00022] Signal code: Address not mapped (1)
[cluster03pc01:00022] Failing at address: (nil)
[cluster03pc01:00022] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f200db264b0]
[cluster03pc01:00022] [ 1] /usr/local/nvidia/lib64/libnvidia-glcore.so.375.39(+0x11d7b89)[0x7f2008f0eb89]
[cluster03pc01:00022] [ 2] /usr/local/nvidia/lib64/libnvidia-glcore.so.375.39(+0x10937e9)[0x7f2008dca7e9]
[cluster03pc01:00022] [ 3] /usr/local/nvidia/lib64/libnvidia-glcore.so.375.39(+0x1093870)[0x7f2008dca870]
[cluster03pc01:00022] [ 4] /usr/local/nvidia/lib64/libnvidia-glcore.so.375.39(+0x1052e7a)[0x7f2008d89e7a]
[cluster03pc01:00022] [ 5] /usr/local/nvidia/lib64/libcuda.so.1(+0x18fc69)[0x7f200fb16c69]
[cluster03pc01:00022] [ 6] /usr/local/nvidia/lib64/libcuda.so.1(+0xbdf46)[0x7f200fa44f46]
[cluster03pc01:00022] [ 7] /usr/local/nvidia/lib64/libcuda.so.1(cuGraphicsMapResources+0x63)[0x7f200fb8b293]
[cluster03pc01:00022] [ 8] ./renderer[0x4991e2]

@kernle32dll

For reference, if anyone gets here via Google: the problem is not strictly related to nvidia-docker. I was able to solve it by adding --privileged. I have no idea why that fixes the problem, but well, it works for me.
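Putting the reported workaround into the thread's own reproduction, a sketch (untested here) would look like the original run command with --privileged added; "host-gpu" is the image built from the Dockerfile at the top of the thread:

```shell
# Original X11-forwarding run with --net host, plus the --privileged
# workaround reported above. Requires an X server and NVIDIA GPU on the host.
sudo docker run -it --env="DISPLAY" \
  --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
  --net host --privileged \
  --rm host-gpu glxgears
```

Note that --privileged grants the container broad access to all host devices, so it is a blunt instrument rather than a targeted fix.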

@malex984

malex984 commented Oct 1, 2017

The bug is still there: if the host network driver is used, applications with OpenGL run but paint nothing, or perhaps render totally black textures (?)...

  1. The bug has nothing to do with the nvidia-docker wrapper; it is the same with docker and docker-compose! Using privileged mode is a workaround, not a bug fix!
  2. The bug is only present with the host network driver; any other network driver leads to the expected behavior.

Could you please reopen this issue and maybe assign it to the next milestone (v2)?

Note that another workaround was proposed in #421 (via --device=/dev/nvidia-modeset)!
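For comparison with the --privileged workaround above, the #421 variant applied to this thread's reproduction would look roughly like this (untested here; "host-gpu" is the image from the original report):

```shell
# Expose only the modeset device node instead of running fully privileged.
# Narrower than --privileged, but requires /dev/nvidia-modeset to exist
# on the host (it is created by the nvidia-modeset kernel module).
sudo docker run -it --env="DISPLAY" \
  --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
  --net host --device=/dev/nvidia-modeset \
  --rm host-gpu glxgears
```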

@dllu

dllu commented Sep 7, 2019

I ran into this bug with nvidia-container-toolkit and the latest docker-ce 19.03, using the 1.1.0-glvnd image.

Running glxgears would not show anything (black window), and the terminal would indicate it was running at 2 fps. On exit, it would freeze the system temporarily and emit many Xid 31 (GPU memory page fault) errors in dmesg.

Thankfully --device=/dev/nvidia-modeset worked around it.

This was with nvidia-435 and linux-5.3rc7 on Ubuntu 18.04 with a GeForce GTX 1660 Ti.

It is quite disappointing that running OpenGL applications in Docker has been so poorly supported, or unsupported, for many years now. We had to spend a lot of time debugging this before realizing --net host was the culprit. Running OpenGL in Docker on AMD or Intel graphics was super easy, as Mesa just works out of the box.

@dvof

dvof commented Sep 11, 2019

I have the same problem as @dllu, running Docker version 19.03.2.

--device=/dev/nvidia-modeset did not work, but adding --privileged did.

I'm new to Docker, so it was quite a challenge to get a container with ROS and NVIDIA-accelerated graphics to work. I was ready to give up when the OpenGL issue popped up; luckily I found a nice guide for ROS Dockers here. This thread saved me from despair when the --net=host bug appeared. Thanks @kernle32dll for the workaround.
