This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Can't set up a Xorg server in secondary graphics card #136

Closed
JoseTomasTocino opened this issue Jul 12, 2016 · 14 comments

Comments

@JoseTomasTocino

I'm working with a server with two Quadro K2200 graphics cards. The host is RHEL 7.2, and I've installed Docker 0.11 and nvidia-docker. The output of nvidia-smi is:

# nvidia-docker run --rm nvidia/cuda nvidia-smi
Tue Jul 12 14:42:22 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 0000:03:00.0      On |                  N/A |
| 42%   38C    P8     1W /  39W |    164MiB /  4041MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 0000:04:00.0     Off |                  N/A |
| 42%   36C    P8     1W /  39W |      1MiB /  4041MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
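To target only the card that is not driving the host's display (GPU 1 at 0000:04:00.0 above, whose Disp.A column is Off), one option is to ask nvidia-smi for the relevant fields and convert the PCI bus ID into the decimal PCI:bus:device:function form that an xorg.conf BusID line expects (nvidia-smi prints hex). This is a minimal sketch, assuming the output format of `nvidia-smi --query-gpu=index,pci.bus_id,display_active --format=csv,noheader`; pick_headless_gpu and to_xorg_busid are hypothetical helper names, not part of nvidia-docker.

```shell
# Hypothetical helper: read `nvidia-smi --query-gpu=index,pci.bus_id,display_active
# --format=csv,noheader` lines on stdin and print the bus ID of the first GPU
# whose display is not active (i.e. not driving the host's X server).
pick_headless_gpu() {
    awk -F', *' '$3 == "Disabled" { print $2; exit }'
}

# Hypothetical helper: convert a bus ID such as 00000000:04:00.0
# (domain:bus:device.function, hex) into Xorg's decimal "PCI:bus:device:function".
to_xorg_busid() {
    old_ifs=$IFS
    IFS=':.'
    set -- $1            # $1=domain $2=bus $3=device $4=function
    IFS=$old_ifs
    # printf accepts 0x-prefixed numeric arguments and prints them in decimal
    printf 'PCI:%d:%d:%d\n' "0x$2" "0x$3" "0x$4"
}
```

On the machine above, piping that nvidia-smi query into pick_headless_gpu should print the 0000:04:00.0 card, and to_xorg_busid turns it into PCI:4:0:0.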

Now I'm trying to build a container (I don't care about the distro for now; I'm trying both CentOS and Debian) that runs an X server on the second card, but so far I haven't had any luck. This is the Dockerfile I'm working with:

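FROM debian

# One RUN so the package index and the installs share a layer;
# kmod provides lsmod/modprobe (older releases shipped them as module-init-tools)
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    xserver-xorg xinit xdm pciutils vim kmod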

After building the image I start a container with

nvidia-docker run --rm -t -i --privileged debian_x bash

Inside it I run Xorg -configure to generate a skeleton config file, which I then tweak to use the nvidia driver and only the second card (by removing all references to the first one). If I try to start the X server with xinit, I get the following at the end of the log:

[ 11153.937] (II) LoadModule: "nvidia"
[ 11153.937] (WW) Warning, couldn't open module nvidia
[ 11153.937] (II) UnloadModule: "nvidia"
[ 11153.937] (II) Unloading nvidia
[ 11153.937] (EE) Failed to load module "nvidia" (module does not exist, 0)
[ 11153.937] (EE) No drivers available.
[ 11153.937] (EE) 
Fatal server error:
[ 11153.937] (EE) no screens found(EE) 
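The xorg.conf tweak described above (nvidia driver, second card only) boils down to a Device section with an explicit BusID. A minimal sketch that prints such a config; the identifiers SecondGPU, SecondScreen and Layout0 and the helper name are hypothetical, and PCI:4:0:0 assumes the 0000:04:00.0 card from the nvidia-smi output. Note that this only selects the card: the log shows Xorg could not find the nvidia module (nvidia_drv.so) inside the container at all, which no config change fixes.

```shell
# Hypothetical helper: print a minimal xorg.conf that binds the nvidia DDX
# driver to a single GPU, identified by its Xorg-style (decimal) BusID.
write_single_gpu_xorg_conf() {
    busid=$1
    cat <<EOF
Section "Device"
    Identifier "SecondGPU"
    Driver     "nvidia"
    BusID      "$busid"
EndSection

Section "Screen"
    Identifier "SecondScreen"
    Device     "SecondGPU"
EndSection

Section "ServerLayout"
    Identifier "Layout0"
    Screen     "SecondScreen"
EndSection
EOF
}
```

Usage would be along the lines of `write_single_gpu_xorg_conf PCI:4:0:0 > /etc/X11/xorg.conf` inside the container.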

I've tried installing the NVIDIA driver from within the container, but the installer fails because it detects NVIDIA kernel modules that are already loaded, as the output of lsmod shows:

# lsmod | grep nvidia
nvidia_drm             43350  2 
nvidia_modeset        764270  4 nvidia_drm
nvidia              11070459  121 nvidia_modeset
drm_kms_helper        125008  1 nvidia_drm
drm                   349210  5 drm_kms_helper,nvidia_drm
i2c_core               40582  6 drm,igb,ipmi_ssif,drm_kms_helper,i2c_algo_bit,nvidia

Any clue? Thanks!
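Since the kernel modules are shared with the host, the container only needs matching user-space components (including nvidia_drv.so). One common approach, offered here as an assumption and not something verified in this thread, is to read the host driver version from /proc/driver/nvidia/version (which should be readable in a privileged container) and run the same-version .run installer with its --no-kernel-module option. A minimal sketch; host_driver_version is a hypothetical helper, and it assumes the usual "NVRM version: ... <version> <date>" first line of that file.

```shell
# Hypothetical helper: extract the driver version (e.g. 367.27) from the
# "NVRM version:" line of /proc/driver/nvidia/version (or a given file).
host_driver_version() {
    sed -n 's/^NVRM version:.*[[:space:]]\([0-9][0-9]*\.[0-9][0-9.]*\)[[:space:]].*/\1/p' \
        "${1:-/proc/driver/nvidia/version}"
}
```

The matching installer could then be run as `sh NVIDIA-Linux-x86_64-$(host_driver_version).run --silent --no-kernel-module`; the installer file name follows NVIDIA's usual naming and is an assumption here.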

@3XX0
Member

3XX0 commented Jul 12, 2016

We don't support this use case. I'm curious, though: why don't you leverage the X server on your host instead?

@ruffsl
Contributor

ruffsl commented Jul 12, 2016

The ROS community does something similar by leveraging the host's X server.
FYI, here are some wiki tutorials on the topic:
http://wiki.ros.org/docker/Tutorials/GUI#The_simple_way
http://wiki.ros.org/docker/Tutorials/Hardware%20Acceleration#Using_nvidia-docker

@JoseTomasTocino
Author

Thanks for the answers.

@3XX0 I was actually trying to measure the (supposed) performance gain of running the X server directly in the container, talking to the graphics card, instead of on the host. I understand you don't support this use case, but AFAIK it is doable, right? I eventually managed to start the X server in the container by manually copying the missing files, namely nvidia_drv.so. However, that broke the host's X server on the first card; I think it has to do with how the ttys are managed (I'm honestly clueless about this part).

@ruffsl thanks for the links. As I mentioned in the previous paragraph, my intention was to compare the method you reference (following this tutorial) with the X-server-directly-in-the-container approach. However, that method doesn't let me run more graphics-hungry apps, like glxgears:

# nvidia-docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix coge_gnome_glmark2 bash
# glxgears
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  153 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  35
  Current serial number in output stream:  37

Apps like gedit work properly.
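When GLX fails like this (libGL falling back to swrast, then X_GLXCreateContext being rejected), it helps to check which OpenGL implementation the container actually loads and whether direct rendering is in use. A minimal sketch, assuming glxinfo from the mesa-utils package is installed in the image; gl_summary is a hypothetical helper that only parses glxinfo's text output.

```shell
# Hypothetical helper: read `glxinfo` output on stdin and report the OpenGL
# vendor string and whether direct rendering is in use.
gl_summary() {
    awk -F': ' '
        /^direct rendering:/     { direct = $2 }
        /^OpenGL vendor string:/ { vendor = $2 }
        END { printf "vendor=%s direct=%s\n", vendor, direct }'
}
```

Inside the container, `glxinfo | gl_summary` should report NVIDIA Corporation when NVIDIA's libGL is picked up; a Mesa or Intel vendor string means another libGL (or another GPU) is doing the rendering.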

@ruffsl
Contributor

ruffsl commented Jul 14, 2016

@JoseTomasTocino

This Dockerfile and launch process works for me:

FROM ubuntu:16.04

# install GLX-Gears
RUN apt-get update && apt-get install -y \
    mesa-utils && \
    rm -rf /var/lib/apt/lists/*

# nvidia-docker hooks
LABEL com.nvidia.volumes.needed="nvidia_driver"
ENV PATH /usr/local/nvidia/bin:${PATH}
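ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# build the image
docker build -t foo .
# allow local (non-network) clients, such as root in the container, to use the host X server
xhost +local:root
nvidia-docker run -it \
    --env="DISPLAY" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    foo glxgears
# revoke that access again
xhost -local:root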
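If the container exits abnormally or the run is aborted, the xhost +local:root / xhost -local:root pair can leave the X server open; per the xhost/Xsecurity documentation, the local: family covers all local connections regardless of the name after the colon. A minimal sketch of a wrapper that always revokes access after the command, using the narrower si:localuser:root form (which also appears later in this thread); with_x_access is a hypothetical name.

```shell
# Hypothetical wrapper: grant the local root user access to the X server only
# for the duration of one command, and revoke it afterwards even if the
# command exits with an error. The command's exit status is preserved.
with_x_access() {
    xhost +si:localuser:root >/dev/null
    if "$@"; then status=0; else status=$?; fi
    xhost -si:localuser:root >/dev/null
    return "$status"
}
```

For example: `with_x_access nvidia-docker run -it --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" foo glxgears`. It does not cover the shell itself being killed by a signal.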


We use this same method to run RViz and Gazebo from containers: rendering dense point clouds, ray tracing for visual sensors, and displaying 3D robot models. Here's an old example from a little more than a year ago: https://www.youtube.com/watch?v=djLKmDMsdxM . Note that I was getting the same performance as when running the stack locally on the host.

I'm not sure how much you'd gain by not going through the Unix socket, but your investigation might prove useful once we all migrate away from the X server to Mir or similar but still want to run legacy X apps.

@flx42
Member

flx42 commented Jul 14, 2016

@ruffsl that's a bit similar to what we have on our experimental opengl branch: 0158f50

@3XX0
Member

3XX0 commented Jul 14, 2016

Your problem is probably due to a conflicting libGL. As @flx42 mentioned, try the opengl branch; it should just work.

Also, it uses direct rendering, so you shouldn't see any performance impact.

@JoseTomasTocino
Author

Thanks a lot, guys. Using both @ruffsl's Dockerfile and the code in the opengl branch, I've managed to launch both glxgears and glmark2 from the container, and it works very well. It looks like I was missing the changes to the PATH and LD_LIBRARY_PATH environment variables.
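A subtlety when prepending to LD_LIBRARY_PATH as dir1:dir2:${LD_LIBRARY_PATH}: if the variable is unset in the base image, the result ends with a colon, and glibc's dynamic loader treats that empty entry as the current directory. The ${VAR:+:$VAR} expansion avoids this and works both in sh and in Dockerfile ENV lines. A minimal shell sketch; prepend_dirs is a hypothetical helper name.

```shell
# Hypothetical helper: prepend directories to a colon-separated variable
# (e.g. PATH or LD_LIBRARY_PATH) without leaving an empty entry when the
# variable was previously unset or empty. Usage: prepend_dirs VAR dir...
prepend_dirs() {
    var=$1; shift
    eval "old=\${$var-}"                    # current value ("" if unset)
    new=$(IFS=:; printf '%s' "$*")          # join the new dirs with ':'
    eval "$var=\"\$new\${old:+:\$old}\""    # add ':old' only if old is non-empty
    export "$var"
}
```

For example, `prepend_dirs LD_LIBRARY_PATH /usr/local/nvidia/lib /usr/local/nvidia/lib64` yields exactly those two directories when the variable was unset.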

And exactly as @3XX0 mentioned, glmark2 scores essentially the same when run from the host and from the container. Given that, there's no need to keep digging into running the X server from the container. However, as I briefly commented here, it's definitely possible once you sort out the coexistence of the host's X server and the container's. I think that would mean assigning a different tty to the container's X server, at least as a first step. There doesn't seem to be much information about this.
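For anyone who wants to pursue the container-side X server anyway, the tty clash can probably be approached with standard Xorg command-line options: its own display number, an explicit vtN, -sharevts (share virtual terminals with another X server), -novtswitch (do not switch back to the previously active VT on reset or shutdown) and -isolateDevice (restrict device resets to one bus ID). This is an untested sketch: display :1, vt8, PCI:4:0:0 and the config path are assumptions, and second_xorg_cmd only prints the command line.

```shell
# Hypothetical helper: print an Xorg command line for a second X server that
# uses its own display number and VT and only resets one GPU.
second_xorg_cmd() {
    display=$1   # e.g. :1 (the host server normally owns :0)
    vt=$2        # e.g. 8 (a VT the host server is not using)
    busid=$3     # e.g. PCI:4:0:0
    config=$4    # xorg.conf that selects only that GPU
    printf 'Xorg %s vt%s -sharevts -novtswitch -isolateDevice %s -config %s\n' \
        "$display" "$vt" "$busid" "$config"
}
```

Whether the host's X server on the first card survives such a launch is exactly the open question from this thread.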

@ruffsl
Contributor

ruffsl commented Jun 20, 2018

Just as an update for folks, here is a minimal working example for GLX-Gears GUI using nvidia-docker2.

FROM ubuntu:18.04

# install GLX-Gears and the GL Vendor-Neutral Dispatch library
RUN apt-get update && apt-get install -y \
    libglvnd0 \
    mesa-utils && \
    rm -rf /var/lib/apt/lists/*

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES \
    ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES \
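    ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics

# build the image
docker build -t foo .
# allow local (non-network) clients, such as root in the container, to use the host X server
xhost +local:root
nvidia-docker run -it \
    --env="DISPLAY" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    foo glxgears
# revoke that access again
xhost -local:root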
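The two ENV lines rely on shell-style parameter expansion, which Dockerfile ENV also supports: ${VAR:-all} falls back to all when the variable is unset, and ${VAR:+$VAR,}graphics appends graphics to any capabilities a base image has already set. The same expansions in plain sh, as a sketch showing the resulting values (the function names are hypothetical):

```shell
# Mirror of the Dockerfile's ENV expansions, evaluated in the shell.
visible_devices() {
    printf '%s\n' "${NVIDIA_VISIBLE_DEVICES:-all}"
}
driver_capabilities() {
    printf '%s\n' "${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics"
}
```

With nothing set, these yield all and graphics; with NVIDIA_DRIVER_CAPABILITIES=compute,utility from a base image, the result is compute,utility,graphics. Both variables can also be overridden per container at run time with -e.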


@3XX0
Member

3XX0 commented Jun 20, 2018

FYI, we have samples here

docker build git@gitlab.com:nvidia/samples.git#:opengl/ubuntu16.04/glxgears
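The #:opengl/ubuntu16.04/glxgears suffix uses Docker's Git build-context syntax, repository#ref:subdirectory, where an empty ref means the default branch. A small sketch that splits such a URL into its parts; split_build_context is hypothetical and only illustrates the syntax, since Docker does the real parsing.

```shell
# Hypothetical helper: split a Git build-context URL of the form
# <repository>#<ref>:<subdirectory> and print the three parts, one per line.
split_build_context() {
    url=$1
    repo=${url%%#*}                    # everything before the first '#'
    frag=
    case $url in
        *'#'*) frag=${url#*#} ;;       # everything after the first '#'
    esac
    ref=${frag%%:*}                    # fragment part before ':' (may be empty)
    dir=
    case $frag in
        *:*) dir=${frag#*:} ;;         # fragment part after ':'
    esac
    printf 'repo=%s\nref=%s\ndir=%s\n' "$repo" "$ref" "$dir"
}
```

For the sample URL above this gives an empty ref and the subdirectory opengl/ubuntu16.04/glxgears. Note that the git@ form needs SSH access to GitLab; an https:// clone URL avoids that.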

@mash-graz

https://gitlab.com/mash-graz/resolve is another example of how to run a quite demanding real-world application (DaVinci Resolve) that uses OpenGL in nvidia-docker2.

@nathantsoi

thx for the example @ruffsl

I'm trying to get this working on the TX2, and I built libglvnd from source since there seem to be no 16.04 arm64 packages:

# OpenGL
# https://github.com/NVIDIA/nvidia-docker/issues/136#issuecomment-398593070
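RUN apt-get update && apt-get install -y \
    mesa-utils libxext-dev libx11-dev x11proto-gl-dev autogen autoconf libtool

# build libglvnd at a pinned commit (installs under /usr/local by default),
# then refresh the dynamic loader cache
RUN mkdir -p deps && cd deps && \
  git clone https://github.com/NVIDIA/libglvnd.git && \
  cd libglvnd && \
  git reset --hard 9d909106f232209cf055428cae18387c18918704 && \
  bash autogen.sh && bash configure && make -j6 && \
  make install && \
  ldconfig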

ENV NVIDIA_VISIBLE_DEVICES \
    ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES \
    ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics

However, I get: BadValue (integer parameter out of range for operation)

# glxgears
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  34
  Current serial number in output stream:  35

any suggestions?
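This BadValue from X_GLXCreateContext may be the same kind of libGL conflict seen earlier in the thread. Since a source-built libglvnd installs into /usr/local/lib by default, one thing worth checking is how many libGL.so.1 entries the loader cache knows about. A minimal sketch that parses `ldconfig -p` output; libgl_candidates is a hypothetical helper.

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and print the path
# of every libGL.so.1 entry in the loader cache. More than one entry hints
# at competing OpenGL implementations.
libgl_candidates() {
    awk '$1 == "libGL.so.1" { print $NF }'
}
```

Run it as `ldconfig -p | libgl_candidates`; `ldd "$(command -v glxgears)" | grep libGL` then shows which one glxgears actually resolves, including any LD_LIBRARY_PATH override.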

@lromor

lromor commented Sep 11, 2018

@nathantsoi
I think the issue is related to the video group mapping.
I guess you are mounting the Xorg Unix sockets and avoiding xauth by using
the same gid/uid mapping for a user inside the container.
If that's the case, simply run usermod -a -G video myuser.

If the problem still occurs, make sure that the gid of the video group is the same on the host and in the container.
Regards,

-l
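To check the gid point, compare the video group's gid in the host's and the container's group databases. A minimal sketch that reads two /etc/group-format files (for example the host's file and one copied out of the container with docker cp); group_gid and same_group_gid are hypothetical helper names.

```shell
# Hypothetical helper: print the gid of a group from an /etc/group-format file.
group_gid() {
    awk -F: -v name="$1" '$1 == name { print $3; exit }' "$2"
}

# Hypothetical helper: succeed only if the group exists in both files
# with the same gid. Usage: same_group_gid GROUP HOST_FILE CONTAINER_FILE
same_group_gid() {
    a=$(group_gid "$1" "$2")
    b=$(group_gid "$1" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Alternatively, `getent group video` run on the host and inside the container prints both entries directly.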

@rubenvandeven

@nathantsoi Did you get it working in the end? I have the same error. Even when following the nvidia/opengl glxgears sample.

@nathantsoi The gid of video is 44 on both host and container. As in the example, I run xhost +si:localuser:root before starting the container. Could it still be a permissions issue?

I have an Optimus laptop. However, I run sudo tee /proc/acpi/bbswitch <<< ON to make sure the card is on, and nvidia-smi works without any problems in the container, which confirms it.

Furthermore, the example by ruffsl in issue #136 works, but it runs glxgears on my Intel card rather than the NVIDIA one.

Driver version 390.87, CUDA 9.0 on both host and container.

Thanks for any suggestion!

@nathantsoi

I haven't had time to debug again. I ended up copying all the dependencies to a folder and running outside of docker (after building w/in docker) by setting LD_LIBRARY_PATH to the folder. Hope that helps!
