This repository has been archived by the owner on Apr 6, 2018. It is now read-only.

Run your first agent iptable issue #182

Closed
deathcoder opened this issue May 6, 2017 · 16 comments

Comments

@deathcoder

deathcoder commented May 6, 2017

(First, please check https://github.com/openai/universe/wiki/Solutions-to-common-problems for solutions to many common problems)
The problem is not described among the common ones.

Expected behavior

It will take a few minutes for the image to pull the first time. After that, if all goes well, a window like the one below will soon pop up. Your agent, which is just pressing the up arrow repeatedly, is now playing a Flash racing game called Dusk Drive. Your agent is programmatically controlling a VNC client, connected to a VNC server running inside of a Docker container in the cloud, rendering a headless Chrome with Flash enabled:

Actual behavior

I followed the getting-started tutorial, but I get the following error when I try to launch the Python agent code, both from the host and from a Docker container.

[2017-05-06 13:48:43,624] Making new env: flashgames.DuskDrive-v0
[2017-05-06 13:48:43,630] Writing logs to file: /tmp/universe-1.log
[2017-05-06 13:48:43,679] Ports used: dict_keys([])
[2017-05-06 13:48:43,679] [0] Creating container: image=quay.io/openai/universe.flashgames:0.20.28. Run the same thing by hand as: docker run -p 5900:5900 -p 15900:15900 --privileged --ipc host --cap-add SYS_ADMIN quay.io/openai/universe.flashgames:0.20.28
[2017-05-06 13:48:46,022] Remote closed: address=172.17.0.1:15900
[2017-05-06 13:48:46,023] Remote closed: address=172.17.0.1:5900
[2017-05-06 13:48:46,024] At least one sockets was closed by the remote. Sleeping 1s...
universe-FKd51W-0 | Setting VNC and rewarder password: openai
universe-FKd51W-0 | [Sat May  6 13:48:46 UTC 2017] Waiting for /tmp/.X11-unix/X0 to be created (try 1/10)
[2017-05-06 13:48:47,025] Remote closed: address=172.17.0.1:15900
[2017-05-06 13:48:47,026] Remote closed: address=172.17.0.1:5900
[2017-05-06 13:48:47,026] At least one sockets was closed by the remote. Sleeping 1s...
universe-FKd51W-0 | [tigervnc]
universe-FKd51W-0 | [tigervnc] Xvnc TigerVNC 1.7.0 - built Sep  8 2016 10:39:22
universe-FKd51W-0 | [tigervnc] Copyright (C) 1999-2016 TigerVNC Team and many others (see README.txt)
universe-FKd51W-0 | [tigervnc] See http://www.tigervnc.org for information on TigerVNC.
universe-FKd51W-0 | [tigervnc] Underlying X server release 11400000, The X.Org Foundation
universe-FKd51W-0 | [tigervnc]
universe-FKd51W-0 | [tigervnc] Initializing built-in extension VNC-EXTENSION
universe-FKd51W-0 | [tigervnc] Initializing built-in extension Generic Event Extension
universe-FKd51W-0 | [tigervnc] Initializing built-in extension SHAPE
universe-FKd51W-0 | [tigervnc] Initializing built-in extension MIT-SHM
universe-FKd51W-0 | [tigervnc] Initializing built-in extension XInputExtension
universe-FKd51W-0 | [tigervnc] Initializing built-in extension XTEST
universe-FKd51W-0 | [tigervnc] Initializing built-in extension BIG-REQUESTS
universe-FKd51W-0 | [tigervnc] Initializing built-in extension SYNC
universe-FKd51W-0 | [tigervnc] Initializing built-in extension XKEYBOARD
universe-FKd51W-0 | [tigervnc] Initializing built-in extension XC-MISC
universe-FKd51W-0 | [tigervnc] Initializing built-in extension XINERAMA
universe-FKd51W-0 | [tigervnc] Initializing built-in extension XFIXES
universe-FKd51W-0 | [tigervnc] Initializing built-in extension RENDER
universe-FKd51W-0 | [tigervnc] Initializing built-in extension RANDR
universe-FKd51W-0 | [tigervnc] Initializing built-in extension COMPOSITE
universe-FKd51W-0 | [tigervnc] Initializing built-in extension DAMAGE
universe-FKd51W-0 | [tigervnc] Initializing built-in extension MIT-SCREEN-SAVER
universe-FKd51W-0 | [tigervnc] Initializing built-in extension DOUBLE-BUFFER
universe-FKd51W-0 | [tigervnc] Initializing built-in extension RECORD
universe-FKd51W-0 | [tigervnc] Initializing built-in extension DPMS
universe-FKd51W-0 | [tigervnc] Initializing built-in extension X-Resource
universe-FKd51W-0 | [tigervnc] Initializing built-in extension XVideo
universe-FKd51W-0 | [tigervnc] Initializing built-in extension XVideo-MotionCompensation
universe-FKd51W-0 | [tigervnc] Initializing built-in extension GLX
universe-FKd51W-0 | [tigervnc]
universe-FKd51W-0 | [tigervnc] Sat May  6 13:48:47 2017
universe-FKd51W-0 | [tigervnc]  vncext:      VNC extension running!
universe-FKd51W-0 | [tigervnc]  vncext:      Listening for VNC connections on all interface(s), port 5900
universe-FKd51W-0 | [tigervnc]  vncext:      created VNC server for screen 0
universe-FKd51W-0 | [tigervnc] [dix] Could not init font path element /usr/share/fonts/X11/Type1/, removing from list!
universe-FKd51W-0 | [tigervnc] [dix] Could not init font path element /usr/share/fonts/X11/75dpi/, removing from list!
universe-FKd51W-0 | [tigervnc] [dix] Could not init font path element /usr/share/fonts/X11/100dpi/, removing from list!
universe-FKd51W-0 | iptables v1.6.0: host/network `run' not found
universe-FKd51W-0 | Try `iptables -h' or 'iptables --help' for more information.
universe-FKd51W-0 | Traceback (most recent call last):
universe-FKd51W-0 |   File "/app/universe-envs/flashgames/init", line 269, in <module>
universe-FKd51W-0 |     sys.exit(main())
universe-FKd51W-0 |   File "/app/universe-envs/flashgames/init", line 247, in main
universe-FKd51W-0 |     basic_setup()
universe-FKd51W-0 |   File "/app/universe-envs/flashgames/init", line 148, in basic_setup
universe-FKd51W-0 |     sudoable_env_setup()
universe-FKd51W-0 |   File "/app/universe-envs/flashgames/init", line 113, in sudoable_env_setup
universe-FKd51W-0 |     subprocess.check_call(['sudo', '-u', 'nobody', 'sudo', '/usr/local/bin/sudoable-env-setup'])
universe-FKd51W-0 |   File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
universe-FKd51W-0 |     raise CalledProcessError(retcode, cmd)
universe-FKd51W-0 | subprocess.CalledProcessError: Command '['sudo', '-u', 'nobody', 'sudo', '/usr/local/bin/sudoable-env-setup']' returned non-zero exit status 2
[2017-05-06 13:48:48,027] Remote closed: address=172.17.0.1:5900
[2017-05-06 13:48:48,028] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:49,030] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:50,031] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:51,033] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:52,035] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:53,037] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:54,039] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:55,041] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:56,043] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:57,045] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:58,047] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:48:59,049] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:00,051] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:01,052] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:02,054] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:03,056] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:04,058] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:05,060] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:06,062] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:07,064] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:08,066] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:09,068] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:10,070] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:11,072] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:12,074] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:13,077] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:14,079] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:15,081] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:16,083] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
[2017-05-06 13:49:17,085] VNC server 172.17.0.1:5900 did not come up yet (error: [Errno 111] Connection refused). Sleeping for 1s.
Traceback (most recent call last):
  File "/agents/first-agent.py", line 17, in <module>
    main()
  File "/agents/first-agent.py", line 8, in main
    env.configure(remotes=1)  # automatically creates a local docker container
  File "/usr/local/universe/universe/wrappers/timer.py", line 14, in configure
    self.env.configure(**kwargs)
  File "/usr/local/universe/universe/wrappers/render.py", line 21, in configure
    self.env.configure(**kwargs)
  File "/usr/local/universe/universe/wrappers/throttle.py", line 32, in configure
    self.env.configure(**kwargs)
  File "/usr/local/universe/universe/envs/vnc_env.py", line 199, in configure
    use_recorder_ports=record,
  File "/usr/local/universe/universe/remotes/build.py", line 19, in build
    n=n,
  File "/usr/local/universe/universe/remotes/docker_remote.py", line 55, in __init__
    self._start()
  File "/usr/local/universe/universe/remotes/docker_remote.py", line 84, in _start
    self.healthcheck(self.instances)
  File "/usr/local/universe/universe/remotes/docker_remote.py", line 109, in healthcheck
    start_timeout=30,
  File "/usr/local/universe/universe/remotes/healthcheck.py", line 14, in run
    healthcheck.run()
  File "/usr/local/universe/universe/remotes/healthcheck.py", line 131, in run
    self._register_vnc(address)
  File "/usr/local/universe/universe/remotes/healthcheck.py", line 63, in _register_vnc
    raise error.Error('VNC server {} did not come up within {}s'.format(address, self.start_timeout))
universe.error.Error: VNC server 172.17.0.1:5900 did not come up within 30s
[2017-05-06 13:49:18,088] Killing and removing container: id=13f3c77781f1eb208cc242af14cd020e7dabff4336bc5549bdfbbeace8783d7b

Launching the tests gives a similar problem.

Versions

Please include the result of running

$ uname -a ; python --version; pip show universe gym tensorflow numpy go-vncdriver Pillow

result:

Linux deathcode-N56JK 4.10.0-20-generic #22-Ubuntu SMP Thu Apr 20 09:22:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
---
Name: universe
Version: 0.21.5
Summary: Universe: a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.
Home-page: https://github.com/openai/universe
Author: OpenAI
Author-email: universe@openai.com
License: UNKNOWN
Location: /home/deathcode/devel/python/openai/universe
Requires: autobahn, docker-py, docker-pycreds, fastzbarlight, go-vncdriver, gym, Pillow, PyYAML, six, twisted, ujson
---
Name: gym
Version: 0.8.1
Summary: The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.
Home-page: https://github.com/openai/gym
Author: OpenAI
Author-email: gym@openai.com
License: UNKNOWN
Location: /home/deathcode/devtools/miniconda3/envs/openai/lib/python3.5/site-packages
Requires: pyglet, requests, numpy, six
---
Name: numpy
Version: 1.12.1
Summary: NumPy: array processing for numbers, strings, records, and objects.
Home-page: http://www.numpy.org
Author: NumPy Developers
Author-email: numpy-discussion@scipy.org
License: BSD
Location: /home/deathcode/devtools/miniconda3/envs/openai/lib/python3.5/site-packages
Requires: 
---
Name: go-vncdriver
Version: 0.4.19
Summary: UNKNOWN
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Location: /home/deathcode/devtools/miniconda3/envs/openai/lib/python3.5/site-packages
Requires: numpy
---
Name: Pillow
Version: 4.1.1
Summary: Python Imaging Library (Fork)
Home-page: https://python-pillow.org
Author: Alex Clark (Fork Author)
Author-email: aclark@aclark.net
License: Standard PIL License
Location: /home/deathcode/devtools/miniconda3/envs/openai/lib/python3.5/site-packages
Requires: olefile
@tlbtlbtlb
Contributor

universe-FKd51W-0 | iptables v1.6.0: host/network `run' not found

Is your host called 'run'? Otherwise, something strange is happening.

Could you try running the container in diagnostics mode, e.g.:

docker run -p 5900:5900 -p 15900:15900 --privileged --ipc host --cap-add SYS_ADMIN quay.io/openai/universe.flashgames:0.20.28 diagnostics

and report the result?

@deathcoder
Author

This is what I get if I launch that:

docker run -p 5900:5900 -p 15900:15900 --privileged --ipc host --cap-add SYS_ADMIN quay.io/openai/universe.flashgames:0.20.28 diagnostics

Kind of the same result:

Setting VNC and rewarder password: openai
iptables v1.6.0: host/network `run' not found
Try `iptables -h' or 'iptables --help' for more information.
Traceback (most recent call last):
  File "/app/universe-envs/flashgames/init", line 269, in <module>
    sys.exit(main())
  File "/app/universe-envs/flashgames/init", line 263, in main
    subprocess.check_call(['sudo', '-u', 'nobody', 'sudo', '/usr/local/bin/sudoable-env-setup', 'git-lfs', 'diagnostics'])
  File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-u', 'nobody', 'sudo', '/usr/local/bin/sudoable-env-setup', 'git-lfs', 'diagnostics']' returned non-zero exit status 2

@tlbtlbtlb
Contributor

There's something wrong with the way your Docker engine is trying to access the network. It seems to believe your DNS server is called run, which is producing an error as it's opening the internal firewall port to it. (The container creates an internal firewall, so malware in the games it downloads can't get onto your internal network.)

Could you try this and report the results:

docker run -ti -p 5900:5900 -p 15900:15900 --privileged --ipc host --cap-add SYS_ADMIN quay.io/openai/universe.flashgames:0.20.28 shell

which should give you a shell prompt inside the container. Then

sh -x /usr/local/bin/sudoable-env-setup git-lfs flashgames.Zombonarium-v0

which will print more debugging info as it runs the iptables commands. That should show where it's getting the nameserver from. Thanks for helping track this down!

@deathcoder
Author

deathcoder commented May 10, 2017

I had to use this command to get a shell inside the container:

docker run -ti --entrypoint /bin/bash -p 5900:5900 -p 15900:15900 --privileged --ipc host --cap-add SYS_ADMIN quay.io/openai/universe.flashgames:0.20.28

The result is below. (From looking at the output, I have no idea what is going wrong.)

root@304c2d9a25fd:/app# sh -x /usr/local/bin/sudoable-env-setup git-lfs flashgames.Zombonarium-v0
+ set -eu
+ env=git-lfs
+ port=5900
+ [ -z  ]
+ [ -f /usr/local/openai/privileged_flags/SECURITY_HOLE_ALLOW_INTERNAL_TRAFFIC ]
+ [  = true ]
+ iptables -P OUTPUT DROP
+ iptables -F
+ [ -f /usr/local/openai/privileged_flags/ALLOWED_OUTBOUND ]
+ xvnc=127.0.0.1
+ iptables -A OUTPUT -p tcp -m tcp --dport 5900 --dst 127.0.0.1 -j ACCEPT
+ iptables -A OUTPUT -p tcp -m tcp --dst 127.0.0.1 -j ACCEPT
+ iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+ grep nameserver /etc/resolv.conf
+ cut -f 2 -d  
+ iptables -A OUTPUT -p udp -m udp --dport 53 --dst run -j ACCEPT
iptables v1.6.0: host/network `run' not found
Try `iptables -h' or 'iptables --help' for more information.

@tlbtlbtlb
Contributor

Our script is interpreting your /etc/resolv.conf to have the nameserver set to run. It should be an IP address like 192.168.1.1. We just parse it with grep -- my guess is that our script is seeing a comment like:

# run blah to restart the nameserver

I'll write a more robust parser. Can you post the /etc/resolv.conf, from inside the container so I can test to make sure it'll work? You may be able to work around it in the meantime by editing /etc/resolv.conf on your host to remove any comment lines with nameserver in them.

@deathcoder
Author

deathcoder commented May 11, 2017

You are right. This is the content of /etc/resolv.conf:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 8.8.8.8
nameserver 8.8.4.4
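
For illustration, the parsing problem with the file above can be reproduced outside the container. This is a minimal Python sketch, not the actual fix: the real sudoable-env-setup is a shell script, and the function names naive_nameservers and parse_nameservers are hypothetical. The first function approximates the grep and cut pipeline seen in the sh -x trace; the second only accepts lines whose first field is exactly nameserver and whose second field is a valid IP address.

```python
import ipaddress

# The resolv.conf content posted in this thread.
SAMPLE = """\
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 8.8.8.8
nameserver 8.8.4.4
"""


def naive_nameservers(text):
    """Approximation of `grep nameserver | cut -f 2 -d ' '`.

    Takes the second space-separated field of every line that merely
    contains the word, including comment lines.
    """
    return [line.split(" ")[1] for line in text.splitlines()
            if "nameserver" in line and " " in line]


def parse_nameservers(text):
    """Hypothetical stricter parser: keep only `nameserver <ip>` entries."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "nameserver":
            continue  # blank lines, comments, and other directives
        try:
            ipaddress.ip_address(fields[1])
        except ValueError:
            continue  # not an IP address, so it is never passed on
        servers.append(fields[1])
    return servers


print(naive_nameservers(SAMPLE))  # first entry comes from the comment line
print(parse_nameservers(SAMPLE))  # only the two real nameserver entries
```

The naive version picks up the word run from the last comment line because that line contains the substring nameserver, which matches the value iptables complained about.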

Edit:
Is the Dockerfile for the base image of universe.flashgames, "universe.base", public?
I wanted to try a fix myself but I'm stuck there. I would love to contribute if possible.
I wonder if I could just mount the patched file.

@KenobySky

I'm having the same issue.
I tried installing OpenAI Universe on:

  • Lubuntu
  • Lubuntu in VirtualBox
  • Ubuntu in VirtualBox

In all 3 cases, it failed with the same error: iptables v1.6.0: host/network `run' not found

In my /etc/resolv.conf, I had:

nameserver 127.0.0.53

I commented it out and got the same error. I changed it to Google DNS and got the same error.

I eventually gave up. I'm wondering if OpenAI Universe only works on Mac... This issue has been open since May, and nothing?

Just wondering, and a bit frustrated.
Is there anything I can do to make it work?

@deathcoder
Author

Hi, when I fixed it, I created a patched version of the script sudoable-env-setup, and then I built a custom image with the same tag used by universe.

I have now uploaded everything to GitHub in this repository.

Running build.sh will create the patched Docker image.
After that, everything should work :)

@KenobySky

Hi! Thanks, but I don't understand how I should run this.

I ran "sudo ./build.sh", but what now? I'm not that familiar with Docker...

@deathcoder
Author

After you run build.sh, just run your Python agent again and everything should work.
Explanation, in case you are interested:
When the Python agent is ready to run the environment, it asks Docker to load a preconfigured 'vm' with the name quay.io/openai/universe.flashgames and version 0.20.28.
Docker then proceeds to download the 'vm' (unless you already have it locally, which is key to why this works) and starts it.
My script overrides your local image with a patched version, which means that whoever tries to use that image will be using the patched version without even knowing it.
You may only run into problems if the image version used by universe changes. In that case this should still work, and it is as easy as replacing the version '0.20.28' in both the Dockerfile and the build.sh script with the new version.
Hope this helps.

@gdahlm

gdahlm commented Jul 20, 2017

Note that this is not a resolver issue. The issue is that the script /usr/local/bin/sudoable-env-setup is failing and leaving iptables in a broken state.

Here are the iptables rules that are breaking DNS lookups.

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy DROP)
target     prot opt source               destination         
ACCEPT     tcp  --  anywhere             localhost            tcp dpt:5900
ACCEPT     tcp  --  anywhere             localhost            tcp
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED

Here you can see that DNS is blocked, but when I reset the OUTPUT policy to ACCEPT it works again.

# nslookup
> www.google.com
../../../../lib/isc/unix/net.c:581: sendmsg() failed: Operation not permitted
^C
root@4ffbf92be792:~# view /usr/local/bin/sudoable-env-setup 
root@4ffbf92be792:~# iptables -P OUTPUT ACCEPT
root@4ffbf92be792:~# curl https://www.google.com
<!doctype html><html itemscope="" it...<SNIP>

Note that the set -eu at the top of the script makes errors difficult to detect: one of the commands it runs returns a non-zero status, and the script exits at that point.
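
To illustrate this failure mode, here is a small Python simulation. It is not the actual script: the rule list is a simplified, hypothetical stand-in for the iptables calls in the sh -x trace earlier in this thread, and no real firewall commands are run. It mimics set -e by stopping at the first failing step, which shows how the OUTPUT policy can stay at DROP while the DNS rule is never added.

```python
import ipaddress


def is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def simulate_setup(nameservers):
    """Simulate the rule sequence from the trace, aborting like `set -e`.

    Returns (state, completed). Nothing here touches a real firewall.
    """
    state = {"output_policy": "ACCEPT", "rules": []}
    state["output_policy"] = "DROP"  # iptables -P OUTPUT DROP
    state["rules"].append("tcp 127.0.0.1 dport 5900 ACCEPT")
    state["rules"].append("tcp 127.0.0.1 ACCEPT")
    state["rules"].append("state ESTABLISHED,RELATED ACCEPT")
    for ns in nameservers:
        if not is_ip(ns):
            # Stand-in for iptables rejecting the host name `run`;
            # with `set -e` the script ends here.
            return state, False
        state["rules"].append("udp %s dport 53 ACCEPT" % ns)
    return state, True


state, completed = simulate_setup(["run", "8.8.8.8", "8.8.4.4"])
print(completed, state["output_policy"], len(state["rules"]))
```

In this simulation the aborted run ends with a DROP policy and three ACCEPT rules, none of them for port 53, which is the same shape as the iptables -L listing above.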

I should note that a lot of the complexity in this script could be removed. I also want to point out this block of comments in it.

# compromised, so that the user can run arbitrary code as the "nobody"
# user, and thus run this script with arbitrary arguments.
#
# We make sure *never* to open the EC2 metadata IP or internal IPs,
# except where a hole has been specifically requested (this is needed
# when envs share Kube pods with workers.)

As the container is being launched with the --privileged flag, all of the protections of SELinux or AppArmor are removed, and there is no need to try complicated network tricks. Under the current model there is zero security segmentation, and the user/process can currently run arbitrary code as the root user.

With "--privileged" there is really zero security separation with Docker containers, and every process is effectively a root user.

As a very simple POC, here I use lsblk to see what block device is being bind mounted from the container host, then I use mknod to create a device file and use dd to read the contents.

root@46f13a2cd214:/app# lsblk
lsblk: dm-1: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-1: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-3: failed to get device path
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme1n1     259:0    0 953.9G  0 disk 
|-nvme1n1p5 259:5    0   300G  0 part /etc/hosts
|-nvme1n1p3 259:3    0 652.6G  0 part 
|-nvme1n1p1 259:1    0   260M  0 part 
|-nvme1n1p4 259:4    0  1000M  0 part 
`-nvme1n1p2 259:2    0    16M  0 part 
nvme0n1     259:6    0   477G  0 disk 
|-nvme0n1p1 259:7    0  93.1G  0 part 
`-nvme0n1p2 259:8      383.8G  0 part 
root@46f13a2cd214:/app# cd /
root@46f13a2cd214:/# mknod hostroot b 259 0
root@46f13a2cd214:/# ls -l hostroot 
brw-r--r-- 1 root root 259, 0 Jul 20 22:20 hostroot
root@46f13a2cd214:/# dd if=hostroot of=foo.img bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 0.18781 s, 1.4 GB/s
root@46f13a2cd214:/# file foo.img 
foo.img: DOS/MBR boot sector; partition 1 : ID=0xee, start-CHS (0x0,0,1), end-CHS (0x3ff,254,63), startsector 1, 2000409263 sectors, extended partition table (last)
root@46f13a2cd214:/# 

You can also load kernel modules, or do anything else that the superuser can do on the Docker host. As an example:

Docker host:

32768

Docker container:

root@46f13a2cd214:/# cat /proc/sys/kernel/pid_max
32768
root@46f13a2cd214:/# echo "655376" > /proc/sys/kernel/pid_max
root@46f13a2cd214:/# cat /proc/sys/kernel/pid_max
655376
root@46f13a2cd214:/# 

Changes reflected on the parent host:

$ cat /proc/sys/kernel/pid_max
655376

Anyway, this is a known limitation. I don't want to share any viable exploits, and the Docker project knows about this and has decided to mark related bugs as 'won't fix'. But the container can do anything from creating new network interfaces to installing a new BIOS with the current command line.

Users would be far safer if the methods used were refactored to avoid privilege escalation, and IMHO it would be far more stable for the users too.

I know that --privileged is used a lot, but finding other ways of controlling network access would help avoid a serious WannaCry-style issue.

@taylerallen6

I am still having the same issue as tlbtlbtlb described. Has anyone found a solution yet?

@YaguangZhang

I had the exact same issue and tried manually deleting the comments in /etc/resolv.conf. And it seems to work after that.

@taylerallen6

How do I get to /etc/resolv.conf?

@taylerallen6

Never mind. I just typed in sudo nano /etc/resolv.conf and deleted all the comments (literally, I removed every line that started with '#'). I saved it and everything worked!

@sudharsan13296

@deathcoder OMG, your patch worked perfectly. Thank you so much. I struggled with that for the past few days. Thanks a lot.

@gdb gdb closed this as completed Apr 5, 2018