Solutions to common problems
When Python sees import gym, it searches a list of places (including your current directory) for a module called gym. If you've named your test program gym.py or universe.py, you're going to have a bad time. Name it something else, and be sure to delete any gym.pyc or universe.pyc files that Python has cached.
You can verify that Python is finding the right module with:
$ python
>>> import gym
>>> gym.__file__
'/Users/tlb/openai/gym/gym/__init__.py'
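The same check can be scripted. The sketch below is a hypothetical helper, not part of Gym or Universe: shadowing_files lists gym/universe source and bytecode files in a directory (including Python 3's __pycache__/<name>.<tag>.pyc files), and module_origin asks importlib which file Python would load for a module, without importing it.

```python
import importlib
import importlib.util
import os


def module_origin(name):
    """Return the file Python would load for top-level module `name`, or None."""
    importlib.invalidate_caches()  # notice files created since the interpreter started
    spec = importlib.util.find_spec(name)  # locates the module without executing it
    return spec.origin if spec is not None else None


def shadowing_files(directory=".", names=("gym", "universe")):
    """List files in `directory` that could shadow the installed packages."""
    found = []
    for name in names:
        for candidate in (name + ".py", name + ".pyc"):  # .pyc: Python 2 style cache
            path = os.path.join(directory, candidate)
            if os.path.isfile(path):
                found.append(path)
        # Python 3 caches bytecode as __pycache__/<name>.<tag>.pyc
        cache_dir = os.path.join(directory, "__pycache__")
        if os.path.isdir(cache_dir):
            for entry in sorted(os.listdir(cache_dir)):
                if entry.startswith(name + ".") and entry.endswith(".pyc"):
                    found.append(os.path.join(cache_dir, entry))
    return found


if __name__ == "__main__":
    for path in shadowing_files():
        print("rename or delete:", path)
    print("gym would be loaded from:", module_origin("gym"))
```

If module_origin("gym") points into your project directory instead of the installed gym package, a local file is shadowing it.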
If you get something like this:
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
then Python can't create an OpenGL window to show you the environment running.
- On OSX, make sure you have XQuartz installed.
- On Ubuntu, see here: http://askubuntu.com/questions/541343/problems-with-libgl-fbconfigs-swrast-through-each-update/566522#566522
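If you capture stderr from a script that renders an environment, a small hypothetical helper like the one below can map the libGL messages quoted above to the advice for your platform. The marker strings and hints come from this page; treating every Linux system like Ubuntu is an assumption.

```python
import platform

# Error text taken from the libGL messages quoted above.
LIBGL_MARKERS = (
    "No matching fbConfigs or visuals found",
    "failed to load driver: swrast",
)

# Advice from this page, keyed by platform.system(). "Linux" covers Ubuntu
# (assumption: other distributions may need different fixes).
HINTS = {
    "Darwin": "Install XQuartz (OS X).",
    "Linux": "See http://askubuntu.com/questions/541343/problems-with-libgl-fbconfigs-swrast-through-each-update/566522#566522",
}


def libgl_hint(stderr_text, system=None):
    """Return a fix-it hint if stderr contains the libGL errors, else None."""
    if not any(marker in stderr_text for marker in LIBGL_MARKERS):
        return None
    system = system or platform.system()
    return HINTS.get(system, "OpenGL could not open a window on this system.")
```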
The container doesn't include all the games (together they are huge); it downloads them as needed using git-lfs, so the container needs to connect to github.com on ports 22 and 443. If you see something like:
[Sat Dec 31 19:17:00 UTC 2016] [/usr/local/bin/sudoable-env-setup] Allowing outbound network traffic to non-private IPs for git-lfs. (Going to fetch files via git lfs.)
[unpack-lfs] [2016-12-31 19:17:01,081] Fetching files: git lfs pull -I git-lfs/flashgames.DuskDrive-v0.tar.gz
[unpack-lfs] [2016-12-31 19:19:38,009] Finished running git lfs pull
[unpack-lfs] [2016-12-31 19:19:38,009] git lfs pull failed; detected from output: stdout=b'\rGit LFS: (0 of 1 files) 0 B / 9.52 MB \n' stderr=b'batch request: exit status 255: ssh: connect to host github.com port 22: Connection refused\n'
[unpack-lfs] [2016-12-31 19:19:38,076] unpack failed
that means there's a network problem. Things to check:
- Are you behind a firewall? Ask your admin how to get external access. The container needs to connect to github.com:22 and github.com:443.
- Run the container in diagnostics mode, which tries various network operations and logs everything to the console. To do this, run:
$ docker network inspect bridge; docker run --rm --privileged --ipc host --cap-add SYS_ADMIN quay.io/openai/universe.flashgames:latest diagnostics
If you're reporting a problem, please copy and paste the entire output into your GitHub issue.
If it reports a connection denied or timeout, your container can't get to the public internet.
- Docker has a wide range of network options. If you're running a standalone Docker system (not part of a cluster like Kubernetes), you want bridge mode. Read all about it in Docker container networking.
- Docker can lose its network configuration when your computer switches networks. See "I can't reach Docker Hub from my home network".
- Docker can fail to download a layer. For example:
[2017-01-07 20:34:35,146] Image quay.io/openai/universe.flashgames:0.20.21 not present locally; pulling
0.20.21: Pulling from openai/universe.flashgames
aed15891ba52: Pull complete
773ae8583d14: Pull complete
...
universe.remotes.compose.progress_stream.StreamOutputError: failed to register layer: Error processing tar file(gzip: invalid checksum):
This can happen when Docker's network connection is interrupted while downloading the remote image. Try again, or download the image manually with docker pull quay.io/openai/universe.flashgames:0.20.21 (replace the version with the one Universe was trying to download).
- If you're in China, the Great Firewall (GFW) may block access to quay.io. DaoCloud has a mirror inside China. Sign up for it at https://www.daocloud.io/mirror#accelerator-doc. It'll give you a mirror ID, and you can configure Docker to use it by running (replacing MIRRORID with your assigned ID):
$ curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://MIRRORID.m.daocloud.io
Then tell Universe to pull from docker.io, which DaoCloud mirrors inside the GFW:
$ export OPENAI_DOCKER_REPO=docker.io/openai
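Two of the checks above can be approximated from Python. This is a sketch, not part of Universe: can_connect tries a plain TCP connection (so it tests the machine you run it on, not the container itself; diagnostics mode remains the authoritative check), and pull_command extracts the image name from the "not present locally; pulling" log line so you can pull that exact version by hand. Both function names are hypothetical.

```python
import re
import socket

# Endpoints the container needs for git-lfs, per the notes above.
GITHUB_ENDPOINTS = [("github.com", 22), ("github.com", 443)]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False


# Matches log lines such as:
# "Image quay.io/openai/universe.flashgames:0.20.21 not present locally; pulling"
IMAGE_RE = re.compile(r"Image (\S+) not present locally")


def pull_command(log_text):
    """Build the manual docker pull command for the image Universe tried to fetch."""
    match = IMAGE_RE.search(log_text)
    return ["docker", "pull", match.group(1)] if match else None
```

For example, loop over GITHUB_ENDPOINTS and print which of can_connect(host, port) fail; a failure on port 22 matches the "Connection refused" git-lfs error shown earlier.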
If you're using pyenv and get errors like:
/home/user/.pyenv/versions/3.5.2/lib/libpython3.5m.a(floatobject.o): In function `float_is_integer':
/tmp/python-build.20161207101855.17159/Python-3.5.2/Objects/floatobject.c:812: undefined reference to `floor'
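you need to rebuild your pyenv-installed Python with shared-library support, for example with env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.5.2. See solution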
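To check whether the interpreter you're running was built with shared libraries, you can read the Py_ENABLE_SHARED build variable through the standard sysconfig module. The helper name below is hypothetical; the variable is set by CPython's configure script.

```python
import sysconfig


def python_built_shared():
    """Return True if this interpreter was configured with --enable-shared."""
    # Py_ENABLE_SHARED is 1 for shared builds and 0 for static ones;
    # it can be None on platforms without this build variable (e.g. Windows).
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


if __name__ == "__main__":
    if python_built_shared():
        print("shared libpython: OK")
    else:
        print('static build: reinstall with '
              'env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install <version>')
```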
Universe environments need a lot of CPU power. Most environments need 2 cores of a modern Intel CPU to run the Flash engine, browser, renderer, X11 server, VNC server, and the 'vexpect' logic that starts games by detecting visual elements on the screen and clicking buttons. If you see many lines like this:
universe-98PmJS-0 | [2017-03-02 06:16:44,043] [play_vexpect] Fell behind by 0.4378662109375s from target; losing 26 frames
your computer is too slow or too heavily loaded.
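To judge how far behind an environment is falling, you can count these warnings in a saved log and compare your machine against the 2-cores-per-environment rule of thumb. This is a hypothetical sketch: the regex is based only on the log line quoted above, and os.cpu_count() reports logical CPUs (hyperthreads), which may overstate physical cores.

```python
import os
import re

# Matches the vexpect lag message quoted above, e.g.
# "Fell behind by 0.4378662109375s from target; losing 26 frames"
LAG_RE = re.compile(r"Fell behind by ([0-9.]+)s from target; losing (\d+) frames")

CORES_PER_ENV = 2  # "Most environments need 2 cores", per the text above


def lag_summary(log_lines):
    """Return (warning count, total frames lost, worst lag in seconds)."""
    count, frames, worst = 0, 0, 0.0
    for line in log_lines:
        match = LAG_RE.search(line)
        if match:
            count += 1
            worst = max(worst, float(match.group(1)))
            frames += int(match.group(2))
    return count, frames, worst


def enough_cores(num_envs, cores=None):
    """Rough check: at least CORES_PER_ENV CPUs per environment?"""
    # os.cpu_count() counts logical CPUs and may return None.
    cores = cores if cores is not None else (os.cpu_count() or 1)
    return cores >= CORES_PER_ENV * num_envs
```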
If you're using AWS EC2 instances, a c4.xlarge is a good choice for a single worker and environment, and a c4.4xlarge is a good choice for 4 agent workers + 4 environments. The t2.* instances are not suitable: they are burstable instances whose CPU is throttled to a baseline once their CPU credits run out.
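Both recommendations above work out to 4 vCPUs per worker/environment pair (a c4.xlarge has 4 vCPUs, a c4.4xlarge has 16). The hypothetical helper below extrapolates that ratio to pick the smallest C4 size for a given number of pairs; the 4-vCPU ratio is an assumption derived from those two data points, not an official sizing rule.

```python
# vCPU counts of the EC2 C4 family.
C4_VCPUS = [
    ("c4.large", 2),
    ("c4.xlarge", 4),
    ("c4.2xlarge", 8),
    ("c4.4xlarge", 16),
    ("c4.8xlarge", 36),
]

# Assumption: c4.xlarge for 1 worker + 1 env and c4.4xlarge for 4 + 4
# both imply 4 vCPUs per worker/environment pair.
VCPUS_PER_PAIR = 4


def suggest_c4(pairs):
    """Smallest C4 instance with VCPUS_PER_PAIR vCPUs per worker/env pair, or None."""
    needed = VCPUS_PER_PAIR * pairs
    for name, vcpus in C4_VCPUS:
        if vcpus >= needed:
            return name
    return None  # more than one instance can hold; split across machines
```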