Project import generated by Copybara.

PiperOrigin-RevId: 267556712
google-research committed on Sep 6, 2019 · commit 625c2b3 · 1 parent 70638d8

Showing 12 changed files with 275 additions and 38 deletions.
diff --git a/CHANGELOG b/CHANGELOG
@@ -5,6 +5,11 @@ should not change, as modifications made to the environment are either
 new features or backward compatible bug fixes. We will maintain vX branches
 pointing at the most recent vX.Y.
 
+v1.4
+- Added implementation of architecture 'gfootball_impala_cnn' used in the paper.
+- Added support for loading PPO checkpoints as players, plus example checkpoints (for levels 11_vs_11_easy_stochastic and academy_run_to_score_with_keeper).
+- Removed the TensorFlow dependency when running only the environment (it is still needed for training with OpenAI Baselines).
+
 v1.3
 - Fix to pixel representation (https://github.com/google-research/football/issues/54,56,57).
 

diff --git a/Dockerfile b/Dockerfile
@@ -6,7 +6,7 @@ ENV DEBIAN_FRONTEND=noninteractive
 RUN apt-get update && apt-get install -yq git cmake build-essential \
   libgl1-mesa-dev libsdl2-dev \
   libsdl2-image-dev libsdl2-ttf-dev libsdl2-gfx-dev libboost-all-dev \
-  libdirectfb-dev libst-dev mesa-utils xvfb x11vnc libsqlite3-dev \
+  libdirectfb-dev libst-dev mesa-utils xvfb x11vnc \
   glee-dev libsdl-sge-dev python3-pip
 
 COPY . /gfootball

diff --git a/README.md b/README.md
@@ -18,32 +18,37 @@ You can either install the code from github (newest version) or from pypi (stabl
   1. Install required apt packages with
   `sudo apt-get install git cmake build-essential libgl1-mesa-dev libsdl2-dev
   libsdl2-image-dev libsdl2-ttf-dev libsdl2-gfx-dev libboost-all-dev
-  libdirectfb-dev libst-dev mesa-utils xvfb x11vnc libsqlite3-dev
-  glee-dev libsdl-sge-dev python3-pip`
+  libdirectfb-dev libst-dev mesa-utils xvfb x11vnc glee-dev libsdl-sge-dev
+  python3-pip`
 
   1. Install gfootball python package from pypi:
 
-    - Use `pip3 install gfootball[tf_cpu]` if you want to use CPU version of TensorFlow.
-    - Use `pip3 install gfootball[tf_gpu]` if you want to use GPU version of TensorFlow.
+    - Use `pip3 install gfootball`
     - This command can run for a couple of minutes, as it compiles the C++ environment in the background.
 
   OR install gfootball python package (run the commands from the main project directory):
 
     - `git clone https://github.com/google-research/football.git`
     - `cd football`
-    - Use `pip3 install .[tf_cpu]` if you want to use the CPU version of TensorFlow.
-    - Use `pip3 install .[tf_gpu]` if you want to use GPU version of TensorFlow.
+    - `pip3 install .`
     - This command can run for a couple of minutes, as it compiles the C++ environment in the background.
 
 ## Running experiments
-First, install newest OpenAI Baselines:
-`pip3 install git+https://github.com/openai/baselines.git@master`.
+Install additional dependencies:
+
+- TensorFlow: `pip3 install "tensorflow<2.0"` or
+  `pip3 install "tensorflow-gpu<2.0"`, depending on whether you want the CPU or
+  GPU version;
+- Sonnet: `pip3 install dm-sonnet`;
+- OpenAI Baselines:
+  `pip3 install git+https://github.com/openai/baselines.git@master`.
 
 Then:
+
 - To run an example PPO experiment on the `academy_empty_goal_close` scenario, run
-`python3 -m gfootball.examples.run_ppo2 --level=academy_empty_goal_close`
+  `python3 -m gfootball.examples.run_ppo2 --level=academy_empty_goal_close`
 - To run on the `academy_pass_and_shoot_with_keeper` scenario, run
-`python3 -m gfootball.examples.run_ppo2 --level=academy_pass_and_shoot_with_keeper`
+  `python3 -m gfootball.examples.run_ppo2 --level=academy_pass_and_shoot_with_keeper`
 
 In order to train with nice replays being saved, run
 `python3 -m gfootball.examples.run_ppo2 --dump_full_episodes=True --render=True`
@@ -54,12 +59,26 @@ base scenario and the left player is controlled by the keyboard. Different types
 of players are supported (gamepad, external bots, agents...). For possible
 options run `python3 -m gfootball.play_game -helpfull`.
 
+In particular, one can play against an agent trained with the `run_ppo2` script
+using the following command:
+`python3 -m gfootball.play_game --players "keyboard:left_players=1;ppo2_cnn:right_players=1,checkpoint=$YOUR_PATH"`
+
 Please note that playing
 the game is implemented through an environment, so human-controlled players use
 the same interface as the agents. One important fact is that there is a single
 action per 100 ms reported to the environment, which might cause a lag effect
 when playing.
 
+## Trained checkpoints
+We provide trained PPO checkpoints for the following scenarios:
+
+  - [11_vs_11_easy_stochastic](https://storage.googleapis.com/gfootball-public-bucket/trained_model_11_vs_11_easy_stochastic),
+  - [academy_run_to_score_with_keeper](https://storage.googleapis.com/gfootball-public-bucket/trained_model_academy_run_to_score_with_keeper).
+
+To watch a checkpoint play, run
+`python3 -m gfootball.play_game --players "ppo2_cnn:left_players=1,policy=gfootball_impala_cnn,checkpoint=$CHECKPOINT" --level=$LEVEL`,
+where `$CHECKPOINT` is the path to the downloaded checkpoint and `$LEVEL` is the name of the corresponding scenario.
+
 ### Keyboard mapping
 The game defines the following keyboard mapping (for the `keyboard` player type):
 
@@ -238,7 +257,7 @@ A simple example of training multi-agent can be found in examples/run_multiagent
 
 ### GPU version
 1. Build with `docker build --build-arg DOCKER_BASE=tensorflow/tensorflow:1.12.0-gpu-py3 --build-arg DEVICE=gpu . -t gfootball`
-1. Enter the image with `nvidia-docker run -it gfootball bash`
+1. Enter the image with `nvidia-docker run -it gfootball bash`, or with `docker run --gpus all -it gfootball bash` on Docker 19.03 or later.
 
 After entering the image, you can run sample training with `python3 -m gfootball.examples.run_ppo2`.
 Unfortunately, rendering is not supported inside the docker.

diff --git a/gfootball/env/__init__.py b/gfootball/env/__init__.py
@@ -150,8 +150,6 @@ def create_environment(env_name='',
     env = wrappers.SingleAgentObservationWrapper(env)
     env = wrappers.SingleAgentRewardWrapper(env)
   if stacked:
-    # Import FrameStack here to avoid unconditional dependence on baselines.
-    from baselines.common.atari_wrappers import FrameStack
-    env = FrameStack(env, 4)
+    env = wrappers.FrameStack(env, 4)
 
   return env
diff --git a/gfootball/env/football_env.py b/gfootball/env/football_env.py
@@ -129,7 +129,8 @@ def _convert_observations(self, original, player,
 
   def _get_actions(self):
     obs = self._env.observation()
-    actions = []
+    left_actions = []
+    right_actions = []
     left_player_position = 0
     right_player_position = 0
     for player in self._players:
@@ -146,7 +147,9 @@ def _get_actions(self):
       assert len(adopted_obs) == len(
           a), 'Player returned {} actions instead of {}.'.format(
               len(a), len(adopted_obs))
-      actions.extend(a)
+      left_actions.extend(a[:player.num_controlled_left_players()])
+      right_actions.extend(a[player.num_controlled_left_players():])
+    actions = left_actions + right_actions
     return actions
 
   def step(self, action):
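
Review note: the hunk above changes the ordering of the combined action list so that every left-team action comes before every right-team action, regardless of the order in which players are iterated. The following is a minimal standalone sketch of that grouping, not the environment's code: `order_actions` and the `(actions, num_left)` pair format are hypothetical stand-ins for `_get_actions` and `player.num_controlled_left_players()`.

```python
def order_actions(per_player_actions):
    """Group actions so all left-team actions precede all right-team ones.

    per_player_actions: list of (actions, num_left) pairs, where the first
    num_left entries of `actions` belong to left-team footballers
    (hypothetical format, used only for this illustration).
    """
    left_actions, right_actions = [], []
    for actions, num_left in per_player_actions:
        left_actions.extend(actions[:num_left])
        right_actions.extend(actions[num_left:])
    return left_actions + right_actions


# Two hypothetical players: the first controls one left and one right
# footballer, the second controls a single left footballer.
mixed = [(["L0", "R0"], 1), (["L1"], 1)]
print(order_actions(mixed))  # ['L0', 'L1', 'R0']
```

With the previous `actions.extend(a)` approach the same input would have produced `['L0', 'R0', 'L1']`, interleaving the two teams.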

diff --git a/gfootball/env/observation_processor.py b/gfootball/env/observation_processor.py
@@ -23,6 +23,7 @@
 import datetime
 import logging
 import os
+import shutil
 import tempfile
 import timeit
 import traceback
@@ -34,10 +35,11 @@
 from six.moves import range
 from six.moves import zip
 import six.moves.cPickle
-import tensorflow as tf
 
 REMOVED_FRAME = 'removed'
 
+WRITE_FILES = True
+
 try:
   import cv2
 except ImportError:
@@ -208,7 +210,8 @@ def write_dump(name, trace, skip_visuals=False, config={}):
     os.close(fd)
     try:
       # For some reason sometimes the file is missing, so the code fails.
-      tf.io.gfile.copy(temp_path, name + '.avi', overwrite=True)
+      if WRITE_FILES:
+        shutil.copy2(temp_path, name + '.avi')
       os.remove(temp_path)
     except:
       logging.info(traceback.format_exc())
@@ -219,8 +222,9 @@ def write_dump(name, trace, skip_visuals=False, config={}):
       temp_frames.append(o._trace['observation']['frame'])
       o._trace['observation']['frame'] = REMOVED_FRAME
     to_pickle.append(o._trace)
-  with tf.io.gfile.GFile(name + '.dump', 'wb') as f:
-    six.moves.cPickle.dump(to_pickle, f)
+  if WRITE_FILES:
+    with open(name + '.dump', 'wb') as f:
+      six.moves.cPickle.dump(to_pickle, f)
   for o in trace:
     if 'frame' in o._trace['observation']:
       o._trace['observation']['frame'] = temp_frames.pop(0)
@@ -356,7 +360,9 @@ def write_dump(self, name):
     config._last_dump = timeit.default_timer()
     if self._dump_directory is None:
       self._dump_directory = self._config['tracesdir']
-      tf.io.gfile.makedirs(self._dump_directory)
+      if WRITE_FILES:
+        if not os.path.exists(self._dump_directory):
+          os.makedirs(self._dump_directory)
     config._file_name = '{2}/{0}_{1}'.format(
         name,
         datetime.datetime.now().strftime('%Y%m%d-%H%M%S%f'),
@@ -380,4 +386,4 @@ def process_pending_dumps(self, finish):
         assert not config._file_name
         if config._result.ready() or finish:
           config._result.get()
-          config._result = None
+          config._result = None
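
Review note: the hunks above replace `tf.io.gfile` calls with standard-library file operations guarded by a module-level `WRITE_FILES` flag. The sketch below shows the same pattern in isolation, assuming only the standard library; `write_trace` and its arguments are hypothetical names, not functions from `observation_processor.py`.

```python
import os
import pickle
import shutil
import tempfile

WRITE_FILES = True  # Mirrors the flag added in the diff.


def write_trace(directory, name, trace):
    """Pickle `trace` to <directory>/<name>.dump unless writing is disabled."""
    if not WRITE_FILES:
        return None
    # Create the target directory on first use, as the diff does.
    if not os.path.exists(directory):
        os.makedirs(directory)
    path = os.path.join(directory, name + '.dump')
    with open(path, 'wb') as f:
        pickle.dump(trace, f)
    return path


tmp = tempfile.mkdtemp()
path = write_trace(os.path.join(tmp, 'traces'), 'episode', [{'score': [1, 0]}])
with open(path, 'rb') as f:
    restored = pickle.load(f)
print(restored)
shutil.rmtree(tmp)  # Clean up the temporary directory.
```

One trade-off worth noting: unlike `tf.io.gfile`, plain `open` and `shutil` only handle local filesystem paths.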
diff --git a/gfootball/env/players/ppo2_cnn.py b/gfootball/env/players/ppo2_cnn.py
@@ -0,0 +1,114 @@
+# coding=utf-8
+# Copyright 2019 Google LLC
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+r"""Player from PPO2 cnn checkpoint.
+
+Example usage with play_game script:
+python3 -m gfootball.play_game \
+    --players "ppo2_cnn:left_players=1,checkpoint=$YOUR_PATH,policy=$POLICY"
+
+$POLICY should be one of: cnn, impala_cnn, gfootball_impala_cnn.
+"""
+
+from baselines.common.policies import build_policy
+from gfootball.env import football_action_set
+from gfootball.env import observation_preprocessing
+from gfootball.env import player_base
+from gfootball.examples import models  # pylint: disable=unused-import
+import gym
+import joblib
+import numpy as np
+import tensorflow as tf
+
+
+class Player(player_base.PlayerBase):
+  """An agent loaded from PPO2 cnn model checkpoint."""
+
+  def __init__(self, player_config, env_config):
+    player_base.PlayerBase.__init__(self, player_config)
+
+    self._action_set = 'default'
+    self._sess = tf.Session()
+    self._player_prefix = 'player_{}'.format(player_config['index'])
+    stacking = 4 if player_config.get('stacked', True) else 1
+    policy = player_config.get('policy', 'cnn')
+    self._stacker = ObservationStacker(stacking)
+    with tf.variable_scope(self._player_prefix):
+      with tf.variable_scope('ppo2_model'):
+        policy_fn = build_policy(DummyEnv(self._action_set, stacking), policy)
+        self._policy = policy_fn(nbatch=1, sess=self._sess)
+    _load_variables(player_config['checkpoint'], self._sess,
+                    prefix=self._player_prefix + '/')
+
+  def __del__(self):
+    self._sess.close()
+
+  def take_action(self, observation):
+    assert len(observation) == 1, 'Controlling multiple players is not supported'
+
+    observation = observation_preprocessing.generate_smm(observation)
+    observation = self._stacker.get(observation)
+    action = self._policy.step(observation)[0][0]
+    actions = [football_action_set.action_set_dict[self._action_set][action]]
+    return actions
+
+  def reset(self):
+    self._stacker.reset()
+
+
+def _load_variables(load_path, sess, prefix='', remove_prefix=True):
+  """Loads variables from checkpoint of policy trained by baselines."""
+
+  # Forked from the URL below because we need to load from different variable
+  # names: https://github.com/openai/baselines/blob/master/baselines/common/tf_util.py
+  variables = [v for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
+               if v.name.startswith(prefix)]
+
+  loaded_params = joblib.load(load_path)
+  restores = []
+  for v in variables:
+    v_name = v.name[len(prefix):] if remove_prefix else v.name
+    restores.append(v.assign(loaded_params[v_name]))
+
+  sess.run(restores)
+
+
+class ObservationStacker(object):
+  """Utility class that produces stacked observations."""
+
+  def __init__(self, stacking):
+    self._stacking = stacking
+    self._data = []
+
+  def get(self, observation):
+    if self._data:
+      self._data.append(observation)
+      self._data = self._data[-self._stacking:]
+    else:
+      self._data = [observation] * self._stacking
+    return np.concatenate(self._data, axis=-1)
+
+  def reset(self):
+    self._data = []
+
+
+class DummyEnv(object):
+  """Stand-in env passed to build_policy; the real env does not exist yet."""
+
+  def __init__(self, action_set, stacking):
+    self.action_space = gym.spaces.Discrete(
+        len(football_action_set.action_set_dict[action_set]))
+    self.observation_space = gym.spaces.Box(
+        0, 255, shape=[72, 96, 4 * stacking], dtype=np.uint8)
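
Review note: `_load_variables` in the file above selects graph variables whose names start with the player prefix and, by default, strips that prefix before looking each one up in the checkpoint. The sketch below illustrates only that name mapping, with no TensorFlow involved; `checkpoint_keys` and the variable names are hypothetical examples, not taken from a real checkpoint.

```python
def checkpoint_keys(variable_names, prefix='', remove_prefix=True):
    """Map each selected graph variable name to its checkpoint lookup key."""
    selected = [n for n in variable_names if n.startswith(prefix)]
    return {n: (n[len(prefix):] if remove_prefix else n) for n in selected}


# Hypothetical graph containing variables for two players.
graph_names = [
    'player_0/ppo2_model/pi/w:0',
    'player_0/ppo2_model/vf/w:0',
    'player_1/ppo2_model/pi/w:0',
]
mapping = checkpoint_keys(graph_names, prefix='player_0/')
for graph_name, key in sorted(mapping.items()):
    print(graph_name, '->', key)
```

Scoping each player under its own prefix appears to be what lets several checkpoint-backed players coexist in one graph while each still reads unprefixed names from its checkpoint.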
diff --git a/gfootball/env/wrappers.py b/gfootball/env/wrappers.py
@@ -19,11 +19,11 @@
 from __future__ import division
 from __future__ import print_function
 
+import collections
+import cv2
 from gfootball.env import observation_preprocessing
-import gfootball_engine as libgame
 import gym
 import numpy as np
-import cv2
 
 
 class PeriodicDumpWriter(gym.Wrapper):
@@ -153,10 +153,10 @@ def __init__(self, env,
                                    observation_preprocessing.SMM_HEIGHT)):
     gym.ObservationWrapper.__init__(self, env)
     self._channel_dimensions = channel_dimensions
-    shape=(self.env.unwrapped._config.number_of_players_agent_controls(),
-           channel_dimensions[1], channel_dimensions[0],
-           len(observation_preprocessing.get_smm_layers(
-               self.env.unwrapped._config)))
+    shape = (self.env.unwrapped._config.number_of_players_agent_controls(),
+             channel_dimensions[1], channel_dimensions[0],
+             len(observation_preprocessing.get_smm_layers(
+                 self.env.unwrapped._config)))
     self.observation_space = gym.spaces.Box(
         low=0, high=255, shape=shape, dtype=np.uint8)
 
@@ -246,3 +246,30 @@ def reward(self, reward):
         reward[rew_index] += self._checkpoint_reward
         self._collected_checkpoints[is_left_to_right] += 1
     return reward
+
+
+class FrameStack(gym.Wrapper):
+  """Stack k last observations."""
+
+  def __init__(self, env, k):
+    gym.Wrapper.__init__(self, env)
+    self.obs = collections.deque([], maxlen=k)
+    low = env.observation_space.low
+    high = env.observation_space.high
+    low = np.concatenate([low] * k, axis=-1)
+    high = np.concatenate([high] * k, axis=-1)
+    self.observation_space = gym.spaces.Box(
+        low=low, high=high, dtype=env.observation_space.dtype)
+
+  def reset(self):
+    observation = self.env.reset()
+    self.obs.extend([observation] * self.obs.maxlen)
+    return self._get_observation()
+
+  def step(self, action):
+    observation, reward, done, info = self.env.step(action)
+    self.obs.append(observation)
+    return self._get_observation(), reward, done, info
+
+  def _get_observation(self):
+    return np.concatenate(list(self.obs), axis=-1)
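
Review note: the new `FrameStack` wrapper keeps the last `k` observations in a bounded deque and concatenates them along the last axis. The sketch below reproduces that buffering without `gym`, so the shapes can be checked in isolation; `FrameBuffer` is a hypothetical helper, and the `(72, 96, 4)` observation shape is an assumption borrowed from the `DummyEnv` definition in `ppo2_cnn.py`.

```python
import collections

import numpy as np


class FrameBuffer(object):
    """Holds the k most recent observations, stacked along the last axis."""

    def __init__(self, k):
        self.obs = collections.deque([], maxlen=k)

    def reset(self, observation):
        # Fill the buffer with k copies of the first observation.
        self.obs.extend([observation] * self.obs.maxlen)
        return self._get()

    def append(self, observation):
        # The deque drops the oldest observation automatically.
        self.obs.append(observation)
        return self._get()

    def _get(self):
        return np.concatenate(list(self.obs), axis=-1)


buf = FrameBuffer(4)
stacked = buf.reset(np.zeros((72, 96, 4), dtype=np.uint8))
print(stacked.shape)  # (72, 96, 16)
stacked = buf.append(np.ones((72, 96, 4), dtype=np.uint8))
print(stacked[0, 0, ::4])  # [0 0 0 1]
```

The newest frame always occupies the last block of channels, which matches the ordering produced by `ObservationStacker` on the player side.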