Project import generated by Copybara.
PiperOrigin-RevId: 267556712
gkkurach authored and qstanczyk committed Sep 6, 2019
1 parent 70638d8 commit 625c2b3
Showing 12 changed files with 275 additions and 38 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG
Expand Up @@ -5,6 +5,11 @@ should not change, as modifications made to the environment are either
new features or backward compatible bug fixes. We will maintain vX branches
pointing at the most recent vX.Y.

v1.4
- Added implementation of architecture 'gfootball_impala_cnn' used in the paper.
- Added possibility of loading PPO checkpoints as players, added example checkpoints (for levels 11_vs_11_easy_stochastic and academy_run_to_score_with_keeper).
- Removed the TensorFlow dependency when running only the environment (it is still needed for training with OpenAI Baselines).

v1.3
- Fix to pixel representation (https://github.com/google-research/football/issues/54,56,57).

2 changes: 1 addition & 1 deletion Dockerfile
Expand Up @@ -6,7 +6,7 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -yq git cmake build-essential \
libgl1-mesa-dev libsdl2-dev \
libsdl2-image-dev libsdl2-ttf-dev libsdl2-gfx-dev libboost-all-dev \
libdirectfb-dev libst-dev mesa-utils xvfb x11vnc libsqlite3-dev \
libdirectfb-dev libst-dev mesa-utils xvfb x11vnc \
glee-dev libsdl-sge-dev python3-pip

COPY . /gfootball
41 changes: 30 additions & 11 deletions README.md
Expand Up @@ -18,32 +18,37 @@ You can either install the code from github (newest version) or from pypi (stabl
1. Install required apt packages with
`sudo apt-get install git cmake build-essential libgl1-mesa-dev libsdl2-dev
libsdl2-image-dev libsdl2-ttf-dev libsdl2-gfx-dev libboost-all-dev
libdirectfb-dev libst-dev mesa-utils xvfb x11vnc libsqlite3-dev
glee-dev libsdl-sge-dev python3-pip`
libdirectfb-dev libst-dev mesa-utils xvfb x11vnc glee-dev libsdl-sge-dev
python3-pip`

1. Install gfootball python package from pypi:

- Use `pip3 install gfootball[tf_cpu]` if you want to use CPU version of TensorFlow.
- Use `pip3 install gfootball[tf_gpu]` if you want to use GPU version of TensorFlow.
- Use `pip3 install gfootball`
- This command can run for a couple of minutes, as it compiles the C++ environment in the background.

OR install gfootball python package (run the commands from the main project directory):

- `git clone https://github.com/google-research/football.git`
- `cd football`
- Use `pip3 install .[tf_cpu]` if you want to use the CPU version of TensorFlow.
- Use `pip3 install .[tf_gpu]` if you want to use GPU version of TensorFlow.
- `pip3 install .`
- This command can run for a couple of minutes, as it compiles the C++ environment in the background.

## Running experiments
First, install newest OpenAI Baselines:
`pip3 install git+https://github.com/openai/baselines.git@master`.
Install additional dependencies:

- TensorFlow: `pip3 install "tensorflow<2.0"` or
`pip3 install "tensorflow-gpu<2.0"`, depending on whether you want CPU or
GPU version;
- Sonnet: `pip3 install dm-sonnet`;
- OpenAI Baselines:
`pip3 install git+https://github.com/openai/baselines.git@master`.

Then:

- To run example PPO experiment on `academy_empty_goal` scenario, run
`python3 -m gfootball.examples.run_ppo2 --level=academy_empty_goal_close`
`python3 -m gfootball.examples.run_ppo2 --level=academy_empty_goal_close`
- To run on `academy_pass_and_shoot_with_keeper` scenario, run
`python3 -m gfootball.examples.run_ppo2 --level=academy_pass_and_shoot_with_keeper`
`python3 -m gfootball.examples.run_ppo2 --level=academy_pass_and_shoot_with_keeper`

In order to train with nice replays being saved, run
`python3 -m gfootball.examples.run_ppo2 --dump_full_episodes=True --render=True`
Expand All @@ -54,12 +59,26 @@ base scenario and the left player is controlled by the keyboard. Different types
of players are supported (gamepad, external bots, agents...). For possible
options run `python3 -m gfootball.play_game -helpfull`.

In particular, one can play against an agent trained with the `run_ppo2` script
using the following command:
`python3 -m gfootball.play_game --players "keyboard:left_players=1;ppo2_cnn:right_players=1,checkpoint=$YOUR_PATH"`

Please note that playing
the game is implemented through an environment, so human-controlled players use
the same interface as the agents. One important fact is that there is a single
action per 100 ms reported to the environment, which might cause a lag effect
when playing.

## Trained checkpoints
We provide trained PPO checkpoints for the following scenarios:

- [11_vs_11_easy_stochastic](https://storage.googleapis.com/gfootball-public-bucket/trained_model_11_vs_11_easy_stochastic),
- [academy_run_to_score_with_keeper](https://storage.googleapis.com/gfootball-public-bucket/trained_model_academy_run_to_score_with_keeper).

In order to see the checkpoints playing, run
`python3 -m gfootball.play_game --players "ppo2_cnn:left_players=1,policy=gfootball_impala_cnn,checkpoint=$CHECKPOINT" --level=$LEVEL`,
where `$CHECKPOINT` is the path to the downloaded checkpoint.
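
The `--players` values used in the commands above appear to follow a `name:key=value,key=value` pattern, with `;` separating players. The sketch below only illustrates that format as inferred from these examples; `parse_players` is a hypothetical helper, not gfootball's own flag parser, and `/tmp/model` is a placeholder path.

```python
def parse_players(flag_value):
    """Splits a --players style string into (player_name, options) pairs.

    Hypothetical helper: the format is inferred from the README commands,
    and every option value is kept as a plain string.
    """
    players = []
    for definition in flag_value.split(';'):
        # Everything before the first ':' is the player type.
        name, _, options = definition.partition(':')
        config = {}
        for option in options.split(','):
            if option:
                key, _, value = option.partition('=')
                config[key] = value
        players.append((name, config))
    return players


parsed = parse_players(
    'keyboard:left_players=1;ppo2_cnn:right_players=1,checkpoint=/tmp/model')
for name, config in parsed:
    print(name, config)
```

Keeping values as strings avoids guessing at types; a real parser would need to convert counts such as `left_players` to integers.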

### Keyboard mapping
The game defines the following keyboard mapping (for the `keyboard` player type):

Expand Down Expand Up @@ -238,7 +257,7 @@ A simple example of training multi-agent can be found in examples/run_multiagent

### GPU version
1. Build with `docker build --build-arg DOCKER_BASE=tensorflow/tensorflow:1.12.0-gpu-py3 --build-arg DEVICE=gpu . -t gfootball`
1. Enter the image with `nvidia-docker run -it gfootball bash`
1. Enter the image with `nvidia-docker run -it gfootball bash` or `docker run --gpus all -it gfootball bash` for docker 19.03 or later.

After entering the image, you can run sample training with `python3 -m gfootball.examples.run_ppo2`.
Unfortunately, rendering is not supported inside the docker.
4 changes: 1 addition & 3 deletions gfootball/env/__init__.py
Expand Up @@ -150,8 +150,6 @@ def create_environment(env_name='',
env = wrappers.SingleAgentObservationWrapper(env)
env = wrappers.SingleAgentRewardWrapper(env)
if stacked:
# Import FrameStack here to avoid unconditional dependence on baselines.
from baselines.common.atari_wrappers import FrameStack
env = FrameStack(env, 4)
env = wrappers.FrameStack(env, 4)

return env
7 changes: 5 additions & 2 deletions gfootball/env/football_env.py
Expand Up @@ -129,7 +129,8 @@ def _convert_observations(self, original, player,

def _get_actions(self):
obs = self._env.observation()
actions = []
left_actions = []
right_actions = []
left_player_position = 0
right_player_position = 0
for player in self._players:
Expand All @@ -146,7 +147,9 @@ def _get_actions(self):
assert len(adopted_obs) == len(
a), 'Player returned {} actions instead of {}.'.format(
len(a), len(adopted_obs))
actions.extend(a)
left_actions.extend(a[:player.num_controlled_left_players()])
right_actions.extend(a[player.num_controlled_left_players():])
actions = left_actions + right_actions
return actions

def step(self, action):
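
The change to `_get_actions` above regroups actions so that all left-team actions come before all right-team actions, rather than being appended player by player. A minimal sketch of that regrouping follows. It assumes each player returns its left-team actions first, as the slicing by `num_controlled_left_players()` suggests; `merge_actions` and the string action labels are illustrative names only.

```python
def merge_actions(per_player_actions):
    """Regroups actions as [all left-team actions] + [all right-team actions].

    per_player_actions: list of (actions, num_left) pairs, one per player
    object, where the first num_left entries belong to the left team.
    """
    left_actions = []
    right_actions = []
    for actions, num_left in per_player_actions:
        left_actions.extend(actions[:num_left])
        right_actions.extend(actions[num_left:])
    return left_actions + right_actions


# Player A controls one footballer on each team, player B one on the left.
merged = merge_actions([(['A_left', 'A_right'], 1), (['B_left'], 1)])
print(merged)  # ['A_left', 'B_left', 'A_right']
```

With the previous `actions.extend(a)` logic, the same input would have produced `['A_left', 'A_right', 'B_left']`, interleaving the two teams.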
18 changes: 12 additions & 6 deletions gfootball/env/observation_processor.py
Expand Up @@ -23,6 +23,7 @@
import datetime
import logging
import os
import shutil
import tempfile
import timeit
import traceback
Expand All @@ -34,10 +35,11 @@
from six.moves import range
from six.moves import zip
import six.moves.cPickle
import tensorflow as tf

REMOVED_FRAME = 'removed'

WRITE_FILES = True

try:
import cv2
except ImportError:
Expand Down Expand Up @@ -208,7 +210,8 @@ def write_dump(name, trace, skip_visuals=False, config={}):
os.close(fd)
try:
# For some reason sometimes the file is missing, so the code fails.
tf.io.gfile.copy(temp_path, name + '.avi', overwrite=True)
if WRITE_FILES:
shutil.copy2(temp_path, name + '.avi')
os.remove(temp_path)
except:
logging.info(traceback.format_exc())
Expand All @@ -219,8 +222,9 @@ def write_dump(name, trace, skip_visuals=False, config={}):
temp_frames.append(o._trace['observation']['frame'])
o._trace['observation']['frame'] = REMOVED_FRAME
to_pickle.append(o._trace)
with tf.io.gfile.GFile(name + '.dump', 'wb') as f:
six.moves.cPickle.dump(to_pickle, f)
if WRITE_FILES:
with open(name + '.dump', 'wb') as f:
six.moves.cPickle.dump(to_pickle, f)
for o in trace:
if 'frame' in o._trace['observation']:
o._trace['observation']['frame'] = temp_frames.pop(0)
Expand Down Expand Up @@ -356,7 +360,9 @@ def write_dump(self, name):
config._last_dump = timeit.default_timer()
if self._dump_directory is None:
self._dump_directory = self._config['tracesdir']
tf.io.gfile.makedirs(self._dump_directory)
if WRITE_FILES:
if not os.path.exists(self._dump_directory):
os.makedirs(self._dump_directory)
config._file_name = '{2}/{0}_{1}'.format(
name,
datetime.datetime.now().strftime('%Y%m%d-%H%M%S%f'),
Expand All @@ -380,4 +386,4 @@ def process_pending_dumps(self, finish):
assert not config._file_name
if config._result.ready() or finish:
config._result.get()
config._result = None
config._result = None
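
The hunks above replace `tf.io.gfile` calls with standard-library equivalents guarded by a `WRITE_FILES` switch. The sketch below shows the same pattern in isolation. `save_dump` is a hypothetical helper, not a function from this commit, and it uses `os.makedirs(..., exist_ok=True)` (Python 3 only) where the diff checks `os.path.exists` first.

```python
import os
import pickle
import shutil
import tempfile

WRITE_FILES = True  # mirrors the module-level switch added in this commit


def save_dump(directory, name, trace, video_path=None):
    """Writes <name>.dump (and optionally <name>.avi) using only the stdlib."""
    if not WRITE_FILES:
        return None
    os.makedirs(directory, exist_ok=True)
    base = os.path.join(directory, name)
    if video_path is not None:
        # shutil.copy2 overwrites an existing destination file.
        shutil.copy2(video_path, base + '.avi')
    with open(base + '.dump', 'wb') as f:
        pickle.dump(trace, f)
    return base + '.dump'


with tempfile.TemporaryDirectory() as tmp:
    path = save_dump(os.path.join(tmp, 'traces'), 'episode',
                     [{'score': [1, 0]}])
    with open(path, 'rb') as f:
        restored = pickle.load(f)
```

One trade-off of this approach is that plain `open` and `shutil` handle local paths only, whereas `tf.io.gfile` can also target other filesystems.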
114 changes: 114 additions & 0 deletions gfootball/env/players/ppo2_cnn.py
@@ -0,0 +1,114 @@
# coding=utf-8
# Copyright 2019 Google LLC
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""Player from PPO2 cnn checkpoint.
Example usage with play_game script:
python3 -m gfootball.play_game \
--players "ppo2_cnn:left_players=1,checkpoint=$YOUR_PATH,policy=$POLICY"
$POLICY should be one of: cnn, impala_cnn, gfootball_impala_cnn.
"""

from baselines.common.policies import build_policy
from gfootball.env import football_action_set
from gfootball.env import observation_preprocessing
from gfootball.env import player_base
from gfootball.examples import models
import gym
import joblib
import numpy as np
import tensorflow as tf


class Player(player_base.PlayerBase):
"""An agent loaded from PPO2 cnn model checkpoint."""

def __init__(self, player_config, env_config):
player_base.PlayerBase.__init__(self, player_config)

self._action_set = 'default'
self._sess = tf.Session()
self._player_prefix = 'player_{}'.format(player_config['index'])
stacking = 4 if player_config.get('stacked', True) else 1
policy = player_config.get('policy', 'cnn')
self._stacker = ObservationStacker(stacking)
with tf.variable_scope(self._player_prefix):
with tf.variable_scope('ppo2_model'):
policy_fn = build_policy(DummyEnv(self._action_set, stacking), policy)
self._policy = policy_fn(nbatch=1, sess=self._sess)
_load_variables(player_config['checkpoint'], self._sess,
prefix=self._player_prefix + '/')

def __del__(self):
self._sess.close()

def take_action(self, observation):
assert len(observation) == 1, 'Multiple players control is not supported'

observation = observation_preprocessing.generate_smm(observation)
observation = self._stacker.get(observation)
action = self._policy.step(observation)[0][0]
actions = [football_action_set.action_set_dict[self._action_set][action]]
return actions

def reset(self):
self._stacker.reset()


def _load_variables(load_path, sess, prefix='', remove_prefix=True):
"""Loads variables from checkpoint of policy trained by baselines."""

# Forked from address below since we needed loading from different var names:
# https://github.com/openai/baselines/blob/master/baselines/common/tf_util.py
variables = [v for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
if v.name.startswith(prefix)]

loaded_params = joblib.load(load_path)
restores = []
for v in variables:
v_name = v.name[len(prefix):] if remove_prefix else v.name
restores.append(v.assign(loaded_params[v_name]))

sess.run(restores)


class ObservationStacker(object):
"""Utility class that produces stacked observations."""

def __init__(self, stacking):
self._stacking = stacking
self._data = []

def get(self, observation):
if self._data:
self._data.append(observation)
self._data = self._data[-self._stacking:]
else:
self._data = [observation] * self._stacking
return np.concatenate(self._data, axis=-1)

def reset(self):
self._data = []


class DummyEnv(object):
# We need env object to pass to build_policy, however real environment
# is not there yet.

def __init__(self, action_set, stacking):
self.action_space = gym.spaces.Discrete(
len(football_action_set.action_set_dict[action_set]))
self.observation_space = gym.spaces.Box(
0, 255, shape=[72, 96, 4 * stacking], dtype=np.uint8)
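
`_load_variables` above builds the policy under a per-player scope (`player_<index>/`) and strips that prefix before looking each variable up in the checkpoint. The name mapping can be sketched without TensorFlow. `checkpoint_key` is a hypothetical helper, and the variable names and values are made up for illustration.

```python
def checkpoint_key(variable_name, prefix, remove_prefix=True):
    """Maps a graph variable name to the key expected in the checkpoint."""
    if remove_prefix and variable_name.startswith(prefix):
        return variable_name[len(prefix):]
    return variable_name


# Illustrative names: the graph holds scoped variables, the checkpoint
# stores the same variables without the player scope.
prefix = 'player_0/'
graph_names = ['player_0/ppo2_model/pi/w:0', 'player_0/ppo2_model/vf/b:0']
loaded_params = {'ppo2_model/pi/w:0': 'pi_w', 'ppo2_model/vf/b:0': 'vf_b'}

values = [loaded_params[checkpoint_key(n, prefix)] for n in graph_names]
print(values)  # ['pi_w', 'vf_b']
```

Scoping by player index lets several checkpoint-backed players coexist in one graph while still loading from checkpoints saved under the unscoped names.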
39 changes: 33 additions & 6 deletions gfootball/env/wrappers.py
Expand Up @@ -19,11 +19,11 @@
from __future__ import division
from __future__ import print_function

import collections
import cv2
from gfootball.env import observation_preprocessing
import gfootball_engine as libgame
import gym
import numpy as np
import cv2


class PeriodicDumpWriter(gym.Wrapper):
Expand Down Expand Up @@ -153,10 +153,10 @@ def __init__(self, env,
observation_preprocessing.SMM_HEIGHT)):
gym.ObservationWrapper.__init__(self, env)
self._channel_dimensions = channel_dimensions
shape=(self.env.unwrapped._config.number_of_players_agent_controls(),
channel_dimensions[1], channel_dimensions[0],
len(observation_preprocessing.get_smm_layers(
self.env.unwrapped._config)))
shape = (self.env.unwrapped._config.number_of_players_agent_controls(),
channel_dimensions[1], channel_dimensions[0],
len(observation_preprocessing.get_smm_layers(
self.env.unwrapped._config)))
self.observation_space = gym.spaces.Box(
low=0, high=255, shape=shape, dtype=np.uint8)

Expand Down Expand Up @@ -246,3 +246,30 @@ def reward(self, reward):
reward[rew_index] += self._checkpoint_reward
self._collected_checkpoints[is_left_to_right] += 1
return reward


class FrameStack(gym.Wrapper):
"""Stack k last observations."""

def __init__(self, env, k):
gym.Wrapper.__init__(self, env)
self.obs = collections.deque([], maxlen=k)
low = env.observation_space.low
high = env.observation_space.high
low = np.concatenate([low] * k, axis=-1)
high = np.concatenate([high] * k, axis=-1)
self.observation_space = gym.spaces.Box(
low=low, high=high, dtype=env.observation_space.dtype)

def reset(self):
observation = self.env.reset()
self.obs.extend([observation] * self.obs.maxlen)
return self._get_observation()

def step(self, action):
observation, reward, done, info = self.env.step(action)
self.obs.append(observation)
return self._get_observation(), reward, done, info

def _get_observation(self):
return np.concatenate(list(self.obs), axis=-1)
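
The `FrameStack` wrapper above keeps the last `k` observations in a bounded deque and concatenates them along the last axis. The sketch below reproduces only the windowing behaviour, using flat lists in place of NumPy arrays so that it runs with the standard library alone. `ListFrameStack` is an illustrative stand-in, not the wrapper from this commit.

```python
import collections


class ListFrameStack(object):
    """Keeps the k most recent observations and returns them concatenated."""

    def __init__(self, k):
        self._frames = collections.deque([], maxlen=k)

    def reset(self, observation):
        # Fill the whole window with the first observation, as reset() does
        # in the diff, so the stacked shape is constant from the start.
        self._frames.extend([observation] * self._frames.maxlen)
        return self._get()

    def step(self, observation):
        # Appending to a full deque silently drops the oldest frame.
        self._frames.append(observation)
        return self._get()

    def _get(self):
        return [value for frame in self._frames for value in frame]


stack = ListFrameStack(k=3)
first = stack.reset(['a'])
second = stack.step(['b'])
third = stack.step(['c'])
fourth = stack.step(['d'])
print(first, second, third, fourth)
```

After the fourth call the initial observation has left the window entirely, which is the behaviour the `maxlen` argument provides.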