Examples

Please refer to the README in the sunblaze_envs folder for descriptions of the environments.

Policy and value function architectures

We consider two architectures for the policy and value function:

mlp: Policy and value function are MLPs with two hidden layers and no parameter sharing.
lstm: Policy and value function are separate fully connected layers on top of an LSTM whose inputs are learned features computed by an MLP.
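The two architectures can be sketched in NumPy as follows. This is a minimal, forward-pass-only illustration, not the code used in the examples. The hidden sizes (64 for the MLP layers, 128 for the LSTM), the tanh activations and the class names are assumptions chosen for illustration.

```python
import numpy as np

def init_dense(sizes, rng):
    """Create (weight, bias) pairs for a stack of fully connected layers."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def dense_forward(params, x, activate_last=False):
    """tanh on hidden layers; the last layer is linear unless activate_last is set."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1 or activate_last:
            x = np.tanh(x)
    return x

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class MlpPolicyValue:
    """'mlp': two separate MLPs with two hidden layers each, no shared parameters."""

    def __init__(self, obs_dim, n_actions, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.pi = init_dense([obs_dim, hidden, hidden, n_actions], rng)
        self.vf = init_dense([obs_dim, hidden, hidden, 1], rng)

    def __call__(self, obs):
        return dense_forward(self.pi, obs), dense_forward(self.vf, obs)[:, 0]

class LstmPolicyValue:
    """'lstm': MLP features -> LSTM -> separate linear policy and value heads."""

    def __init__(self, obs_dim, n_actions, feat=64, hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        self.features = init_dense([obs_dim, feat, feat], rng)
        self.w = rng.standard_normal((feat + hidden, 4 * hidden)) / np.sqrt(feat + hidden)
        self.b = np.zeros(4 * hidden)
        self.pi = init_dense([hidden, n_actions], rng)
        self.vf = init_dense([hidden, 1], rng)

    def initial_state(self, batch):
        return np.zeros((batch, self.hidden)), np.zeros((batch, self.hidden))

    def __call__(self, obs, state):
        h, c = state
        x = dense_forward(self.features, obs, activate_last=True)  # learned features
        z = np.concatenate([x, h], axis=-1) @ self.w + self.b
        i, f, o, g = np.split(z, 4, axis=-1)  # input, forget, output gates and candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return dense_forward(self.pi, h), dense_forward(self.vf, h)[:, 0], (h, c)
```

The practical difference is where memory lives: the mlp networks see only the current observation, while the lstm variant carries (h, c) across steps, so the caller must pass the state back in and reset it when appropriate.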

Please refer to the paper for details.

PPO

To train with OpenAI Baselines PPO2:

python3 -m examples.ppo2_baselines.train \
--env SunblazeCartPole-v0 \
--output ppo2_cartpole \
--policy mlp \
--total-episodes 10000
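For reference, PPO optimizes a clipped surrogate objective. The sketch below computes that loss from per-sample log-probabilities and advantages. It is an illustration of the standard objective, not the Baselines implementation; the clip range of 0.2 and the function name are assumptions.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective (to be minimized)."""
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the minimum gives a pessimistic bound on the policy improvement.
    return -np.mean(np.minimum(unclipped, clipped))
```

With a ratio of 2 and a positive advantage the objective is capped at 1.2 times the advantage, whereas with a negative advantage the unclipped (worse) term is kept.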

A2C

To train with OpenAI Baselines A2C:

python3 -m examples.a2c_baselines.train \
--env SunblazeCartPole-v0 \
--output a2c_cartpole \
--policy lstm \
--total-episodes 10000
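A2C updates from short rollouts using bootstrapped n-step returns. The following is a minimal sketch of the return computation and the combined loss. The discount of 0.99 and the entropy and value coefficients (0.01 and 0.5) are assumed defaults, and the function names are hypothetical.

```python
import numpy as np

def n_step_returns(rewards, dones, last_value, gamma=0.99):
    """Discounted returns for one rollout, bootstrapped from last_value.

    dones[t] marks that the episode ended after step t, which cuts the bootstrap.
    """
    returns = np.zeros(len(rewards))
    running = last_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running * (1.0 - float(dones[t]))
        returns[t] = running
    return returns

def a2c_loss(logp_actions, values, returns, entropy, ent_coef=0.01, vf_coef=0.5):
    """Policy-gradient loss minus entropy bonus plus weighted value regression."""
    advantages = returns - values  # treated as constants for the policy term
    policy_loss = -np.mean(logp_actions * advantages)
    value_loss = np.mean((returns - values) ** 2)
    return policy_loss - ent_coef * np.mean(entropy) + vf_coef * value_loss
```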

EPOpt

To train with EPOpt (based on the OpenAI Baselines PPO/A2C code):

With the mlp policy:

python3 -m examples.epopt.train \
--env SunblazeCartPole-v0 \
--output epopt_cartpole \
--algorithm ppo2 \
--total-episodes 10000

With the lstm policy:

python3 -m examples.epopt_lstm.train \
--env SunblazeCartPole-v0 \
--output epopt_lstm_cartpole \
--algorithm ppo2 \
--total-episodes 10000
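The core idea of EPOpt is to sample environment variations, collect trajectories, and train only on the fraction epsilon of trajectories with the lowest returns. The sketch below shows just that selection step. How these examples choose epsilon or schedule it is not stated here, so the values used and the function names are assumptions.

```python
import numpy as np

def epopt_select(returns, epsilon):
    """Indices of the worst ceil(epsilon * N) trajectories, ranked by return."""
    returns = np.asarray(returns, dtype=float)
    k = max(1, int(np.ceil(epsilon * len(returns))))
    return np.sort(np.argsort(returns)[:k])

def epopt_batch(trajectories, returns, epsilon):
    """Keep only the low-return trajectories for the next PPO/A2C update."""
    return [trajectories[i] for i in epopt_select(returns, epsilon)]
```

Setting epsilon to 1.0 keeps every trajectory and reduces to ordinary training on the sampled variations; smaller values focus the update on the hardest variations.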

RL²

To train with RL² (also based on the OpenAI Baselines PPO/A2C code):

python3 -m examples.adaptive.train \
--env SunblazeCartPole-v0 \
--output rl2_cartpole \
--algorithm ppo2 \
--trials 5000 \
--episodes-per-trial 2
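With these flags RL² runs 5000 trials of 2 episodes each, i.e. 10000 episodes, the same budget as the other commands. The sketch below illustrates the usual RL² bookkeeping: the environment variation is resampled and the recurrent state is reset only at the start of a trial, so memory carries over between episodes of the same trial. The input layout (observation, one-hot previous action, previous reward, done flag) follows the original RL² formulation and is an assumption about this implementation; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EpisodeStart:
    trial: int
    episode: int
    resample_env: bool  # draw a new environment variation for this trial
    reset_hidden: bool  # zero the recurrent state; it is kept across episodes in a trial

def trial_schedule(trials, episodes_per_trial):
    """Yield one record per episode; only the first episode of a trial resets anything."""
    for trial in range(trials):
        for episode in range(episodes_per_trial):
            first = episode == 0
            yield EpisodeStart(trial, episode, resample_env=first, reset_hidden=first)

def rl2_input(obs, prev_action, prev_reward, prev_done, n_actions):
    """Concatenate observation, one-hot previous action, previous reward and done flag."""
    one_hot = [0.0] * n_actions
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    return list(obs) + one_hot + [float(prev_reward), float(prev_done)]
```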

Experiment Runner

To run one set of the experiments from the paper, as configured in experiments.yml:

python3 -m examples.run_experiments examples/experiments.yml /tmp/experiments-output
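The schema of experiments.yml is not described here. For a small ad-hoc sweep, one hedged alternative is to generate the single-run training commands shown above. The sketch below uses only the flags that appear in this README; the output-directory naming and the function name are hypothetical. RL² is left out because it uses --trials and --episodes-per-trial instead of --total-episodes.

```python
import shlex

# Module and extra flags for each trainer, copied from the commands in this README.
TRAINERS = {
    "ppo2": ("examples.ppo2_baselines.train", ["--policy", "mlp"]),
    "a2c": ("examples.a2c_baselines.train", ["--policy", "lstm"]),
    "epopt": ("examples.epopt.train", ["--algorithm", "ppo2"]),
    "epopt_lstm": ("examples.epopt_lstm.train", ["--algorithm", "ppo2"]),
}

def sweep_commands(envs, output_root, total_episodes=10000):
    """Build one argv list per (environment, trainer) pair."""
    commands = []
    for env in envs:
        for name, (module, extra) in TRAINERS.items():
            output = f"{output_root}/{name}_{env}"  # hypothetical naming scheme
            commands.append(["python3", "-m", module, "--env", env, "--output", output,
                             *extra, "--total-episodes", str(total_episodes)])
    return commands

if __name__ == "__main__":
    for cmd in sweep_commands(["SunblazeCartPole-v0"], "sweep_output"):
        print(shlex.join(cmd))  # pass cmd to subprocess.run to actually launch it
```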

Running in a headless environment

To run these environments on a headless machine (e.g., one without a running X server), use xvfb:

xvfb-run -a -s "-screen 0 1400x900x24 +extension RANDR" -- python3 -m examples.ppo2_baselines.train ...
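If you launch training from a script, a minimal sketch is to add the xvfb-run prefix above only when no display is available. Treating an unset or empty DISPLAY variable as "headless" is an assumption, and the function names are hypothetical.

```python
import os
import shlex
import subprocess

# Same xvfb-run invocation as the command above.
XVFB_PREFIX = ["xvfb-run", "-a", "-s", "-screen 0 1400x900x24 +extension RANDR", "--"]

def with_virtual_display(argv, environ=None):
    """Prefix argv with xvfb-run when DISPLAY is unset or empty."""
    environ = os.environ if environ is None else environ
    if environ.get("DISPLAY"):
        return list(argv)
    return XVFB_PREFIX + list(argv)

def run(argv, dry_run=True):
    """Print the final command line and, unless dry_run is set, execute it."""
    cmd = with_virtual_display(argv)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```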
