This is the codebase for our paper "The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning " by M.Igl, G. Farquhar, J. Luketina, W. Boehmer and S. Whiteson.
It also includes an implementation of IBAC-SNI on ProcGen.
It comprises several sub-folders:
gym-minigrid
contains the grid-world environment (for the Multiroom experiments) and is adapted from https://github.com/maximecb/gym-minigrid This environment is used together withtorch_rl
torch_rl
contains the agent to run on thegym-minigrid
environment and is adapted from https://github.com/lcswillems/rl-starter-filesmultiroom_exps
contains the training code for the Multiroom experiments.train-procgen
contains the code for the results on the ProcGen domain. It is code adapted from https://github.com/openai/train-procgencifar
contains the code for the supervised experiments. It is code adapted from https://github.com/kuangliu/pytorch-cifar
Plotting is explain at the very end.
All experiments can be run in the accompanying docker container. To build it, call
./build.sh
in the root folder.
Then, an interactive docker session can be started with
./runi.sh <GPU-ID> <Containername>
./runi.sh 0 iter
where <GPU-ID>
is the GPU you want to use and <Containername>
can be anything or left empty.
After starting interactive session (./runi.sh
in root folder), move to cifar
folder:
cd cifar
Annealing the fraction of correct datapoints from 0 to 1
Run the baseline:
python main.py -p
Run with non-stationarity:
python main.py -p with annealing.every_n_epochs=1 annealing.type=<type>
where <type>
can either be size
(=Dataset size), random
(=Noisy labels) or consistent
(=Wrong labels).
Results for self-distillation
For the baseline (i.e. no non-stationarity)
python main.py -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500
And for non-stationarities:
python main.py -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500 annealing.type=<type>
where again, please fill in type <type>
as desired.
Two phase training
python main.py -p with epochs=1500 annealing.duration=700 frozen_test_epochs=800 annealing.type=<type> annealing.start_fraction=<fraction>
where <type>
and <fraction>
should be filled out as desired.
In the experiments, we used the following values for <fraction>
.
For Wrong labels and Noisy Labels: 0.05, 0.1 0.2, 0.3, 0.4, 0.5, 0.75, 1.0
Additionally For Dataset Size: 0.005, 0.01, 0.02 + same as for the others
Preparation
- Start the interactive docker session:
runi.sh
in root folder - Install
gym-minigrid
:pip install -e gym-minigrid
- Install
torch_rl
:pip install -e torch_rl
- Move to
multiroom_exps
:cd multiroom_exps
Running commands
python train.py -p with iter_type=<type>
where type
can be either none
(small caps!) or distill
For ProcGen we need 4 GPUs at once, so we need to start the interactive docker container as
./runi.sh 0,1,2,3 procgen
for GPUs 0,1,2,3.
Then go to subfolder cd train-procgen/train_procgen/
Baseline PPO:
mpiexec -np 4 python train.py -p with env_name=<env_name>
where <env_name>
can be any of the ProcGen environments.
The ones used in this paper were starpilot, dodgeball, climber, ninja and bigfish.
PPO+ITER:
mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True
Baseline IBAC:
mpiexec -np 4 python train.py -p with env_name=<env_name> arch.reg=ibac
where we use selective noise injection as well.
IBAC+ITER:
mpiexec -np 4 python train.py -p with env_name=<env_name> arch.reg=ibac iter_loss.use=True
Sequential ITER
mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True \
iter_loss.v2=True \
iter_loss.timesteps_initial=71_000_000 \
iter_loss.timesteps_anneal=9_000_000 \
iter_loss.timesteps_free=71_000_000
Careful: This will generate about 500GB of data!
ITER without RL terms in distillation
mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True \
iter_loss.alpha_reg.schedule=const \
iter_loss.use_burnin_rl_loss=False
The experiments are using sacred for configuration and logging.
For more thorough use of this codebase, I'd recommend setting up a MongoDB to store the results.
At the moment, results are logged using the FileStorageObserver into a db
folder in the root directory.
There is a very simple plotting script included: plot.py
:
python plot.py --id <id> --metric <metric>
where <id>
is the unique experiment id assigned to each run by sacred.
It is printed in stdout somewhere at the beginning when starting a new run.
<metric>
is the name of what you want to plot.
This is train_acc
and test_acc
for the supervised experiments, rreturn_mean
for Multiroom and eprewmean
for ProcGen.
Many more things are also being logged, either check the code or metrics.json
file to see what.
Special for Procgen: The ProcGen experiments run 4 threads, three for training, one for testing. Each of those 4 threads gets one unique id, but only two of those threads are acutally logging something, one the training, one the test performance. Just try plotting for each of those 4 ids, it will either crash (if that id wasn't logging) or the plotting script will actually print out whether that's the train or test performance.