Repository for Iterated Relearning: The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning

Introduction

This is the codebase for our paper "The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning" by M. Igl, G. Farquhar, J. Luketina, W. Boehmer and S. Whiteson.

It also includes an implementation of IBAC-SNI on ProcGen.

It comprises several sub-folders:

  1. gym-minigrid contains the grid-world environment (for the Multiroom experiments) and is adapted from https://github.com/maximecb/gym-minigrid. This environment is used together with torch_rl.
  2. torch_rl contains the agent to run on the gym-minigrid environment and is adapted from https://github.com/lcswillems/rl-starter-files.
  3. multiroom_exps contains the training code for the Multiroom experiments.
  4. train-procgen contains the code for the results on the ProcGen domain. It is adapted from https://github.com/openai/train-procgen.
  5. cifar contains the code for the supervised experiments. It is adapted from https://github.com/kuangliu/pytorch-cifar.

Plotting is explained at the very end.

Preparation

All experiments can be run in the accompanying docker container. To build it, call

./build.sh

in the root folder.

Then, an interactive docker session can be started with

./runi.sh <GPU-ID> <Containername>
./runi.sh 0 iter

where <GPU-ID> is the GPU you want to use and <Containername> can be anything or left empty.

Supervised Learning

After starting the interactive session (./runi.sh in the root folder), move to the cifar folder:

cd cifar

Figure 2

Annealing the fraction of correct datapoints from 0 to 1

Run the baseline:

python main.py -p

Run with non-stationarity:

python main.py -p with annealing.every_n_epochs=1 annealing.type=<type>

where <type> can be size (= Dataset size), random (= Noisy labels) or consistent (= Wrong labels).
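
For example, to anneal with noisy labels:

python main.py -p with annealing.every_n_epochs=1 annealing.type=random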

Figure 3 left

Results for self-distillation

For the baseline (i.e. no non-stationarity):

python main.py -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500 

And for non-stationarities:

python main.py -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500 annealing.type=<type>

where again <type> should be filled in as desired.
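
For example, for self-distillation with wrong labels:

python main.py -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500 annealing.type=consistent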

Figure 3 middle

Two phase training

python main.py -p with epochs=1500 annealing.duration=700 frozen_test_epochs=800 annealing.type=<type> annealing.start_fraction=<fraction>

where <type> and <fraction> should be filled out as desired. In the experiments, we used the following values for <fraction>.

For Wrong labels and Noisy labels: 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0
Additionally, for Dataset size: 0.005, 0.01, 0.02, plus the same values as above
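
For example, a two-phase run annealing the dataset size from 1% of the data would be:

python main.py -p with epochs=1500 annealing.duration=700 frozen_test_epochs=800 annealing.type=size annealing.start_fraction=0.01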

Multiroom

Preparation

  • Start the interactive docker session: ./runi.sh in the root folder
  • Install gym-minigrid: pip install -e gym-minigrid
  • Install torch_rl: pip install -e torch_rl
  • Move to multiroom_exps: cd multiroom_exps

Running commands

python train.py -p with iter_type=<type>

where <type> can be either none (lowercase!) or distill.
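
For example, to train with distillation-based ITER:

python train.py -p with iter_type=distill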

ProcGen

For ProcGen we need 4 GPUs at once, so the interactive docker container must be started as

./runi.sh 0,1,2,3 procgen

for GPUs 0, 1, 2 and 3. Then move to the subfolder: cd train-procgen/train_procgen/

Running the experiments

Baseline PPO:

mpiexec -np 4 python train.py -p with env_name=<env_name>

where <env_name> can be any of the ProcGen environments. The ones used in this paper were starpilot, dodgeball, climber, ninja and bigfish.
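
For example, to train the PPO baseline on starpilot:

mpiexec -np 4 python train.py -p with env_name=starpilot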

PPO+ITER:

mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True

Baseline IBAC:

mpiexec -np 4 python train.py -p with env_name=<env_name> arch.reg=ibac

which also uses selective noise injection (SNI).

IBAC+ITER:

mpiexec -np 4 python train.py -p with env_name=<env_name> arch.reg=ibac iter_loss.use=True

Ablation studies

Sequential ITER

mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True \
iter_loss.v2=True \
iter_loss.timesteps_initial=71_000_000 \
iter_loss.timesteps_anneal=9_000_000 \
iter_loss.timesteps_free=71_000_000

Careful: This will generate about 500GB of data!

ITER without RL terms in distillation

mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True \
iter_loss.alpha_reg.schedule=const \
iter_loss.use_burnin_rl_loss=False 

Plotting

The experiments use sacred for configuration and logging. For more extensive use of this codebase, I'd recommend setting up a MongoDB to store the results. At the moment, results are logged by the FileStorageObserver into a db folder in the root directory.
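
If you want to inspect results without the plotting script, the following minimal sketch (not part of this codebase) reads a run's logged metrics directly, assuming sacred's default FileStorageObserver layout db/<id>/metrics.json; the run id and metric name below are placeholders:

import json
import matplotlib.pyplot as plt

run_id = 1           # placeholder: use the id sacred prints at the start of a run
metric = "test_acc"  # placeholder: e.g. train_acc, rreturn_mean or eprewmean

# The FileStorageObserver stores each metric as
# {"steps": [...], "timestamps": [...], "values": [...]}
with open(f"db/{run_id}/metrics.json") as f:
    entry = json.load(f)[metric]

plt.plot(entry["steps"], entry["values"])
plt.xlabel("step")
plt.ylabel(metric)
plt.show()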

A very simple plotting script, plot.py, is included:

python plot.py --id <id> --metric <metric>

where <id> is the unique experiment id that sacred assigns to each run. It is printed to stdout near the beginning of each run.

<metric> is the name of the quantity you want to plot: train_acc and test_acc for the supervised experiments, rreturn_mean for Multiroom and eprewmean for ProcGen. Many more metrics are logged; check the code or the metrics.json file to see what is available.
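
For example, assuming a supervised run that was assigned id 12 by sacred (the actual id is printed at startup):

python plot.py --id 12 --metric test_acc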

Special for ProcGen: the ProcGen experiments run 4 threads, three for training and one for testing. Each of those 4 threads gets its own unique id, but only two of them actually log anything: one logs the training performance, the other the test performance. Just try plotting each of the 4 ids; it will either crash (if that id wasn't logging) or the plotting script will print out whether it shows the train or test performance.
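
A quick way to try all four ids at once, assuming (hypothetically) that a ProcGen run was assigned ids 12 to 15:

for id in 12 13 14 15; do python plot.py --id $id --metric eprewmean; done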
