Code for paper "Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning" by David Janz*, Jiri Hron*, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek. NeurIPS 2019. *Equal contribution

DavidJanz/successor_uncertainties_tabular

Successor Uncertainties, Tabular Experiments

This code reproduces the tabular experiments from https://arxiv.org/abs/1810.06530. Click here for code to reproduce the Atari 2600 experiments.

Clone the repository and pip install the requirements.

To reproduce results for the Tree MDP, run

python3 run 5 specs/name_of_spec.json

where name_of_spec is one of tree250_su, tree250_boot1x or tree250_boot25x. The JSON files contain the settings for each run configuration. Successor Uncertainties and Bootstrap with 1x compute should finish in minutes. Bootstrap with 25x compute may take considerably longer.
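To run all three Tree MDP configurations back to back, the command above can be wrapped in a short script. This is a minimal sketch and not part of the repository: `build_command` and `run_all` are hypothetical helpers, and the leading `5` is passed through unchanged, exactly as it appears in the command above.

```python
import subprocess

# Spec names as listed in this README.
TREE_SPECS = ["tree250_su", "tree250_boot1x", "tree250_boot25x"]


def build_command(spec_name, first_arg="5"):
    """Return the argv list for one run, mirroring `python3 run 5 specs/<name>.json`."""
    return ["python3", "run", first_arg, f"specs/{spec_name}.json"]


def run_all(spec_names=TREE_SPECS):
    """Run each spec in turn, stopping at the first command that fails."""
    for name in spec_names:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(name), check=True)
```

Calling `run_all()` from the repository root would launch the three runs sequentially. Since the 25x Bootstrap run is the slow one, it is listed last.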

Output will be saved to the data_out folder. To plot results (as in figure 2 in the paper), run

python3 plotting/plot_scaling.py --file data_out/name_of_file.pkl --show

The figure with the results should display on screen and will also be saved to figs/scaling.pdf.
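Before plotting, it can help to check what a results file contains. The sketch below only loads the pickle and reports its top-level type. The internal layout of the output files is not documented here, so nothing about it is assumed. `describe_results` is a hypothetical helper. Unpickling may need the repository's modules on the import path, and pickle files should only be loaded from trusted sources.

```python
import pickle


def describe_results(path):
    """Load a pickled results file and summarise its top-level structure."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = list(obj.keys())
    elif hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    return summary
```

For example, `describe_results("data_out/name_of_file.pkl")` returns a small dictionary describing the loaded object.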

To reproduce the Chain/Deep Sea experiments (figure 3 in the paper), run

python3 run 5 specs/deepsea_su.json

Note, however, that this run might take a while and requires a lot of RAM. To test on smaller versions of the MDP, edit specs/deepsea_su.json and lower the env_size values.
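The env_size edit can also be scripted so that the original spec stays untouched. This is a sketch under assumptions: the layout of the spec file is not described in this README, so the hypothetical helper `set_env_size` replaces every key named `env_size` wherever it occurs in the JSON. The replacement value should have the same shape (number or list) as the value already in the file. The output file name in the usage note is also hypothetical.

```python
import copy
import json


def set_env_size(spec, new_value):
    """Return a copy of `spec` with every `env_size` entry replaced by `new_value`."""
    def walk(node):
        if isinstance(node, dict):
            return {
                key: (new_value if key == "env_size" else walk(value))
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    # Work on a deep copy so the caller's object is never modified.
    return walk(copy.deepcopy(spec))


def write_small_spec(src_path, dst_path, new_value):
    """Read a spec, replace its env_size entries, and write the result to dst_path."""
    with open(src_path) as f:
        spec = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(set_env_size(spec, new_value), f, indent=2)
```

A call such as `write_small_spec("specs/deepsea_su.json", "specs/deepsea_su_small.json", new_value)` would write a reduced spec that can then be passed to `python3 run 5` in place of the original.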

Then, to plot the results, run

python3 plotting/plot_scaling.py --file data_out/su_deepsea_out.pkl --loglog --show

The plot will be saved to figs/scaling_loglog.pdf. The orange line in the resulting plot is taken directly from Osband et al. 2018.
