This is the reference implementation for the following paper:
Efficient Exploration via State Marginal Matching.
Lisa Lee*, Benjamin Eysenbach*, Emilio Parisotto*, Eric Xing, Ruslan Salakhutdinov, Sergey Levine. arXiv preprint, 2019.
This repository is based on rlkit.
- You can clone this repository by running:
git clone https://github.com/RLAgent/state-marginal-matching.git
cd state-marginal-matching
All subsequent commands in this README should be run from the top-level directory of this repository (i.e., /path/to/state-marginal-matching/).
- Install MuJoCo 1.5 and mujoco-py. Note that MuJoCo requires a license.
- Create and activate the conda environment:
conda env create -f conda_env.yml
source activate smm_env
Note: If running on Mac OS X, comment out patchelf, box2d, and box2d-kengz in conda_env.yml.
To deactivate the conda environment, run conda deactivate. To remove it, run conda env remove -n smm_env.
python -m train configs/smm_manipulation.json # State Marginal Matching (SMM) with 4 latent skills
python -m train configs/sac_manipulation.json # Soft Actor-Critic (SAC)
python -m train configs/icm_manipulation.json # Intrinsic Curiosity Module (ICM)
python -m train configs/count_manipulation.json # Count-based Exploration
python -m train configs/pseudocount_manipulation.json # Pseudocount
The log directory can be set with --log-dir /path/to/log/dir. By default, the log directory is set to out/.
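To compare all five methods side by side, it can help to give each run its own log directory. The script below is a minimal sketch, not part of this repository: the out/<method> layout and the DRY_RUN switch are illustrative assumptions. It only uses the train command, the config filenames, and the --log-dir flag documented above. By default it just prints the commands; set DRY_RUN=0 to actually launch them one after another.

```shell
#!/usr/bin/env bash
# Sketch: train every method listed in this README into its own log directory.
# Assumptions (not defined by this repository): the out/<method> layout and
# the DRY_RUN switch are illustrative choices.
set -euo pipefail

# Config files follow the pattern configs/<method>_manipulation.json.
METHODS=(smm sac icm count pseudocount)
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

for method in "${METHODS[@]}"; do
  run python -m train "configs/${method}_manipulation.json" --log-dir "out/${method}"
done
```

Usage: save it as, e.g., train_all.sh, inspect the printed commands with bash train_all.sh, then run DRY_RUN=0 bash train_all.sh. The runs execute sequentially; launching them in parallel is left out on purpose, since each training job may already use the available CPU or GPU.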
python -m visualize /path/to/log/dir # Without historical averaging
python -m visualize /path/to/log/dir --num-historical-policies 10 # With historical averaging
python -m test /path/to/log/dir # Without historical averaging
python -m test /path/to/log/dir --config configs/test_ha.json # With historical averaging
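Once a run has finished, the visualize and test commands above all take the same log directory. The script below is a minimal sketch that runs all four variants against one directory, assuming it is the path that was passed to --log-dir during training. The out/smm default and the DRY_RUN switch are hypothetical conveniences, not options of this repository. As written it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: visualize and test one trained run, with and without historical
# averaging. LOG_DIR's default (out/smm) and DRY_RUN are hypothetical
# conveniences, not options defined by this repository.
set -euo pipefail

LOG_DIR="${1:-out/smm}"   # pass the directory given to --log-dir at training time
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run python -m visualize "$LOG_DIR"                                # without historical averaging
run python -m visualize "$LOG_DIR" --num-historical-policies 10   # with historical averaging
run python -m test "$LOG_DIR"                                     # without historical averaging
run python -m test "$LOG_DIR" --config configs/test_ha.json       # with historical averaging
```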
To view more flag options, run the scripts with the --help flag. For example:
$ python -m train --help
Usage: train.py [OPTIONS] CONFIG
Options:
--cpu
--log-dir TEXT
--snapshot-gap INTEGER How often to save model checkpoints (by # epochs).
--help Show this message and exit.
The algorithms are based on the following papers:
Efficient Exploration via State Marginal Matching.
Lisa Lee*, Benjamin Eysenbach*, Emilio Parisotto*, Eric Xing, Ruslan Salakhutdinov, Sergey Levine. arXiv preprint, 2019.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. ICML 2018.
Curiosity-driven Exploration by Self-supervised Prediction.
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell. ICML 2017.
Unifying Count-Based Exploration and Intrinsic Motivation.
Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos. NIPS 2016.
@article{smm2019,
title={Efficient Exploration via State Marginal Matching},
author={Lisa Lee and Benjamin Eysenbach and Emilio Parisotto and Eric Xing and Sergey Levine and Ruslan Salakhutdinov},
year={2019}
}