This repo implements the Plan2Explore algorithm from "Planning to Explore via Self-Supervised World Models", built on PlaNet-PyTorch. It has been confirmed to work on the DeepMind Control Suite (MuJoCo) environments. Hyperparameters are taken from the paper.
To install all dependencies with Anaconda, run the following commands:
conda env create -f conda_env.yml
source activate p2e
Zero-shot
python main.py --algo p2e --env walker-walk --action-repeat 2 --id name-of-experiment --zero-shot
Few-shot
python main.py --algo p2e --env walker-walk --action-repeat 2 --id name-of-experiment
For best performance with the DeepMind Control Suite, set the environment variable MUJOCO_GL=egl (see the dm_control documentation for instructions and details).
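For example, the variable can be exported in the shell session that launches training; this is a minimal sketch, with the training command itself commented out and shown only for context:

```shell
# Enable headless GPU rendering for dm_control/MuJoCo via EGL.
export MUJOCO_GL=egl

# Then launch training in the same session, e.g.:
# python main.py --algo p2e --env walker-walk --action-repeat 2 --id name-of-experiment --zero-shot
```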
Use TensorBoard to monitor training. The performance of the zero-shot/few-shot trained policy is logged under test/episode_reward.
tensorboard --logdir results