Constrained Policy Optimization (CPO) is an algorithm for learning policies that should satisfy behavioral constraints throughout training. [1]
This module was designed for rllab [2], and includes implementations of the algorithms described in our paper [1].
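As a loose illustration of constraint satisfaction during training (not this repo's code, and not the CPO update itself, which solves a trust-region subproblem with a linearized constraint): a minimal primal-dual sketch on a toy scalar problem, where a Lagrange multiplier is adapted online to push the constraint violation to zero while the objective is minimized. All names here are hypothetical.

```python
def primal_dual(steps=2000, lr_x=0.05, lr_lam=0.05):
    """Minimize (x - 3)^2 subject to x <= 1 by gradient descent on the
    Lagrangian in x and projected gradient ascent in the multiplier."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Lagrangian: L(x, lam) = (x - 3)^2 + lam * (x - 1)
        x -= lr_x * (2.0 * (x - 3.0) + lam)
        # Dual ascent on the multiplier; it stays nonnegative.
        lam = max(0.0, lam + lr_lam * (x - 1.0))
    return x, lam

print(primal_dual())  # converges near the constrained optimum x = 1
```

The multiplier grows whenever the constraint is violated, so the penalty strengthens until the iterate settles at the constrained optimum.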
To configure, run the following command in the root folder of rllab:
git submodule add -f https://github.com/jachiam/cpo sandbox/cpo
Run CPO in the Point-Gather environment with:
python sandbox/cpo/experiments/CPO_point_gather.py
- Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. "Constrained Policy Optimization". Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
- Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel. "Benchmarking Deep Reinforcement Learning for Continuous Control". Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.