Room world from Sutton et al. (Between MDPs and semi-MDPs)
Paper: Between MDPs and semi-MDPs
I used the room environment to test some ideas for hierarchical reinforcement learning (HRL) and for planning in HRL:
- Plain flat Q-learning
  - Just one version. [code]
- Hierarchical Q-learning
  - Basic version (semi-MDP; two-layer hierarchy with a predefined deterministic lower-level policy) [code]
  - Intra-option learning version (the lower level is trainable) [code]
- Planning Hierarchical Q-learning
  - Basic version (same as Hierarchical Q-learning, but the upper level outputs a 2-step plan; no replanning) [code]
  - Version with replanning [code]
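
For orientation, here is a minimal sketch of the two tabular updates that the flat and the basic hierarchical variants build on: the one-step Q-learning update and the SMDP Q-learning update from the paper. It is not the code linked above. The function names, the `(state, action)` table layout, and the values of `ALPHA` and `GAMMA` are assumptions made for illustration.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value, not taken from the repo)
ALPHA = 0.1  # learning rate (assumed value, not taken from the repo)


def q_update(Q, s, a, r, s_next, actions, alpha=ALPHA, gamma=GAMMA):
    """One-step flat Q-learning update on a table Q[(state, action)]."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def smdp_q_update(Q, s, o, rewards, s_next, options, alpha=ALPHA, gamma=GAMMA):
    """SMDP Q-learning update after option `o` ran from `s` for len(rewards) steps.

    `rewards` holds the per-step rewards collected while the option executed,
    and `s_next` is the state in which the option terminated.
    """
    k = len(rewards)
    # Discounted return accumulated while the option was running.
    ret = sum(gamma ** i * r for i, r in enumerate(rewards))
    best_next = max(Q[(s_next, p)] for p in options)
    # The bootstrap term is discounted by gamma**k because k steps elapsed.
    Q[(s, o)] += alpha * (ret + gamma ** k * best_next - Q[(s, o)])


# Hypothetical usage: states, actions, and options are placeholder labels.
Q_flat = defaultdict(float)
q_update(Q_flat, s=0, a="up", r=1.0, s_next=1, actions=["up", "down"])

Q_opt = defaultdict(float)
smdp_q_update(Q_opt, s=0, o="to_hallway", rewards=[0.0, 0.0, 1.0],
              s_next=5, options=["to_hallway", "to_goal"])
```

With a one-step option (`len(rewards) == 1`) the SMDP update reduces to the flat update, which is one way to check that the two variants agree on primitive actions.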