Training Policy Gradient on the CartPole-v1 environment.
Training Policy Gradient on the LunarLander-v2 environment.
Actor-Critic plots for the LunarLander-v2 environment.
Architectures
- ReinforceAgent (without baseline)
- Sutton, R. S., Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press.
- Graesser, L., Keng, W. L. (2019). Foundations of Deep Reinforcement Learning: Theory and Practice in Python. Addison-Wesley Professional.
- Yoon, C. (2018, December 30). Deriving Policy Gradients and Implementing REINFORCE.
- Silver, D. (2015, December 21). RL Course by David Silver - Lecture 7: Policy Gradient Methods.