Training agents in OpenAI Gym with Policy-Gradient methods

A trained agent balancing an inverted pole on a moving cart.

A trained agent controlling boosters to land a spaceship.

Training plots

Training Policy Gradient on the CartPole-v1 environment.

Training Policy Gradient on the LunarLander-v2 environment.

Actor-Critic plots for the LunarLander-v2 environment.

Architectures

ReinforceAgent (without baseline)
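
As a rough illustration of what "without baseline" means for REINFORCE, the sketch below weights each action's log-probability by the raw discounted return from that step, with nothing subtracted. It is not taken from this repository's agent: the function names `discounted_returns` and `reinforce_loss`, the default `gamma=0.99`, and the use of plain Python lists in place of tensors are all assumptions made for the example.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss without a baseline: -sum_t log pi(a_t | s_t) * G_t.

    log_probs: log-probabilities of the actions actually taken (hypothetical input).
    rewards:   rewards received at the same time steps.
    """
    returns = discounted_returns(rewards, gamma)
    # No baseline is subtracted from the returns before weighting.
    return -sum(lp * g for lp, g in zip(log_probs, returns))


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In a deep-learning framework the same expression would typically be built from tensors so that gradients flow through the log-probabilities; a baseline variant would subtract an estimate of the expected return from each `G_t` before the multiplication.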

Acknowledgements

Sutton, R. S., Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press.
Graesser, L., Keng, W. L. (2019). Foundations of Deep Reinforcement Learning: Theory and Practice in Python. Addison-Wesley Professional.
Yoon, C. (2018, December 30). Deriving Policy Gradients and Implementing REINFORCE.
Silver, D. (2015, December 21). RL Course by David Silver - Lecture 7: Policy Gradient Methods.