Using-RL-in-CartPole-v1

This repository demonstrates the use of Reinforcement Learning, specifically Deep Q-Learning, REINFORCE, and Advantage Actor-Critic (A2C) methods, to play CartPole-v1 from OpenAI Gym.

The CartPole environment is best understood by reviewing its source code here.
In this environment, a pole is attached to a cart on a frictionless track, and the goal is to keep the pole balanced upright for as long as possible. The reward is +1 for every timestep the pole stays up, and the episode ends once the pole tilts more than 15 degrees from the vertical (so there are no negative rewards). At every timestep, there are only two possible actions: pushing the cart left or right.
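
As a quick orientation, a minimal interaction loop with the environment under a random policy might look like the sketch below. This is purely illustrative and not the repository's training code; it assumes the classic Gym API, where `reset()` returns the observation and `step()` returns a 4-tuple.

```python
import gym

# Minimal CartPole interaction loop under a random policy (illustrative only).
env = gym.make("CartPole-v1")
state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()             # 0 = push cart left, 1 = push cart right
    state, reward, done, info = env.step(action)   # reward is +1 per surviving timestep
    total_reward += reward
print(f"Episode return: {total_reward}")
env.close()
```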
This environment has been solved, in the sense of maximizing total reward (and thus reaching the final goal), using three deep reinforcement learning techniques. All three use a neural network function approximator with the same architecture, mapping from state to value/policy, and each was trained for 5,000 episodes. The three techniques used and compared are:

1. Deep Q-Learning Method

Uses experience replay with a bootstrapped one-step Q-learning update at every timestep.
The average total rewards and episode lengths over training are plotted in the repository.
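
For concreteness, here is a hedged sketch of deep Q-learning with experience replay and per-timestep bootstrapping. It is not the repository's implementation: the use of PyTorch, the network architecture, and the hyperparameters (`GAMMA`, `EPSILON`, `BATCH_SIZE`, `BUFFER_SIZE`, `LR`) are all illustrative assumptions.

```python
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn

# Illustrative hyperparameters; a fixed epsilon is used here, though it is
# often decayed over training in practice.
GAMMA, EPSILON, BATCH_SIZE, BUFFER_SIZE, LR = 0.99, 0.1, 64, 10_000, 1e-3

env = gym.make("CartPole-v1")
n_states, n_actions = env.observation_space.shape[0], env.action_space.n

q_net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)
replay = deque(maxlen=BUFFER_SIZE)

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
        next_state, reward, done, _ = env.step(action)
        replay.append((state, action, reward, next_state, done))
        state = next_state

        # Bootstrapped one-step TD update on a sampled minibatch, every timestep.
        if len(replay) >= BATCH_SIZE:
            batch = random.sample(replay, BATCH_SIZE)
            s, a, r, s2, d = map(np.array, zip(*batch))
            s = torch.as_tensor(s, dtype=torch.float32)
            s2 = torch.as_tensor(s2, dtype=torch.float32)
            r = torch.as_tensor(r, dtype=torch.float32)
            d = torch.as_tensor(d, dtype=torch.float32)
            a = torch.as_tensor(a, dtype=torch.int64)
            with torch.no_grad():
                target = r + GAMMA * (1 - d) * q_net(s2).max(dim=1).values
            q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```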

2. REINFORCE Method

Uses a policy-gradient technique with every-visit Monte Carlo returns computed at the end of each episode.
The average rewards and episode lengths over training are plotted in the repository.
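
A minimal sketch of REINFORCE under the same caveats (PyTorch, the architecture, and the hyperparameters are illustrative assumptions, not the repository's choices): sample a complete episode, compute the every-visit discounted returns G_t backwards from the final step, and take one policy-gradient step per episode.

```python
import gym
import torch
import torch.nn as nn

GAMMA, LR = 0.99, 1e-3  # illustrative hyperparameters

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=LR)

for episode in range(5000):
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # Every-visit discounted returns G_t, computed backwards from the episode end.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + GAMMA * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))

    # One gradient step per episode (pure Monte Carlo, no bootstrapping).
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the returns are often normalized, or a learned baseline is subtracted, to reduce the variance of this estimator; that gap is exactly what the critic in A2C fills.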

3. Advantage Actor-Critic (A2C) Method

Uses a single network architecture mapping to both the value and the policy; the advantages it yields replace Monte Carlo returns in the policy-gradient update, while the value head is trained with a bootstrapped, Q-learning-style update.
The average rewards and episode lengths over training are plotted in the repository.
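
A minimal sketch of A2C with a shared trunk and separate policy/value heads, again with an assumed (illustrative) architecture and hyperparameters. The one-step advantage `A(s, a) = r + gamma * V(s') - V(s)` replaces the Monte Carlo return in the policy-gradient term, and the value head is regressed toward the same bootstrapped target.

```python
import gym
import torch
import torch.nn as nn

GAMMA, LR = 0.99, 1e-3  # illustrative hyperparameters

env = gym.make("CartPole-v1")
# Single shared trunk feeding both a policy head and a value head.
trunk = nn.Sequential(nn.Linear(4, 64), nn.ReLU())
policy_head, value_head = nn.Linear(64, 2), nn.Linear(64, 1)
params = list(trunk.parameters()) + list(policy_head.parameters()) + list(value_head.parameters())
optimizer = torch.optim.Adam(params, lr=LR)

for episode in range(5000):
    state, done = env.reset(), False
    while not done:
        h = trunk(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=policy_head(h))
        value = value_head(h).squeeze()
        action = dist.sample()
        next_state, reward, done, _ = env.step(action.item())

        # One-step bootstrapped target and advantage A(s, a) = r + gamma*V(s') - V(s).
        with torch.no_grad():
            next_value = 0.0 if done else value_head(
                trunk(torch.as_tensor(next_state, dtype=torch.float32))).squeeze()
            target = reward + GAMMA * next_value
            advantage = target - value

        actor_loss = -advantage * dist.log_prob(action)   # policy-gradient term
        critic_loss = (target - value) ** 2               # bootstrapped value regression
        (actor_loss + critic_loss).backward()
        optimizer.step()
        optimizer.zero_grad()
        state = next_state
```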
As can be seen, while Deep Q-Learning and REINFORCE give similar results (not guaranteed in general, but true in this case), the actor-critic method does much better, in this case almost twice as well, consistent with the results reported in the A3C paper by Google DeepMind!
