oberger4711/rl-tf2

Reinforcement Learning Implementation with Tensorflow 2

Implementation of the Deep Q-Learning (DQN) algorithm with Tensorflow 2. Environments from OpenAI Gym are used for testing.

Instructions

Install dependencies. You may want to use a Python virtual env:

pip3 install -r requirements.txt
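Setting up the virtual env mentioned above could look like this (the `.venv` directory name is an arbitrary choice, not prescribed by the repository):

```shell
python3 -m venv .venv        # create an isolated environment in ./.venv
source .venv/bin/activate    # activate it for the current shell session
```

With the env active, `pip3 install -r requirements.txt` installs the dependencies into `.venv` instead of the system Python.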

Start training:

python3 train.py

DQN: Off-policy TD Learning

DQN learns action values using bootstrapping: the update target for each transition is built from the network's own prediction for the next state. The following loss function is minimized:

L(\theta) = \mathbb{E}_{(s, a, r, s')}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\big)^2\Big]

where \theta are the weights of the Q network and \theta^- those of the target network.
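The minimized TD error can be sketched numerically. This is an illustrative NumPy version; the function name, batch layout, and network callables are assumptions, not the repository's actual API:

```python
import numpy as np

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Mean squared TD error over a batch of transitions (sketch).

    q_net / target_net: callables mapping a batch of states to
    per-action value estimates of shape (batch, n_actions).
    batch: (states, actions, rewards, next_states, dones) arrays.
    """
    states, actions, rewards, next_states, dones = batch
    # Bootstrapped target: reward plus the discounted best next action
    # value, predicted by the frozen target network. Terminal states
    # (done == 1) contribute only the reward.
    next_q = target_net(next_states).max(axis=1)
    targets = rewards + gamma * (1.0 - dones) * next_q
    # Current estimates for the actions that were actually taken.
    q = q_net(states)[np.arange(len(actions)), actions]
    return np.mean((targets - q) ** 2)
```

In the real training loop the same quantity would be computed inside a gradient tape so the Q network weights can be optimized.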

To make the algorithm more robust, the following tricks are used:

A copy (target network) of the current model (Q network) is used for predicting the best action value of the next state. The weights of the target network are not touched during optimization. Every few train steps, the learned weights of the Q network are copied to the target network. This reduces training instability due to feedback effects when changing the weights of the Q network.
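The periodic weight copy described above can be sketched as follows; the class and its parameter names are illustrative, not taken from the repository:

```python
import copy

class TargetNetworkUpdater:
    """Keeps a frozen copy of the Q network weights, refreshed periodically."""

    def __init__(self, sync_every=1000):
        self.sync_every = sync_every
        self.steps = 0
        self.target_weights = None

    def maybe_sync(self, q_weights):
        # Copy the learned Q network weights into the target network
        # every `sync_every` train steps; in between, the target
        # network stays frozen and is only used for prediction.
        if self.steps % self.sync_every == 0:
            self.target_weights = copy.deepcopy(q_weights)
        self.steps += 1
        return self.target_weights
```

With Keras models the copy itself would typically be `target_model.set_weights(model.get_weights())`.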

A replay memory is used to remember explored transitions which are then randomly sampled in a mini batch for training. This reduces correlations between transition samples and thereby improves stability.
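A minimal replay memory along these lines can be built on a bounded deque; the class name and capacity are assumptions for illustration:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of transitions, sampled uniformly for training."""

    def __init__(self, capacity=10000):
        # Oldest transitions are evicted automatically once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```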

Huber loss is used instead of MSE. The error grows quadratically for small values, but above a given threshold the function is linear. This reduces the impact of outliers on the optimization.
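The piecewise behavior can be written out directly; this NumPy sketch uses the conventional threshold parameter name `delta`, which is an assumption here:

```python
import numpy as np

def huber_loss(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    abs_err = np.abs(error)
    quadratic = 0.5 * error ** 2
    # The linear branch is shifted so the two pieces join smoothly
    # at |error| == delta.
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)
```

TensorFlow 2 also ships this as `tf.keras.losses.Huber`.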
