Implementation of the Deep Q-Learning (DQN) algorithm with TensorFlow 2. OpenAI Gym environments are used for testing.
Install dependencies. You may want to use a Python virtual env:
```bash
pip3 install -r requirements.txt
```
Start training:
```bash
python3 train.py
```
Learns action values using bootstrapping. The following loss function is minimized:
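In the standard DQN formulation this is the one-step temporal-difference objective (the symbols θ⁻, 𝒟, and ℓ refer to the tricks described below):

$$
L(\theta) = \mathbb{E}_{(s,\,a,\,r,\,s') \sim \mathcal{D}} \Big[ \ell\big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \big) \Big]
$$

where γ is the discount factor, θ are the weights of the Q network, θ⁻ the weights of the target network, 𝒟 the replay memory, and ℓ the Huber loss.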
To make the algorithm more robust, the following tricks are used:
A copy (target network) of the current model (Q network) is used to predict the best action value of the next state. The weights of the target network are not updated during optimization. Every few training steps, the learned weights of the Q network are copied to the target network. This reduces training instability caused by feedback effects when the weights of the Q network change.
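A minimal sketch of this mechanism in TF2 (layer sizes, state dimension, and the update interval are illustrative, not the values used in this repo):

```python
import tensorflow as tf

def build_q_network(num_actions):
    """Small fully connected Q network (architecture is illustrative)."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_actions),
    ])

q_network = build_q_network(num_actions=4)
target_network = build_q_network(num_actions=4)

# Call both models once so their weights are created, then sync the copy.
dummy_state = tf.zeros((1, 8))
q_network(dummy_state)
target_network(dummy_state)
target_network.set_weights(q_network.get_weights())

# Inside the training loop, only q_network is optimized; every few
# steps its weights are copied over again:
#   if step % target_update_interval == 0:
#       target_network.set_weights(q_network.get_weights())
```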
A replay memory stores explored transitions, which are then randomly sampled into mini-batches for training. This reduces correlations between transition samples and thereby improves stability.
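A straightforward buffer along these lines (a sketch; the repo's own implementation may differ in details):

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks up the temporal correlation
        # between consecutive transitions.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Transitions are stored after every environment step; training typically only starts once the buffer holds at least one full mini-batch.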
Huber loss is used instead of MSE. The error grows quadratically for small values, but beyond a given threshold the function is linear. This reduces the impact of outliers on the optimization.
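TF2 ships this loss as `tf.keras.losses.Huber`; a tiny illustration (the numbers are made up):

```python
import tensorflow as tf

# delta is the threshold where the loss switches from quadratic to linear.
huber = tf.keras.losses.Huber(delta=1.0)
mse = tf.keras.losses.MeanSquaredError()

td_targets = tf.constant([1.0, 2.0, 10.0])  # bootstrapped targets (last one is an outlier)
q_values = tf.constant([1.2, 1.8, 2.0])     # current predictions

print(huber(td_targets, q_values).numpy())  # outlier contributes only linearly
print(mse(td_targets, q_values).numpy())    # outlier dominates the loss
```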