Implementation of the Deep Q-Learning (DQN) algorithm with TensorFlow 2. OpenAI Gym environments are used for testing.
Install dependencies. You may want to use a Python virtual env:
```bash
pip3 install -r requirements.txt
```
Start training:
```bash
python3 train.py
```
Learns action values using bootstrapping. The following loss function is minimized:
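In the standard DQN formulation this is the one-step temporal-difference objective (the symbols θ⁻, 𝒟, and ℓ refer to the tricks described below):

$$
L(\theta) = \mathbb{E}_{(s,\,a,\,r,\,s') \sim \mathcal{D}} \Big[ \ell\big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \big) \Big]
$$

where γ is the discount factor, θ are the weights of the Q network, θ⁻ the weights of the target network, 𝒟 the replay memory, and ℓ the Huber loss.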
To make the algorithm more robust, the following tricks are used:
A copy (target network) of the current model (Q network) is used to predict the best action value of the next state. The weights of the target network are not updated during optimization. Every few training steps, the learned weights of the Q network are copied to the target network. This reduces training instability caused by feedback effects when the weights of the Q network change.
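A minimal sketch of this mechanism in TF2 (layer sizes, state dimension, and the update interval are illustrative, not the values used in this repo):

```python
import tensorflow as tf

def build_q_network(num_actions):
    """Small fully connected Q network (architecture is illustrative)."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_actions),
    ])

q_network = build_q_network(num_actions=4)
target_network = build_q_network(num_actions=4)

# Call both models once so their weights are created, then sync the copy.
dummy_state = tf.zeros((1, 8))
q_network(dummy_state)
target_network(dummy_state)
target_network.set_weights(q_network.get_weights())

# Inside the training loop, only q_network is optimized; every few
# steps its weights are copied over again:
#   if step % target_update_interval == 0:
#       target_network.set_weights(q_network.get_weights())
```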
A replay memory stores explored transitions, which are then randomly sampled into mini-batches for training. This reduces correlations between transition samples and thereby improves stability.
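A straightforward buffer along these lines (a sketch; the repo's own implementation may differ in details):

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks up the temporal correlation
        # between consecutive transitions.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Transitions are stored after every environment step; training typically only starts once the buffer holds at least one full mini-batch.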
Huber loss is used instead of MSE. The error grows quadratically for small values, but beyond a given threshold the function is linear. This reduces the impact of outliers on the optimization.
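TF2 ships this loss as `tf.keras.losses.Huber`; a tiny illustration (the numbers are made up):

```python
import tensorflow as tf

# delta is the threshold where the loss switches from quadratic to linear.
huber = tf.keras.losses.Huber(delta=1.0)
mse = tf.keras.losses.MeanSquaredError()

td_targets = tf.constant([1.0, 2.0, 10.0])  # bootstrapped targets (last one is an outlier)
q_values = tf.constant([1.2, 1.8, 2.0])     # current predictions

print(huber(td_targets, q_values).numpy())  # outlier contributes only linearly
print(mse(td_targets, q_values).numpy())    # outlier dominates the loss
```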