Implementation of the DeepQ algorithm with Double Q.
DeepQ is an extension of standard Q-learning.
With DeepQ, rather than being stored in a table, Q-values are approximated by a neural network. This lets the agent generalize its Q-value estimates to unseen states and handle continuous state spaces, which a table cannot represent.
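As a minimal sketch of what this can look like (the class name, layer sizes, and activation choices here are illustrative, not necessarily this repo's actual architecture), a small feed-forward network maps a state vector to one Q-value per discrete action:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```

The greedy action for a state is then `q_net(state).argmax(dim=-1)`, with exploration typically handled by epsilon-greedy sampling on top.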
DeepQ also uses experience replay: at every step the agent stores the transition (state, action, reward, next state) in a memory buffer and trains on random samples drawn from it, which breaks the correlation between consecutive updates.
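A uniform replay buffer along these lines (names and capacity are illustrative) stores transitions and hands back random minibatches for training:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform random sampling decorrelates consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```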
Double Q-learning is implemented on top of this: the target, i.e. the estimate of expected future rewards, is computed by a separate target network whose weights are intermittently copied over from the 'online' network that makes the predictions. The slowly changing target network gives learning a more stable target to pursue, and having the online network select the next action while the target network evaluates it reduces Q-learning's overestimation bias.
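A sketch of the Double-DQN target computation under these assumptions (it reuses the illustrative `QNetwork` above; `GAMMA`, the batch shapes, and the hard-update schedule are placeholders, not this repo's actual settings):

```python
import copy
import torch

GAMMA = 0.99  # discount factor (illustrative value)

q_online = QNetwork(state_dim=4, n_actions=2)  # network being trained
q_target = copy.deepcopy(q_online)             # frozen copy used for targets

def double_q_targets(rewards, next_states, dones):
    """Double-DQN target: the online net selects the next action,
    the target net evaluates it."""
    with torch.no_grad():
        best_actions = q_online(next_states).argmax(dim=1, keepdim=True)
        next_q = q_target(next_states).gather(1, best_actions).squeeze(1)
        # dones is a float tensor of 0/1 flags; terminal states bootstrap nothing
        return rewards + GAMMA * (1.0 - dones) * next_q

def sync_target():
    """Periodic hard update: copy online weights into the target network."""
    q_target.load_state_dict(q_online.state_dict())
```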
See the experiments folder for example implementations.

TODO:
- Prioritized replay
- Dueling Q
- Soft updates
- More environments