
Releases: llucid-97/FastDeepQLearning

Added TQC-SAC as new baseline

06 Aug 08:53

Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

Paper: arXiv:2005.04269
The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of critics' predictions, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrarily granular overestimation control, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.


Note: This method relies on an ensemble of critics, which slows down the wall-clock time of experiments.
It's not nearly as bad as other distributional methods like IQN, and it can easily be tuned to almost match the speed of double Q-learning (by using fewer networks, but more predictions per network).
Regardless, the stability improvements it brings are worth the speed hit.
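
For concreteness, here is a minimal PyTorch sketch of TQC's truncation step as described in the paper. This is not the code in this repo; `critics`, `top_quantiles_to_drop_per_net`, and the call signatures are illustrative assumptions:

```python
import torch

def tqc_target(critics, next_obs, next_act, reward, done,
               gamma=0.99, top_quantiles_to_drop_per_net=2):
    """Pool the quantile predictions of every critic, sort them, and drop
    the largest atoms before bootstrapping, which biases the target low
    instead of high."""
    with torch.no_grad():
        # Each critic maps (obs, act) -> [batch, n_quantiles]; pool the ensemble.
        quantiles = torch.cat([c(next_obs, next_act) for c in critics], dim=1)
        quantiles, _ = torch.sort(quantiles, dim=1)
        # Truncation: discard the top k atoms (k scales with ensemble size).
        k = top_quantiles_to_drop_per_net * len(critics)
        truncated = quantiles[:, : quantiles.shape[1] - k]
        # Distributional Bellman target over the surviving atoms.
        # (The SAC entropy bonus, -alpha * log_pi, is omitted for brevity.)
        return reward.unsqueeze(1) + gamma * (1.0 - done.unsqueeze(1)) * truncated
```

The tuning knob mentioned above amounts to trading `len(critics)` against the number of quantiles each network predicts: fewer networks cost less per step, while more atoms per network keep the truncation granular.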

Combining Hindsight Experience Replay with Multistep Returns while retaining off-policy characteristics

06 Aug 08:57

I discuss the design principles and benchmarks behind this in this article; no point repeating them here.
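
Purely to make the title concrete, below is a generic sketch of the two ingredients combined: HER relabeling (the "final" strategy) plus truncated multistep returns. It is not this repo's method and does not show how off-policy correctness is preserved; every name in it (`compute_reward`, the episode dict keys) is hypothetical:

```python
def her_nstep_transitions(episode, compute_reward, n_step=3, gamma=0.99):
    """Relabel an episode with its final achieved goal (HER 'final' strategy),
    then build n-step transitions from the relabeled rewards."""
    goal = episode[-1]["achieved_goal"]  # relabeled goal for the whole episode
    rewards = [compute_reward(t["achieved_goal"], goal) for t in episode]
    out = []
    for i in range(len(episode)):
        horizon = min(n_step, len(episode) - i)
        # Truncated n-step return under the relabeled reward function.
        g = sum((gamma ** k) * rewards[i + k] for k in range(horizon))
        out.append({
            "obs": episode[i]["obs"],
            "action": episode[i]["action"],
            "goal": goal,
            "n_step_return": g,
            "next_obs": episode[i + horizon - 1]["next_obs"],
            "discount": gamma ** horizon,  # discount applied at the bootstrap
        })
    return out
```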