Implement Truncated Quantile Critics (TQC) #83
Comments
I agree with this one, and the results both in the paper and in what you describe sound promising. This sounds like a solid thing to add to the contrib package once we have that, but I wonder if it could be added to the main repo for now (and then later moved to contrib), or should contrib be created right away? It needs a bit of thinking how to do it exactly, but there is other stuff coming up already that could go there (e.g. fancier returns). Edit: Adding this straight to master is a bit iffy, given that even the basic algorithms are still under (slow) review ^^' |
My inclination is to wait for 1.1 since it doesn't seem like an essential feature. If the difference between SAC and QR-DQN is minimal though then I would be fine with having them both in 1.0 if you wanted to put the time in. |
I agree with both of you ;) I will wait then and in the meantime, I can still use it by switching to the branch when needed. |
Issue moved to contrib repo |
I'm normally against implementing very recent papers before they have proven valuable, but I would like to make an exception for this one, especially because of the good results. It was recently accepted at ICML 2020.
Paper: Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
Code: https://github.com/bayesgroup/tqc_pytorch
Background
This paper builds on SAC, TD3 and QR-DQN, making use of quantile regression to predict a distribution for the value function (instead of a mean value).
It truncates the pooled quantiles predicted by the different critic networks (in a similar spirit to the clipped double-Q trick in TD3) to control overestimation.
This is for continuous actions only.
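To make the truncation idea above concrete, here is a minimal NumPy sketch (the function name and the `drop_per_net` parameter are illustrative, not taken from the SB3 branch): the quantile atoms from all critics are pooled, sorted, and the largest `drop_per_net` atoms per network are dropped before forming the target, which curbs overestimation.

```python
import numpy as np

def truncated_quantile_target(quantiles, drop_per_net):
    """Pool, sort, and truncate an ensemble of quantile predictions.

    quantiles: array of shape (batch, n_nets, n_quantiles), one set of
    quantile atoms per critic network.
    drop_per_net: number of top atoms to discard per network.
    Returns the kept (smallest) atoms, shape (batch, n_nets * (n_quantiles - drop_per_net)).
    """
    batch, n_nets, n_q = quantiles.shape
    # Pool all atoms from the ensemble and sort them in ascending order.
    pooled = np.sort(quantiles.reshape(batch, -1), axis=1)
    # Keep everything except the largest drop_per_net * n_nets atoms.
    keep = n_nets * (n_q - drop_per_net)
    return pooled[:, :keep]
```

The truncated atoms would then feed the usual distributional Bellman backup (reward plus discounted truncated quantiles, minus the entropy term in the SAC-style setting).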
Pros
I already implemented it in SB3 (https://github.com/DLR-RM/stable-baselines3/tree/feat/tqc); it was pretty straightforward, as I used the SAC code as the backbone (I did not remove the duplicated code yet) and the authors' code for the loss. The difference between SAC and TQC is about 30 lines (15 for the loss and 15 for the critic code).
And using the SAC hyperparameters from the zoo, I could achieve very good results on the PyBullet envs and on BipedalWalkerHardcore (on that env it reaches maximal performance 10x faster than in my previous experiments).
The good news is that the SAC hyperparameters transfer to this new algorithm.
The loss function can be re-used to implement QR-DQN, which is apparently a big improvement over DQN, with minimal effort.
The authors' code is available in both TensorFlow and PyTorch, and the results are really good.
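Since the same quantile regression loss is what would be shared between TQC and a QR-DQN implementation, here is a minimal NumPy sketch of it (the function name and `kappa` default are illustrative, not from the SB3 branch): an asymmetric Huber loss where each predicted quantile is weighted by how far the pairwise TD error falls on the wrong side of its quantile level.

```python
import numpy as np

def quantile_huber_loss(pred, target, kappa=1.0):
    """Quantile regression Huber loss (as in QR-DQN).

    pred:   (batch, n_quantiles) predicted quantile atoms.
    target: (batch, n_target) target atoms (e.g. truncated ensemble atoms).
    kappa:  Huber threshold.
    """
    n_q = pred.shape[1]
    # Quantile midpoints tau_i = (2i + 1) / (2 * n_q).
    tau = (np.arange(n_q) + 0.5) / n_q
    # Pairwise TD errors, shape (batch, n_pred, n_target).
    td = target[:, None, :] - pred[:, :, None]
    abs_td = np.abs(td)
    # Elementwise Huber loss on the TD errors.
    huber = np.where(abs_td <= kappa, 0.5 * td ** 2, kappa * (abs_td - 0.5 * kappa))
    # Asymmetric quantile weight: penalize over/under-estimation differently.
    weight = np.abs(tau[None, :, None] - (td < 0).astype(np.float64))
    return float((weight * huber / kappa).mean())
```

The asymmetry in `weight` is what makes each atom converge to its own quantile of the target distribution rather than to the mean.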
Cons
It adds a bit of complexity / duplication, but this can be mitigated if it derives from the SAC class.
My question is: should we integrate it for v1.0 (#1) or should we wait?
@hill-a @Miffyli @AdamGleave @erniejunior