[Feature Request] Double DQN #487
Comments
Yup indeed this is one of the core enhancements we would like to see! @araffin just checking with you: is bloating the DQN code here slightly ok (with a new parameter), or should this go to contrib? @NickLucche If you wish to work on this PR, note that we would like to see matching results with reference implementations (e.g. same results as in the original paper or in a well-established codebase). One or two Atari games should be enough. |
Sure thing, I'll make sure to set up the experiments in a similar way. |
I highly recommend using the zoo for doing that btw, since it already has known-to-work preprocessing for Atari :). It takes a moment to get into, but it will definitely pay off with these experiments (there are many tiny details that need taking care of). |
I think double DQN is ok as it is only a 3-line change. |
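In code, that change is roughly the sketch below. The names (`q_net` for the online network, `q_net_target` for the target network, the tensor arguments) only loosely follow SB3's DQN internals and are assumptions here, not a verbatim diff.

```python
import torch as th


def td_targets(q_net, q_net_target, next_obs, rewards, dones, gamma, double_dqn=False):
    """One-step TD targets for DQN / Double DQN.

    Sketch only: argument names loosely mirror SB3's DQN internals
    (q_net = online network, q_net_target = target network).
    """
    with th.no_grad():
        if double_dqn:
            # Double DQN: the online network selects the greedy action...
            next_actions = q_net(next_obs).argmax(dim=1, keepdim=True)
            # ...and the target network evaluates it, reducing overestimation bias.
            next_q_values = th.gather(q_net_target(next_obs), dim=1, index=next_actions)
        else:
            # Vanilla DQN: the target network both selects and evaluates the action.
            next_q_values, _ = q_net_target(next_obs).max(dim=1)
            next_q_values = next_q_values.reshape(-1, 1)
        # Bootstrap only for non-terminal transitions.
        return rewards + (1.0 - dones) * gamma * next_q_values
```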
@Miffyli Thanks for the tip. It may take a while to run the experiments for the same number of timesteps as in "Human-level control through deep reinforcement learning" though. |
I could help with that if needed, but basically you can check what we did with QR-DQN: Stable-Baselines-Team/stable-baselines3-contrib#13 So Breakout + Pong using 10e6 training steps and 3 seeds (40e6 frames because of frame skip) + some benchmarks on simple gym tasks. |
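For orientation, a rough standalone version of that protocol (without the zoo) might look like the sketch below. The hyperparameters are placeholders rather than the zoo's tuned Atari values; it only illustrates the 2 games × 3 seeds × 1e7 steps setup.

```python
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Rough sketch of the benchmark protocol: Pong + Breakout, 3 seeds,
# 1e7 agent steps (~4e7 frames with frame skip). Hyperparameters are
# placeholders; the tuned Atari settings live in the RL Zoo configs.
for env_id in ("PongNoFrameskip-v4", "BreakoutNoFrameskip-v4"):
    for seed in (0, 1, 2):
        env = make_atari_env(env_id, n_envs=1, seed=seed)  # standard Atari preprocessing
        env = VecFrameStack(env, n_stack=4)                # stack 4 frames
        model = DQN("CnnPolicy", env, buffer_size=100_000, seed=seed, verbose=1)
        model.learn(total_timesteps=10_000_000)
        model.save(f"dqn_{env_id}_seed{seed}")
```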
Thanks a lot for your help, it is most appreciated! I was using a script very similar to this for testing so far: Stable-Baselines-Team/stable-baselines3-contrib#13 (comment) as I'm not yet super-familiar with the zoo. |
RL Zoo is quite easy to use; if you want to replicate results and plot them, we have a section for that in the documentation ;) |
Sorry for the long inactivity. [Plots attached: Vanilla DQN Pong, Vanilla DQN Breakout, Double DQN Pong, Double DQN Breakout] |
Thanks for the update, best would be to have both DQN and Double DQN in the same plot (I can help you with the command if needed). |
Oh yeah I'd appreciate that, I've used the […] | Sure thing, I'll make sure to test it on a few more interesting use-cases. |
cf. https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#how-to-replicate-the-results
You need to plot the evaluations, for instance comparing DQN and PPO on Pong and Breakout:
(you can also specify multiple folders) |
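The exact zoo plotting command is omitted above. As a rough alternative sketch (not the zoo script), SB3's built-in `results_plotter` can put several runs on one figure, assuming the log folders contain Monitor CSV files; the folder names below are hypothetical.

```python
import matplotlib.pyplot as plt

from stable_baselines3.common import results_plotter

# Hypothetical log folders containing monitor.csv files from training runs.
log_dirs = ["logs/dqn/PongNoFrameskip-v4_1", "logs/ppo/PongNoFrameskip-v4_1"]

# Plot episode reward against timesteps for both runs on the same figure.
results_plotter.plot_results(log_dirs, 10_000_000, results_plotter.X_TIMESTEPS, "Pong: DQN vs PPO")
plt.show()
```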
As a lecturer in an AI program, I recommend our students to use Stable Baselines for their projects due to its ease of use and clear documentation (thx for that!). From an educational point of view, Q-learning and DQN are a good introduction to RL, so students start off using DQN. Results using DQN of SB3 are much, much worse compared to SB2 (both with default values for the parameters). This hampers the adoption of SB3 (and the enthusiasm of students for RL). I have not yet understood/investigated the reason for this difference. Obvious candidates are the missing extensions like PER and DDQN, but of course this is an assumption. The goal of this comment is just to mention that progress in SB3 on this topic is much appreciated. If I can be of help, for example in testing improvements, let me know. Best regards, Erco Argante |
Thank you for the kind words! You are right that the DQN augmentations do contribute to the performance, however they should not be "make or break" level changes across games. You need to check the original publications for how much each augmentation affects the specific game you are using (it varies a lot). Another common point is the hyperparameters. I think we changed some of the default training parameters between SB and SB3, so you might want to double-check that they are valid. Once those are set right (and you disable the augmentations of SB DQN), you should be getting equal-ish performance: #110 Another source of discrepancy may be the logging of rewards, where the reported performance reflects a different number (see e.g., #181 ). I do not think we included a fix for that specific issue in SB. TL;DR the augmentations definitely contribute here, but I would double-check the parameters and the evaluation procedure (these two are points we often bring up to others seeking advice) :) |
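On the evaluation point, a minimal sketch of a separate deterministic evaluation (so the reported number is not the exploration-time training reward) could look like the following; the environment, step counts, and episode count are arbitrary choices, and `gym` vs `gymnasium` depends on the SB3 version.

```python
import gym  # or gymnasium, depending on the SB3 version

from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Evaluate on a separate Monitor-wrapped env with a deterministic policy,
# so the reported score is not the exploration-time training reward.
eval_env = Monitor(gym.make("CartPole-v1"))
model = DQN("MlpPolicy", "CartPole-v1", verbose=0)
model.learn(total_timesteps=20_000)
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=20, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```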
Hello,
As mentioned in the doc, it is highly recommended to use the RL Zoo (https://github.com/DLR-RM/rl-baselines3-zoo) because the default hyperparameters are tuned for Atari games only. And you should compare apples to apples... (that's what we did in #110, where we deactivated the SB2 DQN extensions and matched hyperparameters)
In case you want to use a more recent algorithm, we have QR-DQN in our contrib repo.
My idea would be to have a Rainbow implementation in SB3 to keep DQN simple but have all the tricks in one algorithm. I will open an issue (and contributions are welcome ;)). |
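A minimal usage sketch of the QR-DQN mentioned above, imported from the sb3-contrib package; default hyperparameters are used here rather than the zoo's tuned values.

```python
from sb3_contrib import QRDQN

# QR-DQN from the contrib repo, used as a drop-in replacement for DQN.
# Default hyperparameters; tuned Atari settings live in the RL Zoo.
model = QRDQN("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=50_000)
```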
Hi, |
Could you try training with VideoPinball environment (has the highest gap between DQN and DDQN according to the original paper)? I also recommend trying to do the runs with sb3-zoo, as it automatically applies all necessary wrappers for proper evaluation (this can easily go wrong). |
closing in favor of #622 |
🚀 Feature
Add a double variant of the DQN algorithm.
Motivation
It's in the roadmap (#1).
Pitch
I suggest we go from the standard DQN target computation:
to the Double DQN target:
with `double_dqn` as an additional flag to be passed to the DQN init.

Checklist
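For illustration, the API proposed in the pitch could be used roughly as below; note that `double_dqn` is only the flag suggested in this issue, not an existing argument of `DQN.__init__` in a released SB3 version.

```python
from stable_baselines3 import DQN

# Hypothetical usage of the flag proposed in this issue; `double_dqn` is not
# an actual DQN.__init__ argument, just the suggested API.
model = DQN("CnnPolicy", "PongNoFrameskip-v4", double_dqn=True, verbose=1)
model.learn(total_timesteps=10_000_000)
```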