Implement HER #8
Comments
It may be worth implementing this similarly to baselines2: create an environment wrapper that returns concatenated states and goals, then wrap the replay buffer so that goal relabeling is applied at the end of each episode.
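For readers unfamiliar with the idea, here is a minimal sketch of end-of-episode goal relabeling. It is not the baselines2 or SB3 code: `Transition`, `sparse_reward`, `flatten`, `relabel_episode`, and the `n_sampled_goal` parameter are hypothetical names chosen for illustration, and the sketch assumes a "future"-style strategy that picks substitute goals from step `t` onward in the same episode.

```python
import random
from typing import Callable, List, NamedTuple, Optional, Tuple

Vector = Tuple[float, ...]


class Transition(NamedTuple):
    """One environment step of a goal-conditioned task (hypothetical layout)."""
    obs: Vector
    action: int
    next_obs: Vector
    achieved_goal: Vector  # goal actually reached after taking the action
    desired_goal: Vector   # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Vector, desired: Vector) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == desired else -1.0


def flatten(tr: Transition) -> Vector:
    """Concatenate state and goal, as a wrapper-based approach would."""
    return tr.obs + tr.desired_goal


def relabel_episode(
    episode: List[Transition],
    reward_fn: Callable[[Vector, Vector], float] = sparse_reward,
    n_sampled_goal: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Create extra transitions whose goals were actually achieved later on.

    Called once the episode is finished, so later steps are available.
    """
    rng = rng or random.Random()
    relabeled = []
    for t, tr in enumerate(episode):
        for _ in range(n_sampled_goal):
            # Pick a step from t onward and pretend its outcome was the goal.
            future = episode[rng.randrange(t, len(episode))]
            new_goal = future.achieved_goal
            relabeled.append(
                tr._replace(
                    desired_goal=new_goal,
                    reward=reward_fn(tr.achieved_goal, new_goal),
                )
            )
    return relabeled
```

In this sketch the original transitions and the output of `relabel_episode` would both be added to the replay buffer, so the buffer itself needs no knowledge of goals.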
In fact, I would prefer a separate algorithm that does not require a wrapper, even though it means a bit of code duplication. The SB2 solution works but is quite messy and requires a lot of data transformation.
I'm working on something similar. Can I take a shot at this, with DDPG (and maybe SAC)?
Yes, you can. You just have to assume that the model derives from the off-policy class and therefore has a replay buffer. I may also have a student working on that one (cf. the roadmap), but her code needs to be adapted a bit. I would also like to compare performance when the HER transitions are added at sampling time (cf. the OpenAI Baselines code).
I've created a PR for a HER implementation (#42). It basically reuses elements from the existing SB implementation (with some minor edits) and works for the existing off-policy algorithms. Please check whether this is usable. I'll take a shot at removing wrappers and minimizing data transformations next. This is my first PR, so I apologize if some things are not done properly; do let me know if there is additional work to be done.
Regarding adding HER transitions at sampling time: wouldn't that mean the samples are more correlated than before, since the HER samples would come from the same episode as the original transitions we sample? That is in contrast to the uncorrelated samples we get when drawing randomly from a buffer filled with original + HER transitions.
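To make the alternative under discussion concrete, here is a rough sketch of relabeling at sampling time instead of at storage time. It is an illustration only, not the OpenAI Baselines code: `EpisodeBuffer`, `her_ratio`, and the dictionary keys are hypothetical, and the sparse 0/-1 reward is an assumption. The buffer stores whole episodes untouched and decides per sampled transition whether to substitute a goal achieved from that step onward in the same episode.

```python
import random
from typing import Dict, List, Optional

Step = Dict[str, object]  # keys assumed: obs, action, achieved_goal, desired_goal, reward


class EpisodeBuffer:
    """Hypothetical replay buffer that relabels goals when sampling."""

    def __init__(self, her_ratio: float = 0.8, seed: Optional[int] = None) -> None:
        self.episodes: List[List[Step]] = []
        self.her_ratio = her_ratio  # assumed fraction of samples to relabel
        self.rng = random.Random(seed)

    def add_episode(self, episode: List[Step]) -> None:
        # Episodes are stored as-is; no HER transitions are created here.
        self.episodes.append(list(episode))

    def sample(self, batch_size: int) -> List[Step]:
        batch = []
        for _ in range(batch_size):
            # Each draw picks its episode and time step independently.
            episode = self.rng.choice(self.episodes)
            t = self.rng.randrange(len(episode))
            step = dict(episode[t])  # copy so the stored data stays untouched
            if self.rng.random() < self.her_ratio:
                # Substitute a goal achieved at step t or later in this episode.
                future = episode[self.rng.randrange(t, len(episode))]
                step["desired_goal"] = future["achieved_goal"]
                # Assumed sparse reward: 0 on success, -1 otherwise.
                reached = step["achieved_goal"] == step["desired_goal"]
                step["reward"] = 0.0 if reached else -1.0
            batch.append(step)
        return batch
```

With this layout the relabeled goal always comes from the same episode as the sampled transition, while different entries of a batch can still come from different episodes. Whether that changes sample correlation in practice is the comparison proposed above.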
Thank you for the PR.
As mentioned before, I have a student (@megan-klaiber) who implemented HER a while ago as a programming test (a minimal implementation based on an old version of SB3). If she agrees, we could use that implementation as a base.
Take a look at the Baselines implementation. The two should be equivalent.
Yes, sure, you can use the code. Unfortunately, I haven't had time to work on it yet.
I didn't realize this. I have now used master as the base and updated the code and the PR.
Thanks. In the meantime, I'll work on removing wrappers and reducing data transformations, and I'll keep updating the PR. I will also look at the Baselines implementation for sampling.
I'll try to push @megan-klaiber's implementation next week, but I won't take a look at your PR before #28 and #35 are merged.
You can find @megan-klaiber's implementation attached. What I have in mind is a mix between the SB2 implementation and her implementation.
But again, please wait until #28 and #35 are merged before requesting a review (that does not mean you cannot work on it).
Thank you for the code. Do you have a timeline in mind as to when you want to get HER done? I will look at improving it until then.
There is no fixed timeline, but at some point (in about a month and a half) we will have to use it. So I may help you finish the implementation so that it is done by then.
EDIT: we might want to support Dict obs for VecNormalize (maybe another issue)