Implement HER #8

araffin · 2020-05-09T11:18:58Z

EDIT: we might want to support Dict obs for VecNormalize (maybe another issue)

ferreirajoaouerj · 2020-05-10T22:01:50Z

Maybe worth it to implement similar to baselines2. Create an environment wrapper to return concatenated states and goals and then wrap the buffer, so that goal relabeling is applied at the end of the episode.
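
A minimal sketch of the observation-concatenation step described above, assuming the gym `GoalEnv` key convention (`observation`, `achieved_goal`, `desired_goal`). The names `flatten_goal_obs` and `unflatten_goal_obs` are hypothetical, and plain Python lists stand in for arrays; this is not the SB2 wrapper itself.

```python
from typing import Dict, List, Sequence

# Key order is fixed so that flattening and unflattening stay consistent.
# The key names follow the gym GoalEnv convention (an assumption here).
KEYS = ("observation", "achieved_goal", "desired_goal")


def flatten_goal_obs(obs: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the three parts of a goal-conditioned observation."""
    flat: List[float] = []
    for key in KEYS:
        flat.extend(obs[key])
    return flat


def unflatten_goal_obs(flat: Sequence[float], sizes: Dict[str, int]) -> Dict[str, List[float]]:
    """Split a flat observation back into its parts, given each part's length."""
    out: Dict[str, List[float]] = {}
    start = 0
    for key in KEYS:
        end = start + sizes[key]
        out[key] = list(flat[start:end])
        start = end
    return out
```

The inverse function is what makes relabeling possible later: the buffer has to recover `achieved_goal` and `desired_goal` from the flat vector before it can substitute a new goal, which is the kind of back-and-forth data transformation discussed further down the thread.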

araffin · 2020-05-11T14:54:35Z

> Maybe worth it to implement similar to baselines2. Create an environment wrapper to return concatenated states and goals and then wrap the buffer, so that goal relabeling is applied at the end of the episode.

In fact, I would prefer a separate algorithm that does not require a wrapper, even though it means a bit of code duplication. The SB2 solution works but is quite messy and requires a lot of data transformation.

arjun-kg · 2020-05-18T16:46:01Z

I'm working on something similar. Can I take a shot at this? With DDPG (and maybe SAC)

araffin · 2020-05-18T21:18:04Z

Yes, you can, but you have to assume only that the model derives from the off-policy class and therefore has a replay buffer.
It should not be specific to TD3/DQN/SAC.

I may also have a student on that one (cf. roadmap), but her code needs to be adapted a bit. I would also like to compare performance when the transitions are added at sampling time (cf. the OpenAI Baselines code).

arjun-kg · 2020-06-03T18:14:19Z

I've created a PR for a HER implementation (#42). It's basically reusing elements from the existing SB implementation (with some minor edits). It works for the existing off-policy algos. Please check if this is usable. I'll take a shot at removing wrappers and minimizing data transformations next. This is my first PR, so I apologize if some things are not proper and do let me know if there is additional work to be done.

arjun-kg · 2020-06-03T18:15:10Z

Regarding adding HER transitions at sampling time: wouldn't that mean the samples are more correlated than before, since HER samples would come from the same episode as the original transitions we sample? That is in contrast to the uncorrelated samples we get when drawing randomly from a buffer filled with original + HER transitions.

araffin · 2020-06-03T18:34:20Z

Thank you for the PR.
I wanted to avoid doing an implementation as in SB2... Also it seems that you used the wrong branch as base. Maybe wait for the dqn to be merged first.

araffin · 2020-06-03T18:39:05Z

As mentioned before, I've got a student (@megan-klaiber) who implemented HER (a minimal implementation with an old version of SB3) as a programming test a while ago. If she agrees, we could use this implementation as a base.
Unfortunately, her contract has not started yet...

araffin · 2020-06-03T18:43:07Z

> Regarding adding HER transitions at sampling time: wouldn't that mean the samples are more correlated than before, since HER samples would come from the same episode as the original transitions we sample? That is in contrast to the uncorrelated samples we get when drawing randomly from a buffer filled with original + HER transitions.

Take a look at the Baselines implementation. The two should be equivalent.
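
To make the comparison concrete, here is a minimal sketch of relabeling at sampling time with a "future" goal strategy. It is an illustration under stated assumptions, not the Baselines code: `Transition`, `sparse_reward`, `sample_with_her` and the `her_ratio` parameter are hypothetical names, goals are tuples, and the reward is a simple sparse one.

```python
import random
from typing import Callable, List, NamedTuple, Optional, Tuple

Goal = Tuple[float, ...]


class Transition(NamedTuple):
    obs: Tuple[float, ...]
    action: int
    next_achieved_goal: Goal  # goal actually reached after taking the action
    desired_goal: Goal
    reward: float


def sparse_reward(achieved: Goal, desired: Goal) -> float:
    """Illustrative sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == desired else -1.0


def sample_with_her(
    episodes: List[List[Transition]],
    batch_size: int,
    her_ratio: float = 0.8,
    rng: Optional[random.Random] = None,
    reward_fn: Callable[[Goal, Goal], float] = sparse_reward,
) -> List[Transition]:
    """Sample a batch, relabeling a fraction of it with 'future' goals."""
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        # Each draw picks its episode and time step independently.
        episode = rng.choice(episodes)
        t = rng.randrange(len(episode))
        tr = episode[t]
        if rng.random() < her_ratio:
            # "future": use a goal achieved at step t or later in the same episode.
            future_t = rng.randrange(t, len(episode))
            new_goal = episode[future_t].next_achieved_goal
            tr = tr._replace(
                desired_goal=new_goal,
                reward=reward_fn(tr.next_achieved_goal, new_goal),
            )
        batch.append(tr)
    return batch
```

Because every draw selects its episode and time step independently, the substitute goal is correlated only with the transition it relabels, which is also true of a relabeled copy stored in the buffer. Under the assumptions of equal-length episodes and `her_ratio = k / (k + 1)`, a single draw here should have the same marginal distribution as a uniform draw from a buffer that stores `k` relabeled copies of each transition.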

megan-klaiber · 2020-06-03T22:21:20Z

> As mentioned before, I've got a student (@megan-klaiber) who implemented HER (a minimal implementation with an old version of SB3) as a programming test a while ago. If she agrees, we could use this implementation as a base.
> Unfortunately, her contract has not started yet...

Yes, sure you can use the code. Unfortunately, I haven't had time to work on it yet.

arjun-kg · 2020-06-04T19:51:18Z

> Thank you for the PR.
> I wanted to avoid doing an implementation as in SB2... Also it seems that you used the wrong branch as base. Maybe wait for the dqn to be merged first.

I didn't realize this. I have now used master as the base and updated the code and the PR.

arjun-kg · 2020-06-04T19:57:42Z

> > As mentioned before, I've got a student (@megan-klaiber) who implemented HER (a minimal implementation with an old version of SB3) as a programming test a while ago. If she agrees, we could use this implementation as a base.
> > Unfortunately, her contract has not started yet...
>
> Yes, sure you can use the code. Unfortunately, I haven't had time to work on it yet.

Thanks. I'll work on removing wrappers and reducing data transformations in the meantime, and continue updating. I will also look at the Baselines implementation for sampling.

araffin · 2020-06-05T08:05:13Z

> Thanks. I'll work on removing wrappers and reducing data transformations in the meantime, and continue updating. Will also look at baselines implementation for sampling

I'll try to push @megan-klaiber's implementation next week, but I won't take a look at your PR before #28 and #35 are merged.
In the meantime, make sure to read the contributing guide and check that all the tests pass.

araffin · 2020-06-08T12:14:35Z

You can find @megan-klaiber's implementation attached. What I have in mind is a mix between the SB2 implementation and her implementation.
To be more precise:

- I would minimize transformations and avoid using a wrapper for the buffer, even if it means a bit of code duplication.
- I would keep the enum that was present in SB2 to define the type of goals.
- For the env wrapper, I would prefer a VecEnvWrapper instead of a gym.Wrapper; this would allow multiprocessing in the future. Also, once we have dict support for the observation space (cf. Roadmap to Stable-Baselines3 V1.0 #1), we will be able to remove that wrapper.
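
As an illustration of the enum mentioned in the list above, here is a minimal sketch modeled on the SB2 `GoalSelectionStrategy` enum as best recalled. The member values and the helper `parse_strategy` are assumptions for illustration, not a confirmed API.

```python
from enum import Enum
from typing import Union


class GoalSelectionStrategy(Enum):
    """How substitute goals are picked when relabeling a transition."""

    FUTURE = 0   # a goal achieved later in the same episode
    FINAL = 1    # the goal achieved at the end of the episode
    EPISODE = 2  # a goal achieved anywhere in the same episode
    RANDOM = 3   # a goal achieved anywhere in the replay buffer


# Lets users pass the strategy as a string, e.g. from a config file.
KEY_TO_GOAL_STRATEGY = {member.name.lower(): member for member in GoalSelectionStrategy}


def parse_strategy(value: Union[str, GoalSelectionStrategy]) -> GoalSelectionStrategy:
    """Accept either an enum member or its (case-insensitive) name."""
    if isinstance(value, GoalSelectionStrategy):
        return value
    try:
        return KEY_TO_GOAL_STRATEGY[value.lower()]
    except KeyError:
        raise ValueError(f"Unknown goal selection strategy: {value!r}") from None
```

Using an enum rather than bare strings means a typo in the strategy name fails immediately at construction time instead of silently falling through a chain of string comparisons in the buffer.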

But again, please wait until #28 and #35 are merged before requesting a review (that does not mean you cannot work on it).

her_megan_solution.zip

arjun-kg · 2020-06-09T06:17:03Z

Thank you for the code. Do you have a timeline in mind as to when you want to get HER done? I will look at improving it until then.

araffin · 2020-06-09T07:21:54Z

There is no fixed timeline, but at some point (in about a month and a half) we will have to use it. So I may help you finish the implementation to get things done by then.

araffin added the enhancement New feature or request label May 9, 2020

araffin added this to the v1.0 milestone May 9, 2020

araffin mentioned this issue May 9, 2020

Roadmap to Stable-Baselines3 V1.0 #1

Closed

42 tasks

megan-klaiber mentioned this issue Jul 23, 2020

Implement HER #120

Merged

19 tasks

araffin closed this as completed in #120 Oct 22, 2020
