[feature-request] N-step returns for TD methods #47

araffin · 2020-06-08T12:36:46Z

Originally posted by @partiallytyped in hill-a/stable-baselines#821
"
N-step returns allow for much better stability and improved performance when training DQN, DDPG, etc., so this feature would be quite useful to have.

A simple implementation would be a wrapper around ReplayBuffer, so that it works with both prioritized and uniform sampling. The wrapper keeps a queue of observed experiences, computes the n-step returns, and adds the resulting experiences to the buffer.
"
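A minimal sketch of the queue-based wrapper described in the quote, assuming a toy list-backed buffer (`ListBuffer`, hypothetical, not SB3's ReplayBuffer). `NStepWrapper` and the extra `discount` field are likewise illustrative names. Each stored transition carries the discounted reward sum and gamma**k, so the learner can bootstrap correctly when an episode ends before n steps. Time-limit truncation is not distinguished from true termination here.

```python
from collections import deque


class ListBuffer:
    """Toy stand-in for the underlying (uniform or prioritized) replay buffer."""

    def __init__(self):
        self.storage = []

    def add(self, obs, action, n_step_return, next_obs, done, discount):
        self.storage.append((obs, action, n_step_return, next_obs, done, discount))


class NStepWrapper:
    """Hypothetical wrapper: queues the last n transitions and forwards
    aggregated n-step transitions to the wrapped buffer."""

    def __init__(self, buffer, n_steps=3, gamma=0.99):
        self.buffer = buffer
        self.n_steps = n_steps
        self.gamma = gamma
        self.queue = deque()

    def _flush_oldest(self):
        # Sum discounted rewards from the oldest queued transition onward,
        # stopping at the first terminal transition.
        obs, action = self.queue[0][0], self.queue[0][1]
        n_step_return, discount = 0.0, 1.0
        for _, _, reward, next_obs, done in self.queue:
            n_step_return += discount * reward
            discount *= self.gamma
            if done:
                break
        # After the loop, next_obs/done belong to the last transition summed.
        # discount == gamma**k (k = rewards summed); the learner would use
        # target = n_step_return + discount * (1 - done) * V(next_obs).
        self.buffer.add(obs, action, n_step_return, next_obs, done, discount)
        self.queue.popleft()

    def add(self, obs, action, reward, next_obs, done):
        self.queue.append((obs, action, reward, next_obs, done))
        if done:
            # Episode over: flush every queued transition with a shortened horizon.
            while self.queue:
                self._flush_oldest()
        elif len(self.queue) == self.n_steps:
            self._flush_oldest()


# Usage: a 4-step episode, n_steps=3, gamma=0.5, reward 1 at every step.
buf = ListBuffer()
wrapper = NStepWrapper(buf, n_steps=3, gamma=0.5)
for t in range(4):
    wrapper.add(obs=t, action=0, reward=1.0, next_obs=t + 1, done=(t == 3))
```

Because the wrapper only calls `add` on the wrapped object, the same code would sit in front of a uniform or a prioritized buffer, which is the motivation given in the quote.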

Roadmap: v1.1+ (see #1)

araffin · 2020-06-08T12:38:49Z

@partiallytyped I thought about that one: as a first approximation, don't we just need to change the sampling, not the storage?

What I mean: at sampling time, we could reconstruct the trajectory (until a done is found or the end of the buffer is reached) by simply walking through the indices.
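A rough sketch of the sampling-time approach, assuming transitions are stored in flat circular NumPy arrays with a write pointer `pos` (loosely modeled on how a replay buffer keeps its storage). The helper `n_step_returns` is hypothetical. For each sampled index it walks forward until n rewards are summed, a done is found, or the next slot is the write pointer (the end of valid data, or the oldest overwritten entry once the buffer is full).

```python
import numpy as np


def n_step_returns(rewards, dones, indices, n_steps, gamma, pos):
    """Compute n-step returns at sampling time from flat circular storage.

    Returns the discounted reward sums, gamma**k for each sample
    (k = number of rewards summed), and the index of the last transition
    used, whose next_obs/done give the bootstrap state and mask.
    """
    buffer_size = len(rewards)
    returns = np.zeros(len(indices))
    discounts = np.ones(len(indices))
    last = np.empty(len(indices), dtype=np.int64)
    for i, start in enumerate(indices):
        idx = start
        for k in range(n_steps):
            idx = (start + k) % buffer_size
            returns[i] += discounts[i] * rewards[idx]
            discounts[i] *= gamma
            # Stop at episode end, or when the next slot is the write pointer.
            if dones[idx] or (idx + 1) % buffer_size == pos:
                break
        last[i] = idx
    return returns, discounts, last


# Usage: 6-slot buffer with 5 transitions written (pos=5); episode ends at index 2.
rewards = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 0.0])
dones = np.array([False, False, True, False, False, False])
ret, disc, last = n_step_returns(rewards, dones, [0, 1, 3, 4], n_steps=3, gamma=0.5, pos=5)
```

The TD target would then be `ret + disc * (1 - dones[last]) * V(next_obs[last])`. The storage layout is unchanged, which is the point of doing the work at sampling time.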

m-rph · 2020-06-09T12:21:20Z

This approach sounds better than what I initially came up with: it seems to have fewer moving parts and will be easier to reason about. I will get on it once v1.0 is released.

m-rph · 2020-06-19T10:59:43Z

How would you like this to be implemented? As a wrapper around the buffer, as a class derived from the buffer, or as its own object that adheres to the buffer API?

araffin · 2020-06-19T11:15:58Z

A class that derives from the replay buffer class seems the natural option I would say.

araffin · 2021-07-23T20:00:04Z

As an update, I have an experimental version of SAC + Peng's Q(lambda) in the contrib repo: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/tree/feat/peng-q-lambda
I'm using an adapted version of the HER replay buffer (which stores transitions by episode); it could probably be updated easily for an n-step buffer (in fact, lambda=1 is the n-step version).
I also had to hack SAC a bit to get access to the actor and the target Q-values.
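To illustrate why lambda=1 gives the n-step version, here is a sketch of a Peng's Q(lambda)-style return computed backward over a segment: G_t = r_t + gamma * (1 - d_t) * ((1 - lambda) * v_{t+1} + lambda * G_{t+1}). It assumes `next_values[t]` is the target value estimate of the state reached after step t (e.g. max Q for DQN, or the target critic at the actor's action for SAC). Both function names are hypothetical and not taken from the linked branch.

```python
def peng_lambda_return(rewards, next_values, dones, gamma, lam):
    """Lambda-return for the first step of a segment, computed backward."""
    n = len(rewards)
    # Last step of the segment: plain one-step bootstrap.
    ret = rewards[-1] + gamma * (1.0 - dones[-1]) * next_values[-1]
    for t in range(n - 2, -1, -1):
        # Mix the one-step bootstrap with the return from the next step.
        mixed = (1.0 - lam) * next_values[t] + lam * ret
        ret = rewards[t] + gamma * (1.0 - dones[t]) * mixed
    return ret


def n_step_return(rewards, next_values, dones, gamma):
    """Plain n-step return: discounted rewards plus gamma**n * v_n (no bootstrap after done)."""
    ret, discount = 0.0, 1.0
    for r, d in zip(rewards, dones):
        ret += discount * r
        discount *= gamma
        if d:
            return ret
    return ret + discount * next_values[-1]


# Usage: 3-step segment, no terminal state, gamma=0.5.
rewards, next_values, dones = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [False, False, False]
g_lambda_1 = peng_lambda_return(rewards, next_values, dones, gamma=0.5, lam=1.0)
g_n_step = n_step_return(rewards, next_values, dones, gamma=0.5)
g_lambda_0 = peng_lambda_return(rewards, next_values, dones, gamma=0.5, lam=0.0)
```

With lambda=1 the intermediate bootstraps vanish and the result equals the n-step return (6.5 here); with lambda=0 it reduces to the one-step TD target r_0 + gamma * v_1 (6.0).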

Original repo by @robintyh1: https://github.com/robintyh1/icml2021-pengqlambda


araffin added the "enhancement" label (New feature or request) on Jun 8, 2020

m-rph mentioned this issue on Jun 8, 2020 in "Potential continuity bug in the replay buffer when calling .learn multiple times #46" (closed)

araffin mentioned this issue on Jun 9, 2020 in "Roadmap to Stable-Baselines3 V1.0 #1" (closed, 42 tasks)

araffin added this to the v1.1 milestone on Jun 9, 2020

m-rph mentioned this issue on Jun 30, 2020 in "N-step updates for off-policy methods #81" (closed, 12 tasks)

araffin removed this from the v1.1 milestone on Nov 4, 2021

Shunian-Chen pushed a commit to Shunian-Chen/AIPI530 that referenced this issue on Nov 14, 2021: "Merge pull request DLR-RM#47 from Antonin-Raffin/feat/callbacks" (0143518, "Add callback support")

araffin mentioned this issue on May 6, 2024 in "Prioritized experience replay #1622" (open, 16 tasks)
