Roadmap to Stable-Baselines3 V1.0 #1

araffin · 2020-05-08T15:03:43Z

This issue is meant to be updated as the list of changes is not exhaustive

Dear all,

Stable-Baselines3 beta is now out 🎉 ! This issue is meant to reference what is implemented and what is missing before a first major version.

As mentioned in the README, before v1.0, breaking changes may occur. I would like to encourage contributors (especially the maintainers) to make comments on how to improve the library before v1.0 (and maybe make some internal changes).

I will try to review the features mentioned in hill-a/stable-baselines#576 (and hill-a/stable-baselines#733)
and I will create issues soon to reference what is missing.

What is implemented?

What are the new features?

What is missing?

syncing some files with Stable-Baselines to remain consistent (we may be good now, but this needs to be checked)
finish code-review of existing code (Review of Existing Code #17)

Checklist for v1.0 release

Update Readme
Prepare blog post
Update doc: add links to the stable-baselines3 contrib
Update docker image to use newer Ubuntu version
Populate RL zoo

What is next? (for V1.1+)

basic dict/tuple support for observations (Dictionary Observations #243 )
simple recurrent policies? (recurrent policy implementation in ppo [feature-request] #18)
DQN extensions (double, PER, IQN) ([Feature Request] RAINBOW #622)
Implement TRPO (Add TRPO Stable-Baselines-Team/stable-baselines3-contrib#40)
multi-worker training for all algorithms ([Feature request] Adding multiprocessing support for off policy algorithms #179 )
n-step returns for off-policy algorithms ([feature-request] N-step returns for TD methods #47) (@partiallytyped)
SAC discrete ([Feature request] Implement SAC-Discrete #157) (needs to be discussed, benefit vs DQN+extensions?)
Energy Based Prioritisation? (@RyanRizzo96)
implement action_proba in the base class?
test the doc snippets (Sphinx doc tests support #14) (help is welcome)
noisy networks (https://arxiv.org/abs/1706.10295) @partiallytyped ? exploration in parameter space? ([Feature Request] RAINBOW #622)
Munchausen Reinforcement Learning (MDQN) (probably in the contrib first, e.g. [WIP] MDQN pfnet/pfrl#74)

side note: should we change the default start_method to fork? (now that we don't have tf anymore)
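
To illustrate the side note, here is a minimal sketch using only Python's standard `multiprocessing` module. It assumes the `start_method` in question is the one passed to the subprocess-based vectorized environment; the helper `pick_start_method` and its `prefer_fork` flag are hypothetical and not part of the library.

```python
import multiprocessing as mp


def pick_start_method(prefer_fork: bool = False) -> str:
    """Return a start method that is available on the current platform.

    Hypothetical helper: "fork" is only chosen when explicitly preferred
    and supported (it does not exist on Windows).
    """
    available = mp.get_all_start_methods()
    if prefer_fork and "fork" in available:
        return "fork"
    if "forkserver" in available:
        return "forkserver"
    # "spawn" is supported on every platform.
    return "spawn"


# A context bound to the chosen start method; worker processes created
# from it (ctx.Process, ctx.Pool) will use that method.
ctx = mp.get_context(pick_start_method(prefer_fork=True))
print(ctx.get_start_method())
```

"fork" starts workers quickly because the child inherits the parent's memory, but it can be unsafe when the parent process already holds threads or library state that does not survive forking, which is the trade-off behind the question.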

m-rph · 2020-05-08T16:55:31Z

Maybe N-step returns for TD3 (and DDPG) and DQN (and friends)? If it's implemented in the experience replay, then it is likely plug and play for TD3 and DQN; an implementation for SAC probably requires extra effort.

Perhaps at a later time (e.g. V1.1+): Retrace, tree backup, Q(lambda), importance sampling for n-step returns?
If retrace and friends are planned for later, then it should be taken into consideration when implementing n-steps.
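
To make the suggestion concrete, here is a minimal sketch of the uncorrected n-step return as it could be computed from consecutive transitions stored in a replay buffer. The function names `n_step_return` and `n_step_target` and their arguments are hypothetical, not existing SB3 API, and no off-policy correction (Retrace, importance sampling) is applied.

```python
from typing import Sequence, Tuple


def n_step_return(
    rewards: Sequence[float],
    dones: Sequence[bool],
    gamma: float,
    n: int,
) -> Tuple[float, int, bool]:
    """Sum up to n discounted rewards, stopping at an episode end.

    Returns (discounted reward sum, number of steps used,
    whether the episode terminated inside the window).
    """
    total = 0.0
    steps = 0
    terminated = False
    for k in range(min(n, len(rewards))):
        total += (gamma ** k) * rewards[k]
        steps = k + 1
        if dones[k]:
            # Never accumulate rewards across an episode boundary.
            terminated = True
            break
    return total, steps, terminated


def n_step_target(
    ret: float, steps: int, terminated: bool, gamma: float, next_q: float
) -> float:
    """Bootstrap from the value estimate `steps` transitions ahead,
    unless the episode ended inside the window."""
    return ret if terminated else ret + (gamma ** steps) * next_q


ret, steps, terminated = n_step_return([1.0, 1.0, 1.0], [False, False, False], gamma=0.5, n=3)
print(n_step_target(ret, steps, terminated, gamma=0.5, next_q=8.0))
```

Only the stored reward, the bootstrap state and the discount exponent differ from the 1-step case, which is why doing this inside the replay buffer would leave the TD3 and DQN update code largely untouched.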

Miffyli · 2020-05-08T16:57:15Z

@partiallytyped

Yup, that would be a v1.1 thing but it is indeed planned. We should probably go over the original SB issues to gather all these suggestions at some point.

m-rph · 2020-05-08T17:53:23Z

Perhaps a discrete version of SAC for v1.1+?
https://arxiv.org/abs/1910.07207

Edit: I can implement this, and add types to the remaining methods after my finals (early June).

rolandgvc · 2020-05-08T18:34:20Z

I will start working on the additional observation/action spaces this weekend 👍

jkterry1 · 2020-05-10T05:51:19Z

Will the stable baselines 3 repo/package replace the existing stable baselines one, or will all this eventually be merged into the normal stable baselines repo?

Miffyli · 2020-05-10T09:52:01Z

@justinkterry

There are no plans to merge/combine the two repositories. Stable-baselines will continue to exist, and continue to receive bug-fixes and the like for some time before it is archived.

jkterry1 · 2020-05-10T16:16:53Z

@Miffyli Thank you. Will it remain as "pip3 install stable-baselines," or become something like "pip3 install stable-baselines3"?

Miffyli · 2020-05-10T16:21:13Z

@justinkterry

You can already install sb3 with pip3 install stable-baselines3. The original repo will stay as pip3 install stable-baselines.

AdamGleave · 2020-05-10T23:05:24Z

Minor point but I wonder if we should rename BaseRLModel to BaseRLAlgorithm and BasePolicy to BaseModel, given that BasePolicy is more than just a policy?

araffin · 2020-05-11T07:32:08Z

Minor point but I wonder if we should rename BaseRLModel to BaseRLAlgorithm and BasePolicy to BaseModel, given that BasePolicy is more than just a policy?

Good point, BaseModel and BaseRLAlgorithm are definitely better names ;)

jdily · 2020-05-11T10:04:20Z

for visualization, probably using something like weights & biases (https://www.wandb.com/) is an option?
That way there is no need for a TensorFlow dependency.
I can help to add functions to do that.

araffin · 2020-05-11T10:39:09Z

for visualization, probably using something like weights & biases (https://www.wandb.com/) is an option?

Correct me if I'm wrong, but W&B does not work offline, no? This is really important as you don't want your results to be published when you do private work.

This could also be implemented either as a callback (cf. the doc) or as a new output for the logger. But it sounds more like a "contrib" module to me.
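
As a rough illustration of the "new output for the logger" option, the sketch below writes metrics to a local file only, so nothing leaves the machine. The class `LocalJsonlWriter`, its `write` signature and the metric key are hypothetical and do not reproduce the library's actual logger interface.

```python
import json
import os
import tempfile
from typing import Any, Dict


class LocalJsonlWriter:
    """Hypothetical logger output: one JSON object per line, local file only."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "a", encoding="utf-8")

    def write(self, key_values: Dict[str, Any], step: int) -> None:
        # Store the step next to the logged values so runs can be plotted later.
        record = {"step": step, **key_values}
        self._file.write(json.dumps(record) + "\n")
        self._file.flush()

    def close(self) -> None:
        self._file.close()


# Usage: log one illustrative metric to a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "metrics.jsonl")
writer = LocalJsonlWriter(log_path)
writer.write({"ep_rew_mean": 12.5}, step=1000)
writer.close()
```

A third-party tracking service could be wrapped behind the same two methods, which is one reason such integrations fit naturally in a separate contrib module.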

m-rph · 2020-05-11T12:19:06Z

Perhaps an official shorthand for stable-baselines and stable-baselines3 e.g. sb and sb3?

import stable_baselines3 as sb3

ManifoldFR · 2020-05-12T12:30:26Z

Is it necessary to continue to provide the interface for vectorized environments inside of this codebase?
They were contributed upstream back to gym in this PR. After that PR was merged, packages such as PyBullet (pybullet_envs) started providing vectorized variants of their own environments using the interface from gym which should be the same as the one here (for now)

Miffyli · 2020-05-12T12:37:08Z

@ManifoldFR Somehow that has eluded my attention. Looks like a good suggestion! Less repeated code is better, as long as it fits in stable-baselines functions too.

@araffin thoughts (you have most experience doing the eval/wrap functions)? I imagine the hardest part is to update all wrappers that work on vectorized environments.

ManifoldFR · 2020-05-12T12:42:20Z

I happened onto it by chance because it's not documented anywhere inside of gym's docs; the OpenAI people ported it from their own baselines repo with barely any notification of the change to end users.

araffin · 2020-05-12T12:49:36Z

I was aware of this (wrote some comments at that time https://github.com/openai/gym/pull/1513/files#r293899941) but I would argue against for different reasons:

we rely on some specific features (set_attr, get_attr)
the OpenAI version is undocumented and we don't know if they are going to break that feature (which is central in SB3) in a future release (I don't want to write a new monkey patch like hill-a/stable-baselines@678f803)
we can directly tweak that feature to fit our needs (without waiting for a review and release by OpenAI)

So, in short, I would be in favor only if OpenAI's way of maintaining Gym were more reliable.
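
For readers unfamiliar with the first point, here is a simplified, self-contained sketch of what `get_attr` and `set_attr` provide on a vectorized environment. `ToyEnv` and `ToyVecEnv` are hypothetical stand-ins; the real methods in the library take additional arguments and also work across subprocesses.

```python
from typing import Any, Callable, List


class ToyEnv:
    """Stand-in environment with one tunable attribute (hypothetical)."""

    def __init__(self, difficulty: int = 0) -> None:
        self.difficulty = difficulty


class ToyVecEnv:
    """Holds several environments and exposes attribute access across all of them."""

    def __init__(self, env_fns: List[Callable[[], ToyEnv]]) -> None:
        self.envs = [fn() for fn in env_fns]

    def get_attr(self, name: str) -> List[Any]:
        # One value per wrapped environment, in order.
        return [getattr(env, name) for env in self.envs]

    def set_attr(self, name: str, value: Any) -> None:
        # Apply the same value to every wrapped environment.
        for env in self.envs:
            setattr(env, name, value)


vec_env = ToyVecEnv([lambda: ToyEnv(difficulty=1) for _ in range(3)])
print(vec_env.get_attr("difficulty"))  # [1, 1, 1]
vec_env.set_attr("difficulty", 2)
print(vec_env.get_attr("difficulty"))  # [2, 2, 2]
```

Callbacks and wrappers that need to read or change environment state during training depend on this kind of access, which is why losing it to an upstream change would be costly.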

PS: thanks @ManifoldFR for bringing that up ;)

ManifoldFR · 2020-05-12T13:04:14Z

I see, it figures that OpenAI changing things unilaterally without documentation would be a problem. I guess ensuring stable-baselines3's code doesn't break when running vectorized envs derived from Gym's implementation would be safer and easier instead...

cosmir17 · 2020-12-07T09:20:49Z

@partiallytyped I think you are already aware, but I am also mentioning it here. I found a source code example for the original SAC Discrete implementation paper (the one you found).

The author also publicised his code.
https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/master/agents/actor_critic_agents/SAC_Discrete.py

Hope these help,
Sean

araffin · 2020-12-07T09:27:28Z

I found the original example for the SAC Discrete Implementation plan.
Can the following paper be considered?
https://arxiv.org/abs/1910.07207

We already have an issue for that #157

cosmir17 · 2020-12-07T09:42:18Z

@araffin ~~I wasn't asking if we can implement it. Given that it has been decided as shown on the roadmap and PartialTyped volunteered here, I was giving him or the team a resource. #1 (comment)~~
Actually, I missed the footnote
need to be discussed, benefit vs DQN+extensions?
I am posting my suggestion in the page now.

araffin · 2021-03-05T19:25:26Z

First release candidate is out: https://github.com/DLR-RM/stable-baselines3/releases/tag/v1.0rc0
100+ trained RL models will be published soon: DLR-RM/rl-baselines3-zoo#69

araffin added the enhancement New feature or request label May 8, 2020

araffin added this to the v1.0 milestone May 8, 2020

araffin assigned ernestum, AdamGleave, Miffyli, araffin and hill-a May 8, 2020

araffin pinned this issue May 8, 2020

This was referenced May 9, 2020

[Suggestion for V3] All RL algorithms should behave like current DDPG and automatically normalize input features hill-a/stable-baselines#773

Closed

[question] HER and prioritized experience replay hill-a/stable-baselines#751

Closed

Add Gitlab CI #12

Merged

araffin mentioned this issue May 12, 2020

recurrent policy implementation in ppo [feature-request] #18

Closed

cosmir17 mentioned this issue Dec 7, 2020

[Feature request] Implement SAC-Discrete #157

Closed

araffin mentioned this issue Feb 28, 2021

Beta is over =)! V1.0rc0 #334

Merged

14 tasks

araffin closed this as completed in #334 Mar 1, 2021

araffin unpinned this issue Mar 5, 2021

araffin mentioned this issue Jun 7, 2021

[Feature Request] TRPO needed #467

Closed

1 task

Miffyli mentioned this issue Jun 16, 2021

[Feature Request] Bringing wandb logging to sb3 #480

Closed

1 task

NickLucche mentioned this issue Jun 22, 2021

[Feature Request] Double DQN #487

Closed

tristandeleu mentioned this issue Jul 27, 2021

Plans for Future Maintenance of Gym openai/gym#2259

Closed

Shunian-Chen pushed a commit to Shunian-Chen/AIPI530 that referenced this issue Nov 14, 2021

Merge pull request DLR-RM#1 from Antonin-Raffin/feat/a2c

701daa8

Add A2C algorithm

YannBerthelot mentioned this issue Jan 18, 2022

[Bug] Tensorboard logging not logging every log_interval timesteps #725

Closed

3 tasks

ThomasRochefortB mentioned this issue Apr 14, 2022

[Bug] GPU memory explodes when using Conv2D layers in Dict Observations FeatureExtractor #863

Closed

rajcscw mentioned this issue Apr 15, 2022

[Bug] Local variable 'values' not updated in the callback for the last timestep #864

Closed

3 tasks

francescomaldonato mentioned this issue Jun 13, 2022

[Bug] check_env() output error on the reset() method, observation space not matching #932

Closed

4 tasks

vcadillog mentioned this issue Sep 6, 2022

[Feature Request] Add logger.close to StopTrainingOnMaxEpisodes #1049

Open

1 task

Tuxliri mentioned this issue Dec 12, 2022

[Bug]: difference in output of model exported to onnx #1211

Closed

4 tasks

qgallouedec pushed a commit that referenced this issue Jan 24, 2023

Merge pull request #1 from carlosluis/fix_tests

0851440

Fix tests

lutogniew added a commit to lutogniew/stable-baselines3 that referenced this issue May 25, 2023

Code review fixes DLR-RM#1

5a2cde7

As suggested by Antonin Raffin <antonin.raffin@ensta.org>.

JoshuaClouse mentioned this issue Mar 6, 2024

[Feature Request] Raise error when same object in memory passed to vectorized environment #1151

Closed

1 task

RaikoPipe referenced this issue in RaikoPipe/stable-baselines3 Oct 21, 2024

Merge pull request #1 from RaikoPipe/fix_tests

19f091f

Merged to master for convenience
