Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roadmap to Stable-Baselines3 V1.0 #1

Closed
42 tasks done
araffin opened this issue May 8, 2020 · 46 comments · Fixed by #334
Closed
42 tasks done

Roadmap to Stable-Baselines3 V1.0 #1

araffin opened this issue May 8, 2020 · 46 comments · Fixed by #334
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@araffin
Copy link
Member

araffin commented May 8, 2020

This issue is meant to be updated as the list of changes is not exhaustive

Dear all,

Stable-Baselines3 beta is now out 🎉 ! This issue is meant to reference what is implemented and what is missing before a first major version.

As mentioned in the README, before v1.0, breaking changes may occur. I would like to encourage contributors (especially the maintainers) to make comments on how to improve the library before v1.0 (and maybe make some internal changes).

I will try to review the features mentioned in hill-a/stable-baselines#576 (and hill-a/stable-baselines#733)
and I will create issues soon to reference what is missing.

What is implemented?

What are the new features?

What is missing?

  • syncing some files with Stable-Baselines to remain consistent (we may be good now, but need to be checked)
  • finish code-review of exisiting code Review of Existing Code #17

Checklist for v1.0 release

  • Update Readme
  • Prepare blog post
  • Update doc: add links to the stable-baselines3 contrib
  • Update docker image to use newer Ubuntu version
  • Populate RL zoo

What is next? (for V1.1+)

side note: should we change the default start_method to fork? (now that we don't have tf anymore)

@araffin araffin added the enhancement New feature or request label May 8, 2020
@araffin araffin added this to the v1.0 milestone May 8, 2020
@araffin araffin pinned this issue May 8, 2020
@m-rph
Copy link
Contributor

m-rph commented May 8, 2020

Maybe N-step returns for TD3 (and DDPG) and DQN (and friends) ? If it's implemented in the experience replay, then it is likely plug and play for TD3 and DQN, an implementation for SAC probably requires extra effort.

Perhaps at a later time, e.g. V1.1+ retrace, tree backup, Q(lambda), Importance Sampling for n-step returns?
If retrace and friends are planned for later, then it should be taken into consideration when implementing n-steps.

@Miffyli
Copy link
Collaborator

Miffyli commented May 8, 2020

@partiallytyped

Yup, that would be v1.1 thing but indeed planned. Should probably go over original SB issues to gather all these suggestions at some point.

@m-rph
Copy link
Contributor

m-rph commented May 8, 2020

Perhaps a discrete version of SAC for v1.1+?
https://arxiv.org/abs/1910.07207

Edit: I can implement this, and add types to the remaining methods after my finals (early June).

@rolandgvc
Copy link
Contributor

I will start working on the additional observation/action spaces this weekend 👍

@jkterry1
Copy link
Contributor

jkterry1 commented May 10, 2020

Will the stable baselines 3 repo/package replace the existing stable baselines one, or will all this eventually be merged into the normal stable baselines repo?

@Miffyli
Copy link
Collaborator

Miffyli commented May 10, 2020

@justinkterry

There are no plans to merge/combine the two repositories. Stable-baselines will continue to exist, and continue to receive bug-fixes and the like for some time before it is archived.

@jkterry1
Copy link
Contributor

@Miffyli Thank you. Will it remain as "pip3 install stable-baselines," or become something like "pip3 install stable-baselines3"?

@Miffyli
Copy link
Collaborator

Miffyli commented May 10, 2020

@justinkterry

You can already install sb3 with pip3 install stable-baselines3. The original repo will stay as pip3 install stable-baselines.

@AdamGleave
Copy link
Collaborator

AdamGleave commented May 10, 2020

Minor point but I wonder if we should rename BaseRLModel to BaseRLAlgorithm and BasePolicy to BaseModel, given that BasePolicy is more than just a policy?

@araffin
Copy link
Member Author

araffin commented May 11, 2020

Minor point but I wonder if we should rename BaseRLModel to BaseRLAlgorithm and BasePolicy to BaseModel, given that BasePolicy is more than just a policy?

Good point, BaseModel and BaseRLAlgorithm are definitely better names ;)

@jdily
Copy link

jdily commented May 11, 2020

for visualization, probably using something like weights & biases (https://www.wandb.com/) is an option?
So that no need for tensorflow dependency.
I can help to add functions to do that.

@araffin
Copy link
Member Author

araffin commented May 11, 2020

for visualization, probably using something like weights & biases (https://www.wandb.com/) is an option?

correct me if I'm wrong but W&B does not work offline, no? This is really important as you don't want your results to be published when you do private work.

This could be also implemented either as a callback (cf doc) or a new output for the logger. But sounds more like a "contrib" module to me.

@m-rph
Copy link
Contributor

m-rph commented May 11, 2020

Perhaps an official shorthand for stable-baselines and stable-baselines3 e.g. sb and sb3?

import stable_baselines3 as sb3

@ManifoldFR
Copy link
Contributor

Is it necessary to continue to provide the interface for vectorized environments inside of this codebase?
They were contributed upstream back to gym in this PR. After that PR was merged, packages such as PyBullet (pybullet_envs) started providing vectorized variants of their own environments using the interface from gym which should be the same as the one here (for now)

@Miffyli
Copy link
Collaborator

Miffyli commented May 12, 2020

@ManifoldFR Somehow that has eluded my attention. Looks like a good suggestion! Less repeated code is better, as long it fits in stable-baselines functions too.

@araffin thoughts (you have most experience doing the eval/wrap functions)? I imagine the hardest part is to update all wrappers that work on vectorized environments.

@ManifoldFR
Copy link
Contributor

I happened onto it by chance because it's not documented anywhere inside of gym's docs, the openai people ported it from their own baselines repo with barely any notification of the change to end users.

@araffin
Copy link
Member Author

araffin commented May 12, 2020

I was aware of this (wrote some comments at that time https://github.com/openai/gym/pull/1513/files#r293899941) but I would argue against for different reasons:

  • we rely on some specific features (set_attr, get_attr)
  • openai version is undocumented and we don't know if they gonna break that feature (which is central in SB3) in a future release (I don't want to write new monkey patch like hill-a/stable-baselines@678f803)
  • we can directly tweak that feature to fit our needs (and don't wait for a review and release by OpenAI)

So, in short, I would be in favor only if OpenAI way of maintaining Gym was more reliable.

PS: thanks @ManifoldFR for bringing that up ;)

@ManifoldFR
Copy link
Contributor

ManifoldFR commented May 12, 2020 via email

@cosmir17
Copy link

cosmir17 commented Dec 7, 2020

@partiallytyped I think you are already aware but I am also mentioning here. I found a source code example for the original SAC Discrete Implementation paper (the one you found).

The author also publicised his code.
https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/master/agents/actor_critic_agents/SAC_Discrete.py

Hope these help,
Sean

@araffin
Copy link
Member Author

araffin commented Dec 7, 2020

I found the original example for the SAC Discrete Implementation plan.
Can the following paper be considered?
https://arxiv.org/abs/1910.07207

We already have an issue for that #157

@cosmir17
Copy link

cosmir17 commented Dec 7, 2020

@araffin I wasn't asking if we can implement it. Given that it has been decided as shown on the load-map and PartialTyped volunteered here, I was giving him or the team a resource. #1 (comment)
Actually, I missed the footnote
need to be discussed, benefit vs DQN+extensions?
I am posting my suggestion in the page now.

@araffin
Copy link
Member Author

araffin commented Mar 5, 2021

First release candidate is out: https://github.com/DLR-RM/stable-baselines3/releases/tag/v1.0rc0
100+ trained rl models will be published soon: DLR-RM/rl-baselines3-zoo#69

@araffin araffin unpinned this issue Mar 5, 2021
@ziegenbalg

This comment has been minimized.

@Miffyli

This comment has been minimized.

@ziegenbalg

This comment has been minimized.

@Miffyli

This comment has been minimized.

@ziegenbalg

This comment has been minimized.

@Miffyli

This comment has been minimized.

Shunian-Chen pushed a commit to Shunian-Chen/AIPI530 that referenced this issue Nov 14, 2021
qgallouedec pushed a commit that referenced this issue Jan 24, 2023
lutogniew added a commit to lutogniew/stable-baselines3 that referenced this issue May 25, 2023
As suggested by Antonin Raffin <antonin.raffin@ensta.org>.
araffin pushed a commit that referenced this issue May 25, 2023
* Fix env checker single-step-env edge case

Before this change, env checker failed to `reset()` the tested
environment before calling `step()` when checking for `Inf` / `NaN`.
This could cause environments which happened to have only one `step()`
available before the episode was terminated to fail.

This is now fixed.

* Code review fixes #1

As suggested by Antonin Raffin <antonin.raffin@ensta.org>.
RaikoPipe added a commit to RaikoPipe/stable-baselines3 that referenced this issue Oct 21, 2024
Merged to master for convenience
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.