Roadmap to Stable-Baselines3 V1.0 #1
Comments
Maybe n-step returns for TD3 (and DDPG) and DQN (and friends)? If it's implemented in the experience replay, then it is likely plug and play for TD3 and DQN; an implementation for SAC probably requires extra effort. Perhaps at a later time (e.g. V1.1+): Retrace, tree backup, Q(lambda), importance sampling for n-step returns? |
Yup, that would be a v1.1 thing but indeed planned. Should probably go over the original SB issues to gather all these suggestions at some point. |
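For reference, a minimal sketch of the n-step target this suggestion refers to, as it might be computed when sampling from a replay buffer. The function name `n_step_return` and its arguments are hypothetical and not part of Stable-Baselines3; the bootstrap value stands in for whatever target Q-estimate the algorithm uses.

```python
from typing import Sequence


def n_step_return(
    rewards: Sequence[float],
    dones: Sequence[bool],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> float:
    """Discounted sum of the rewards in the window, plus a bootstrapped
    value estimate if no episode ended inside the window.

    Hypothetical helper for illustration only (not an SB3 API).
    """
    ret = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        ret += discount * reward
        discount *= gamma
        if done:
            # Episode terminated inside the window: do not bootstrap.
            return ret
    # No termination: add gamma^n times the value estimate of the last state.
    return ret + discount * bootstrap_value
```

With `n = 1` this reduces to the usual one-step TD target, which is why doing it in the buffer would leave the TD3 and DQN update code mostly unchanged.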
Perhaps a discrete version of SAC for v1.1+? Edit: I can implement this, and add types to the remaining methods after my finals (early June). |
I will start working on the additional observation/action spaces this weekend 👍 |
Will the stable baselines 3 repo/package replace the existing stable baselines one, or will all this eventually be merged into the normal stable baselines repo? |
@justinkterry There are no plans to merge/combine the two repositories. Stable-baselines will continue to exist, and continue to receive bug-fixes and the like for some time before it is archived. |
@Miffyli Thank you. Will it remain as "pip3 install stable-baselines," or become something like "pip3 install stable-baselines3"? |
@justinkterry You can already install sb3 with |
Minor point but I wonder if we should rename |
Good point, |
For visualization, using something like Weights & Biases (https://www.wandb.com/) is probably an option? |
Correct me if I'm wrong, but W&B does not work offline, no? This is really important, as you don't want your results to be published when you do private work. This could also be implemented either as a callback (cf. the doc) or as a new output for the logger. But it sounds more like a "contrib" module to me. |
Perhaps an official shorthand for import stable_baselines3 as sb3 |
Is it necessary to continue to provide the interface for vectorized environments inside of this codebase? |
@ManifoldFR Somehow that has eluded my attention. Looks like a good suggestion! Less repeated code is better, as long as it fits in stable-baselines functions too. @araffin thoughts (you have the most experience doing the eval/wrap functions)? I imagine the hardest part is to update all wrappers that work on vectorized environments. |
I happened onto it by chance because it's not documented anywhere in Gym's docs; the OpenAI people ported it from their own baselines repo with barely any notification of the change to end users. |
I was aware of this (wrote some comments at that time https://github.com/openai/gym/pull/1513/files#r293899941) but I would argue against for different reasons:
So, in short, I would be in favor only if OpenAI's way of maintaining Gym were more reliable. PS: thanks @ManifoldFR for bringing that up ;) |
I see, it figures that OpenAI changing things unilaterally without documentation would be a problem.
I guess ensuring stable-baselines3's code doesn't break when running vectorized envs derived from Gym's implementation would be safer and easier instead...
|
@partiallytyped I think you are already aware, but I am also mentioning it here. I found a source code example for the original SAC Discrete implementation paper (the one you found). The author also published his code. Hope these help. |
We already have an issue for that #157 |
@araffin |
First release candidate is out: https://github.com/DLR-RM/stable-baselines3/releases/tag/v1.0rc0 |
Add A2C algorithm
As suggested by Antonin Raffin <antonin.raffin@ensta.org>.
* Fix env checker single-step-env edge case
  Before this change, env checker failed to `reset()` the tested environment before calling `step()` when checking for `Inf` / `NaN`. This could cause environments which happened to have only one `step()` available before the episode was terminated to fail. This is now fixed.
* Code review fixes #1
  As suggested by Antonin Raffin <antonin.raffin@ensta.org>.
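To illustrate the edge case described in that commit message, here is a rough sketch using a toy environment rather than the actual env checker code. `OneStepEnv` and `check_finite` are hypothetical names, and the old-style 4-tuple `step()` return is an assumption made for brevity.

```python
import math


class OneStepEnv:
    """Hypothetical toy environment: every episode ends after one step."""

    def __init__(self) -> None:
        self._needs_reset = True

    def reset(self) -> float:
        self._needs_reset = False
        return 0.0

    def step(self, action: int):
        if self._needs_reset:
            raise RuntimeError("step() called before reset()")
        # The single available step terminates the episode.
        self._needs_reset = True
        return 1.0, 0.5, True, {}


def check_finite(env, action: int = 0) -> bool:
    """Reset first, then take one step and check for Inf / NaN."""
    first_obs = env.reset()  # the call that was missing before the fix
    obs, reward, done, info = env.step(action)
    return all(math.isfinite(x) for x in (first_obs, obs, reward))
```

Without the `reset()` call, running the check on an environment that had already used up its only step would raise instead of reporting on `Inf` / `NaN`.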
Merged to master for convenience
This issue is meant to be updated as the list of changes is not exhaustive
Dear all,
Stable-Baselines3 beta is now out 🎉 ! This issue is meant to reference what is implemented and what is missing before a first major version.
As mentioned in the README, before v1.0, breaking changes may occur. I would like to encourage contributors (especially the maintainers) to make comments on how to improve the library before v1.0 (and maybe make some internal changes).
I will try to review the features mentioned in hill-a/stable-baselines#576 (and hill-a/stable-baselines#733)
and I will create issues soon to reference what is missing.
What is implemented?
What are the new features?
EvalCallback
)VecEnv
What is missing?
Checklist for v1.0 release
What is next? (for V1.1+)
`action_proba` in the base class?
side note: should we change the default `start_method` to `fork`? (now that we don't have tf anymore)
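As context for the side note on the default start method, a small sketch using only Python's standard `multiprocessing` module. `choose_start_method` is a hypothetical helper, not something from Stable-Baselines3; it only shows that `fork` has to be treated as optional, since it is not available on every platform (Windows, for instance).

```python
import multiprocessing


def choose_start_method(preferred: str = "fork") -> str:
    """Return the preferred start method if this platform supports it,
    otherwise fall back to "spawn", which is available everywhere.

    Hypothetical helper for illustration only.
    """
    available = multiprocessing.get_all_start_methods()
    return preferred if preferred in available else "spawn"


# A context object scoped to the chosen method; worker processes for the
# environments would be created from it (e.g. ctx.Process, ctx.Pipe).
ctx = multiprocessing.get_context(choose_start_method())
```

Using a context rather than `multiprocessing.set_start_method` keeps the choice local, so it does not change the start method for the rest of the user's program.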