V3 new backend: PyTorch? and the future of Stable Baselines #733
Do you have any particular Python version in mind?
It will be Python 3.6+, as many users rely on that version, even though 3.7+ has more typing features.
Would it be possible to publish the PyTorch-based version you mention, perhaps even privately? In return, we could test it against the original stable-baselines, which we are using for our project. The main reason we'd like to try it is that we think exporting trained policies might be easier with PyTorch, and we would like to try this urgently. We would also be happy, in return, to write a model export guide for the PyTorch-based version if that is of interest.
@araffin will share this once some necessary stuff is done; it should not be tooooo long from now (I cannot give an exact date). Meanwhile, you could take a look at this discussion on exporting models to PyTorch. This should not be too difficult, but you have to be careful with layer differences between TF and PyTorch, as well as some "default behaviour" included in stable-baselines (like normalization of image inputs).
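The layer differences mentioned above mostly come down to weight layout, plus reproducing the input preprocessing. Below is a minimal NumPy sketch, assuming you have already extracted the TF kernels as NumPy arrays. The helper names (`tf_dense_to_torch`, `tf_conv2d_to_torch`, `tf_flatten_dense_to_torch`, `preprocess_image_obs`) are hypothetical, not part of stable-baselines or PyTorch. It relies on the standard layouts: TF dense kernels are `(in, out)` while `torch.nn.Linear.weight` is `(out, in)`; TF conv kernels are `(H, W, in, out)` while `torch.nn.Conv2d.weight` is `(out, in, H, W)`; and TF flattens feature maps in HWC order while PyTorch flattens CHW. The image scaling assumes observation bounds of `[0, 255]`, where rescaling to `[0, 1]` by the bounds amounts to dividing by 255.

```python
import numpy as np


def tf_dense_to_torch(kernel: np.ndarray) -> np.ndarray:
    # TF dense kernel: (in_features, out_features) -> torch Linear: (out, in)
    return kernel.T.copy()


def tf_conv2d_to_torch(kernel: np.ndarray) -> np.ndarray:
    # TF conv kernel: (H, W, in_ch, out_ch) -> torch Conv2d: (out_ch, in_ch, H, W)
    return np.transpose(kernel, (3, 2, 0, 1)).copy()


def tf_flatten_dense_to_torch(kernel: np.ndarray, height: int, width: int,
                              channels: int) -> np.ndarray:
    # Dense layer right after a flatten: TF rows follow HWC order,
    # PyTorch inputs follow CHW order, so reorder the rows first.
    out_features = kernel.shape[1]
    reordered = kernel.reshape(height, width, channels, out_features)
    reordered = reordered.transpose(2, 0, 1, 3).reshape(-1, out_features)
    return reordered.T.copy()


def preprocess_image_obs(obs: np.ndarray, low: float = 0.0,
                         high: float = 255.0) -> np.ndarray:
    # Assumption: observations are rescaled to [0, 1] using the space bounds
    # (equivalent to obs / 255 for uint8 images), then NHWC -> NCHW.
    scaled = (obs.astype(np.float32) - low) / (high - low)
    return np.transpose(scaled, (0, 3, 1, 2))
```

The flatten case is the easiest one to miss: the converted weights have the right shape either way, so a wrong row order only shows up as different outputs. Comparing the TF and PyTorch outputs on a few observations after conversion is a cheap sanity check.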
Publishing it is planned (the open-source process is currently ongoing), but for legal reasons I'm not allowed to share it, even privately, for now.
Hi! Checking to see when it would be possible to kick the V3 PyTorch tires?
@jarlva now? https://github.com/DLR-RM/stable-baselines3 This is a beta version. I will write a "roadmap to v1.0" issue soon (I'm waiting for that before making a public announcement).
Awesome Antonin!
Beta is over: https://github.com/DLR-RM/stable-baselines3/releases
Version 3 is now online: https://github.com/DLR-RM/stable-baselines3
This issue summarizes the discussion between the maintainers (@hill-a, @erniejunior, @AdamGleave, @Miffyli and I) about the next backend and the future of Stable Baselines.
First, we recommend that everyone read the summary of design choices in #576.
Backend Choice
This is the biggest design choice for the next major version. In any case, we will drop TensorFlow 1 for something else; the candidates are PyTorch, TensorFlow 2 and JAX.
Maintainers opinion
The majority of the maintainers would favor PyTorch as they already work with it and the rest don't have strong feelings as they will have to switch to a new framework anyway.
As a transition, here are the final results from the poll I created on Twitter a few weeks ago:
Number of views: 4500
Votes: 319 (quite a lot!)
Results:
Disclaimer: doing a poll on Twitter restricts the audience, but it's a good start.
Tensorflow 2
Pros:
Cons:
tf.function can be tricky
Jax
Pros:
Cons:
PyTorch
Pros:
Cons:
Side note: although the Twitter poll is biased, the gap between the first and second choices is striking.
Summary
In summary, the first choice for the backend would be PyTorch, mainly for two reasons:
A second choice would be Jax because:
TensorFlow 2 does not seem to convince many people: it is essentially a completely new framework (compared to TF1, even though it shares the name), and it is fairly recent. Compared to PyTorch, it seems to offer the same features but with less maturity.
Future of Stable-Baselines
PyTorch version
I currently have an internal PyTorch version of Stable Baselines, codename "Torchy Baselines" (and its zoo), that I use for my research (RL for robotics). It already has working implementations of A2C, PPO, SAC and TD3.
I dropped Python 3.5 support in order to use f-strings and more typing features, and to avoid issues with dicts. Python 3.5's end of life is coming soon anyway.
We agreed with the other maintainers that this would be a good starting point, but with some conditions:
Release date
The plan is to release an early version (and its zoo) as soon as possible (in the next two months, so before the end of April).
New name
Because of the big changes and also because it will be released under the DLR-RM team, we will update the name of the library:
Stable-Baselines3 will be its new name (so we keep the Stable Baselines name while having a different package to show the huge internal change)
V2 support
Once V3 is released, the plan is to only do bug fixes for v2 for 6 months. We will give more details on that later.