[Question/Discussion] Comparing stable-baselines3 vs stable-baselines #90
SB3 is in active development, whereas SB2 (the original stable-baselines) is in maintenance mode. I use SB3 for my projects since it is more modular and less cluttered than SB2, thanks to dynamic computation graphs, the experience the members gathered implementing the algorithms, and a well-thought-out design.
To add to the comment above: some of the methods are, as of writing, slower (at least without tuning, e.g. the number of threads), but we are still in the process of going over them, optimizing for speed and matching the performance of the SB2 implementations.
Hello, I'm glad you asked ;) As mentioned by @partiallytyped, SB3 is now the project actively developed by the maintainers.
We have two related issues for that: #49 #48
The main advantage of SB3 is that it was rebuilt (almost) from scratch, trying not to reproduce the mistakes made in SB2. Since the internals changed, you may expect some differences (they will be documented anyway) until v1.0 is released (see issue #1 and code review #17). Documenting the differences between SB2 and SB3 is also on the roadmap. Last thing, for SB3 vs other PyTorch libraries: #20
Hello, I used SB2 for training with SAC and have now switched to SB3. The SB3 implementation is currently around 2.5x slower than SB2 with almost the same set of hyperparameters. Is this something we should expect, or is something wrong in my environment and/or code? Many thanks,
Hi Reza,
There is certainly some low-hanging fruit that will result in better performance, and there has been some discussion on using torch's JIT. There were some changes to the continuous-action methods (TD3/SAC), so be sure to check those out.
Hi PartiallyTyped, thanks for your quick reply! Do you have any idea where the best place is to call torch.set_num_threads()? I'd really appreciate it if you could comment on that. BR,
Before creating the model; or, if you are using the RL Zoo, you can pass it as an argument to the script.
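A minimal sketch of the suggestion above (assuming PyTorch is installed; the thread count of 1 is just an example value to tune):

```python
import torch

# Limit PyTorch's intra-op CPU parallelism *before* creating the model.
# For small MLP policies, many threads can add overhead instead of speed.
torch.set_num_threads(1)

print(torch.get_num_threads())  # → 1

# Hypothetical: the model would be created afterwards, e.g.
# model = SAC("MlpPolicy", env)
```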
Is it CPU only?
Thanks araffin for your reply! Since I am using gym directly, not the zoo, I tried calling th.set_num_threads() before creating the model. I got this error message: "MemoryError: Unable to allocate 2.12 GiB for an array with shape (1000000, 1, 568) and data type float32". Does this mean I do not have enough memory available? I tried different thread counts, yet I always got the same error message.
Gym and the RL Zoo are two completely different things (cf. the docs). You can use the RL Zoo to train agents on gym environments.
Yes, you don't have enough RAM. But this is off-topic.
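For reference, the number in that MemoryError can be reproduced arithmetically; a float32 replay buffer with the shape from the error message needs about 2.12 GiB on its own:

```python
# Memory footprint of a float32 array of shape (1000000, 1, 568),
# as reported in the MemoryError above.
n_transitions, n_envs, obs_dim = 1_000_000, 1, 568
bytes_per_float32 = 4

size_bytes = n_transitions * n_envs * obs_dim * bytes_per_float32
size_gib = size_bytes / 2**30
print(f"{size_gib:.2f} GiB")  # → 2.12 GiB
```

Reducing the replay buffer size (or the observation dimensionality) is the usual way to shrink this.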
Thinking about that again, are you sure the network is the same? The default MLP policy of SB3 for SAC is bigger, to match the original paper.
Hi araffin, thanks for asking. I am changing the default network architecture to get similar nets in SB2 and SB3. Basically, in SB3, I use … Best regards,
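One way to align the two architectures is via `policy_kwargs` in SB3. A sketch, under the assumption that SB3's SAC defaults to larger layers (per the original paper) while SB2's default MLP used two 64-unit layers; check the docs of your exact versions:

```python
# Hypothetical: shrink SB3's SAC network down to SB2's default MLP size
# by overriding the network architecture when constructing the model.
policy_kwargs = dict(net_arch=[64, 64])  # two hidden layers of 64 units

# model = SAC("MlpPolicy", env, policy_kwargs=policy_kwargs)  # hypothetical call
print(policy_kwargs["net_arch"])  # → [64, 64]
```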
I just did a comparison between SB1 and SB3: same PC, same environment and callback. The only difference is that with SB3 I'm (finally) using my CUDA GPU (1050 Ti). Well, SB1 without a GPU gives ~900 FPS while SB3 with the GPU gives ~190. There should definitely be some low-hanging fruit someplace. Just wanted to mention Sample Factory (https://venturebeat.com/2020/06/24/intels-sample-factory-speeds-up-reinforcement-learning-training-on-a-single-pc): I get ~3500 FPS on the same hardware as above (a 2-core, 6-year-old PC), and managed to get a lot more on a multi-core server.
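When quoting FPS numbers like these, it helps to measure both sides the same way. A library-agnostic sketch (the no-op step function is a stand-in for a real `env.step()`/training step):

```python
import time

def measure_fps(step_fn, n_steps=10_000):
    """Return environment steps per second for a given step function."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# Stand-in for the real per-step work; replace with env.step() or a
# model.learn() wrapper to get comparable numbers across libraries.
fps = measure_fps(lambda: None)
print(f"{fps:.0f} FPS")
```

Timing identical step counts on identical hardware avoids comparing a wall-clock number that includes environment cost with one that doesn't.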
Yup, SB3 is still semi-unoptimized, and the first goal is to achieve the same performance as SB2. One quick trick you could try is setting an environment variable … I'd like to highlight that SB will never achieve the same speeds as Sample Factory, as that one is specifically designed for high frames-per-second training and implements algorithms designed for that (i.e. IMPALA). Stable-baselines focuses on synchronous execution.
Because SB3 is built using PyTorch, there is some expected and unavoidable slowdown simply due to Python. We discussed using PyTorch's JIT a bit in #57. If you'd like to get your hands dirty, you could compile at least some parts, like the replay buffers, with Numba's JIT, but that isn't supported. I also keep avoiding #93 ;)
Thanks @partiallytyped. Just to clarify, SF is also using PyTorch. I think @Miffyli is correct.
I was referring to the relative performance between identical/same-scope torch and TF implementations. @Miffyli is indeed correct.
The effect of … [results figure elided]. The first group (around 100 FPS) is with …
What do you call … ?
In the policy, instead of having a single …
Ah OK, please move this discussion to #49 then.
As mentioned here: #122 (comment). EDIT: apparently on CPU only.
On a related note, I migrated from SB2 to SB3 and the training is taking 24 times longer (same custom environment + PPO + default hyperparameters + 100,000 time steps + 8 parallel environments)... I did play with the … I'm using PyTorch with CUDA support.
Please read our migration guide (if you have not already). EDIT: I did two quick tests using the zoo (SB2 and SB3) with 8 envs and two environments (CartPole-v1, Breakout); SB3 was ~2x slower on CartPole but 1.2x faster on Breakout.
Thanks for the suggestions. I couldn't reproduce the 24x slowdown, but I prepared a minimal example where the training takes 4x longer on my custom environment (and 2.6x longer on CartPole-v1). The instructions are in the readme, but let me know if you can't reproduce it.
Thanks for setting that up =) The hyperparameters:

```yaml
widowx_reacher-v1:
  n_timesteps: 100000
  normalize: true
  policy: 'MlpPolicy'
  n_envs: 8
  n_steps: 128
  n_epochs: 4
  batch_size: 256
  n_timesteps: !!float 1e7
  learning_rate: !!float 2.5e-4
  clip_range: 0.2
  vf_coef: 0.5
  ent_coef: 0.01
```

I would also advise you to deactivate the vf clipping in SB2. Note that SB2 … EDIT: @PierreExeter I ran your env with the same hyperparams and got 39s (SB3) vs 39s (SB2), so the same time (CPU only, with 1 thread only).
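As a sanity check on hyperparameters like these: SB3's PPO collects `n_steps` transitions per environment, so one rollout holds `n_steps × n_envs` transitions, which are then split into minibatches of `batch_size` for each of `n_epochs` optimization passes. The arithmetic for the values above:

```python
# Rollout/minibatch bookkeeping for the PPO hyperparameters above.
n_envs, n_steps, batch_size, n_epochs = 8, 128, 256, 4

rollout_size = n_envs * n_steps           # transitions collected per rollout
minibatches = rollout_size // batch_size  # minibatches per optimization epoch
updates = minibatches * n_epochs          # gradient updates per rollout

print(rollout_size, minibatches, updates)  # → 1024 4 16
```

If `batch_size` does not divide `n_envs * n_steps` evenly, the last minibatch is smaller, which is worth avoiding when matching SB2 behavior.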
You're right, it was an issue with the hyperparameters. I also got a training time of 36s when using the SB2 default hyperparameters. |
For latest comparison, please take a look at #122 (comment) |
Did anybody compare the training speed (or other performance metrics) of SB and SB3 for the implemented algorithms (e.g., PPO)?
Is there a reason to prefer either one for developing a new project?