[Question/Discussion] Comparing stable-baselines3 vs stable-baselines #90
SB3 is in active development, whereas SB2 (the original stable-baselines) is in maintenance mode. I use SB3 for my projects since it is more modular and less cluttered than SB2, thanks to dynamic computation graphs, the experience the members gathered implementing the algorithms, and a well-thought-out design.
To add to the comment above: some of the methods are, as of writing, slower (at least without tuning, e.g. the number of threads), but we are still in the process of going over them, optimizing for speed and matching the performance of the SB2 implementations.
Hello, I'm glad you asked ;) As mentioned by @partiallytyped, SB3 is now the project actively developed by the maintainers.
We have two related issues for that: #49 #48
The main advantage of SB3 is that it was rebuilt (almost) from scratch, trying not to reproduce the mistakes made in SB2. Since the internals changed, you may expect some differences (they will be documented anyway) until v1.0 is released (see issue #1 and code review #17). Documenting the differences between SB2 and SB3 is also on the roadmap. Last thing, for SB3 vs other PyTorch libraries: #20
Hello, I used SB2 for training with SAC and have now switched to SB3. The SB3 implementation is currently around 2.5x slower than SB2 with almost the same set of hyperparameters. Is this something we should expect, or is something wrong in my environment and/or code? Many thanks,
Hi Reza,
There is certainly some low-hanging fruit that will result in better performance, and there has been some discussion on using torch's JIT. There were some changes to the continuous-action methods (TD3/SAC), so be sure to check those out.
Hi PartiallyTyped, thanks for your quick reply! Do you have any idea where the best place is to call torch.set_num_threads()? I'd really appreciate it if you could comment on that. BR,
Before creating the model; or, if you are using the RL Zoo, you can pass it as an argument to the script.
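A minimal sketch of the suggestion above (assuming PyTorch is installed; the thread count of 1 is just an example value to tune):

```python
import torch

# Limit PyTorch's intra-op CPU parallelism *before* creating the model.
# For small MLP policies, many threads can add overhead instead of speed.
torch.set_num_threads(1)

print(torch.get_num_threads())  # → 1

# Hypothetical: the model would be created afterwards, e.g.
# model = SAC("MlpPolicy", env)
```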
Is it CPU only?
Thanks araffin for your reply! Since I am using gym directly, not the zoo, I tried calling th.set_num_threads() before creating the model. I got this error message: "MemoryError: Unable to allocate 2.12 GiB for an array with shape (1000000, 1, 568) and data type float32". Does this mean I do not have enough memory available? I tried different thread counts, yet I always got the same error message.
Gym and the RL Zoo are two completely different things (cf. the docs). You can use the RL Zoo to train agents on gym environments.
Yes, you don't have enough RAM. But this is off-topic.
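For reference, the number in that MemoryError can be reproduced arithmetically; a float32 replay buffer with the shape from the error message needs about 2.12 GiB on its own:

```python
# Memory footprint of a float32 array of shape (1000000, 1, 568),
# as reported in the MemoryError above.
n_transitions, n_envs, obs_dim = 1_000_000, 1, 568
bytes_per_float32 = 4

size_bytes = n_transitions * n_envs * obs_dim * bytes_per_float32
size_gib = size_bytes / 2**30
print(f"{size_gib:.2f} GiB")  # → 2.12 GiB
```

Reducing the replay buffer size (or the observation dimensionality) is the usual way to shrink this.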
Thinking about that again, are you sure the network is the same? The default MLP policy of SB3 for SAC is bigger, to match the original paper.
Hi araffin, thanks for asking. I am changing the default network architecture to get similar nets in SB2 and SB3. Basically, in SB3, I use … Best regards,
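One way to align the two architectures is via `policy_kwargs` in SB3. A sketch, under the assumption that SB3's SAC defaults to larger layers (per the original paper) while SB2's default MLP used two 64-unit layers; check the docs of your exact versions:

```python
# Hypothetical: shrink SB3's SAC network down to SB2's default MLP size
# by overriding the network architecture when constructing the model.
policy_kwargs = dict(net_arch=[64, 64])  # two hidden layers of 64 units

# model = SAC("MlpPolicy", env, policy_kwargs=policy_kwargs)  # hypothetical call
print(policy_kwargs["net_arch"])  # → [64, 64]
```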
I just did a comparison between SB1 and SB3: same PC, same environment and callback. The only difference is that with SB3 I'm (finally) using my CUDA GPU (1050 Ti). Well, SB1 without a GPU gives ~900 FPS while SB3 with the GPU gives ~190. There should definitely be some low-hanging fruit someplace. Just wanted to mention Sample Factory (https://venturebeat.com/2020/06/24/intels-sample-factory-speeds-up-reinforcement-learning-training-on-a-single-pc): I get ~3500 FPS on the same hardware as above (a 2-core, 6-year-old PC), and managed to get a lot more on a multi-core server.
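When quoting FPS numbers like these, it helps to measure both sides the same way. A library-agnostic sketch (the no-op step function is a stand-in for a real `env.step()`/training step):

```python
import time

def measure_fps(step_fn, n_steps=10_000):
    """Return environment steps per second for a given step function."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# Stand-in for the real per-step work; replace with env.step() or a
# model.learn() wrapper to get comparable numbers across libraries.
fps = measure_fps(lambda: None)
print(f"{fps:.0f} FPS")
```

Timing identical step counts on identical hardware avoids comparing a wall-clock number that includes environment cost with one that doesn't.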
Yup, SB3 is still semi-unoptimized, and the first goal is to achieve the same performance as SB2. One quick trick you could try is setting an environment variable … I'd like to highlight that SB will never achieve the same speeds as Sample Factory, as that one is specifically designed for high frames-per-second training and implements algorithms designed for that (i.e. IMPALA). Stable-baselines focuses on synchronous execution.
Because SB3 is built using PyTorch, there is some expected and unavoidable slowdown simply due to Python. We discussed using PyTorch's JIT a bit in #57. If you'd like to get your hands dirty, you could compile at least some parts, like the replay buffers, with Numba's JIT, but that isn't supported. I also keep avoiding #93 ;)
Thanks @partiallytyped. Just to clarify, SF is also using PyTorch. I think @Miffyli is correct.
I was referring to the relative performance between identical/same-scope torch and TF implementations. @Miffyli is indeed correct.
The effect of … [results figure elided]. The first group (around 100 FPS) is with …
What do you call … ?
In the policy, instead of having a single …
Ah OK, please move this discussion to #49 then.
As mentioned here: #122 (comment). EDIT: apparently on CPU only.
On a related note, I migrated from SB2 to SB3 and the training is taking 24 times longer (same custom environment + PPO + default hyperparameters + 100,000 time steps + 8 parallel environments)... I did play with the … I'm using PyTorch with CUDA support.
Please read our migration guide (if you have not already). EDIT: I did two quick tests using the zoo (SB2 and SB3) with 8 envs and two environments (CartPole-v1, Breakout); SB3 was ~2x slower on CartPole but 1.2x faster on Breakout.
Thanks for the suggestions. I couldn't reproduce the 24x slowdown, but I prepared a minimal example where the training takes 4x longer on my custom environment (and 2.6x longer on CartPole-v1). The instructions are in the readme, but let me know if you can't reproduce it.
Thanks for setting that up =) The hyperparameters:

```yaml
widowx_reacher-v1:
  n_timesteps: 100000
  normalize: true
  policy: 'MlpPolicy'
  n_envs: 8
  n_steps: 128
  n_epochs: 4
  batch_size: 256
  n_timesteps: !!float 1e7
  learning_rate: !!float 2.5e-4
  clip_range: 0.2
  vf_coef: 0.5
  ent_coef: 0.01
```

I would also advise you to deactivate the vf clipping in SB2. Note that SB2 … EDIT: @PierreExeter I ran your env with the same hyperparams and got 39s (SB3) vs 39s (SB2), so the same time (CPU only, with 1 thread only).
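As a sanity check on hyperparameters like these: SB3's PPO collects `n_steps` transitions per environment, so one rollout holds `n_steps × n_envs` transitions, which are then split into minibatches of `batch_size` for each of `n_epochs` optimization passes. The arithmetic for the values above:

```python
# Rollout/minibatch bookkeeping for the PPO hyperparameters above.
n_envs, n_steps, batch_size, n_epochs = 8, 128, 256, 4

rollout_size = n_envs * n_steps           # transitions collected per rollout
minibatches = rollout_size // batch_size  # minibatches per optimization epoch
updates = minibatches * n_epochs          # gradient updates per rollout

print(rollout_size, minibatches, updates)  # → 1024 4 16
```

If `batch_size` does not divide `n_envs * n_steps` evenly, the last minibatch is smaller, which is worth avoiding when matching SB2 behavior.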
You're right, it was an issue with the hyperparameters. I also got a training time of 36s when using the SB2 default hyperparameters. |
For latest comparison, please take a look at #122 (comment) |
Did anybody compare the training speed (or other performance metrics) of SB and SB3 for the implemented algorithms (e.g., PPO)?
Is there a reason to prefer either one for developing a new project?