Release 2.5.0a1 (WIP)

Breaking Changes

Upgraded to Pytorch >= 2.3.0
Upgraded to SB3 >= 2.5.0

New Features

Added support for Numpy v2
Added support for specifying callbacks and env wrapper as python object in python config files (instead of string)

Bug fixes

Documentation

Other

Release 2.4.0 (2024-11-18)

New algorithm: CrossQ, Gymnasium v1.0 support, and better defaults for SAC/TQC on Swimmer-v4 env

Breaking Changes

Updated defaults hyperparameters for TQC/SAC for Swimmer-v4 (decrease gamma for more consistent results) (@JacobHA) W&B report
Upgraded to SB3 >= 2.4.0
Renamed LunarLander-v2 to LunarLander-v3 in hyperparameters

New Features

Added CrossQ hyperparameters for SB3-contrib (@danielpalen)
Added Gymnasium v1.0 support

Bug fixes

Replaced deprecated huggingface_hub.Repository when pushing to Hugging Face Hub by the recommended HfApi (see https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http) (@cochaviz)

Documentation

Other

Updated PyTorch version to 2.4.1 in the CI
Switched to uv to download packages faster on GitHub CI

Release 2.3.0 (2024-03-31)

Breaking Changes

Updated defaults hyperparameters for TD3/DDPG to be more consistent with SAC
Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
Upgraded to SB3 >= 2.3.0

Other

Added test dependencies to setup.py (@power-edge)
Simplify dependencies of requirements.txt (remove duplicates from setup.py)

Release 2.2.1 (2023-11-17)

Breaking Changes

Removed gym dependency, the package is still required for some pretrained agents.
Upgraded to SB3 >= 2.2.1
Upgraded to Huggingface-SB3 >= 3.0
Upgraded to pytablewriter >= 1.0

New Features

Added --eval-env-kwargs to train.py (@Quentin18)
Added ppo_lstm to hyperparams_opt.py (@technocrat13)

Bug fixes

Upgraded to pybullet_envs_gymnasium>=0.4.0
Removed old hacks (for instance limiting offpolicy algorithms to one env at test time)

Documentation

Other

Updated docker image, removed support for X server
Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)
Switched to ruff for sorting imports
Updated tests to use shlex.split()
Fixed rl_zoo3/hyperparams_opt.py type hints
Fixed rl_zoo3/exp_manager.py type hints

Release 2.1.0 (2023-08-17)

Breaking Changes

Dropped python 3.7 support
SB3 now requires PyTorch 1.13+
Upgraded to SB3 >= 2.1.0
Upgraded to Huggingface-SB3 >= 2.3
Upgraded to Optuna >= 3.0
Upgraded to cloudpickle >= 2.2.1

New Features

Added python 3.11 support

Bug fixes

Documentation

Other

Release 2.0.0 (2023-06-22)

Gymnasium support

Warning Stable-Baselines3 (SB3) v2.0.0 will be the last one supporting python 3.7

Breaking Changes

Fixed bug in HistoryWrapper, now returns the correct obs space limits
Upgraded to SB3 >= 2.0.0
Upgraded to Huggingface-SB3 >= 2.2.5
Upgraded to Gym API 0.26+, RL Zoo3 doesn't work anymore with Gym 0.21

New Features

Added Gymnasium support
Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper

Bug fixes

Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
Huggingface push to hub now accepts a --n-timesteps argument to adjust the length of the video
Fixed record_video steps (before it was stepping in a closed env)

Release 1.8.0 (2023-04-07)

New Documentation, Multi-Env HerReplayBuffer

Warning Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.

Breaking Changes

Upgraded to SB3 >= 1.8.0
Upgraded to new HerReplayBuffer implementation that supports multiple envs
Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.

New Features

Tuned hyperparameters for RecurrentPPO on Swimmer
Documentation is now built using Sphinx and hosted on read the doc
Added hyperparameters pre-trained agents for PPO on 11 MiniGrid envs

Bug fixes

Set highway-env version to 1.5 and setuptools to v65.5 for the CI
Removed use_auth_token for push to hub util
Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see openai/gym#1304)
Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)

Documentation

Other

Added support for ruff (fast alternative to flake8) in the Makefile
Removed Gitlab CI file
Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
Switched to ruff and pyproject.toml
Removed online_sampling and max_episode_length argument when using HerReplayBuffer

Release 1.7.0 (2023-01-10)

SB3 v1.7.0, added support for python config files

Breaking Changes

--yaml-file argument was renamed to -conf (--conf-file) as now python file are supported too
Upgraded to SB3 >= 1.7.0 (changed net_arch=[dict(pi=.., vf=..)] to net_arch=dict(pi=.., vf=..))

New Features

Specifying custom policies in yaml file is now supported (@Rick-v-E)
Added monitor_kwargs parameter
Handle the env_kwargs of render:True under the hood for panda-gym v1 envs in enjoy replay to match visualzation behavior of other envs
Added support for python config file
Tuned hyperparameters for PPO on Swimmer
Added -tags/--wandb-tags argument to train.py to add tags to the wandb run
Added a sb3 version tag to the wandb run

Bug fixes

Allow python -m rl_zoo3.cli to be called directly
Fixed a bug where custom environments were not found despite passing --gym-package when using subprocesses
Fixed TRPO hyperparameters for MinitaurBulletEnv-v0, MinitaurBulletDuckEnv-v0, HumanoidBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0 and InvertedPendulumSwingupBulletEnv

Documentation

Other

scripts/plot_train.py plots models such that newer models appear on top of older ones.
Added additional type checking using mypy
Standardized the use of from gym import spaces

Release 1.6.3 (2022-10-13)

Breaking Changes

New Features

Bug fixes

python3 -m rl_zoo3.train now works as expected

Documentation

Added instructions and examples on passing arguments in an interactive session (@richter43)

Other

Used issue forms instead of issue templates

Release 1.6.2.post2 (2022-10-10)

Breaking Changes

RL Zoo is now a python package
low pass filter was removed
Upgraded to Stable-Baselines3 (SB3) >= 1.6.2
Upgraded to sb3-contrib >= 1.6.2
Use now built-in SB3 ProgressBarCallback instead of TQDMCallback

New Features

RL Zoo cli: rl_zoo3 train and rl_zoo3 enjoy

Bug fixes

Documentation

Other

Release 1.6.1 (2022-09-30)

Progress bar and custom yaml file

Breaking Changes

Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
Upgraded to sb3-contrib >= 1.6.1

New Features

Added --yaml-file argument option for train.py to read hyperparameters from custom yaml files (@JohannesUl)

Bug fixes

Added custom_object parameter on record_video.py (@Affonso-Gui)
Changed optimize_memory_usage to False for DQN/QR-DQN on record_video.py (@Affonso-Gui)
In ExperimentManager _maybe_normalize set training to False for eval envs, to prevent normalization stats from being updated in eval envs (e.g. in EvalCallback) (@pchalasani).
Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
Added progress bar via the -P argument using tqdm and rich

Documentation

Other

Release 1.6.0 (2022-08-05)

RecurrentPPO (ppo_lstm) and Huggingface integration

Breaking Changes

Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
Updated default --eval-freq from 10k to 25k steps
Update default horizon to 2 for the HistoryWrapper
Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
Upgrade to sb3-contrib >= 1.6.0

New Features

Support setting PyTorch's device with thye --device flag (@gregwar)
Add --max-total-trials parameter to help with distributed optimization. (@ernestum)
Added vec_env_wrapper support in the config (works the same as env_wrapper)
Added Huggingface hub integration
Added RecurrentPPO support (aka ppo_lstm)
Added autodownload for "official" sb3 models from the hub
Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
Added MsPacman models

Bug fixes

Fix Reacher-v3 name in PPO hyperparameter file
Pinned ale-py==0.7.4 until new SB3 version is released
Fix enjoy / record videos with LSTM policy
Fix bug with environments that have a slash in their name (@ernestum)
Changed optimize_memory_usage to False for DQN/QR-DQN on Atari games, if you want to save RAM, you need to deactivate handle_timeout_termination in the replay_buffer_kwargs

Documentation

Other

When pruner is set to "none", use NopPruner instead of diverted MedianPruner (@qgallouedec)

Release 1.5.0 (2022-03-25)

Support for Weight and Biases experiment tracking

Breaking Changes

Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
Upgrade to sb3-contrib >= 1.5.0
Upgraded to gym 0.21

New Features

Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
Support experiment tracking via Weights and Biases via the --track flag (@vwxyzjn)
Support tracking raw episodic stats via RawStatisticsCallback (@vwxyzjn, see #216)

Bug fixes

Policies saved during during optimization with distributed Optuna load on new systems (@jkterry)
Fixed script for recording video that was not up to date with the enjoy script

Documentation

Other

Release 1.4.0 (2022-01-19)

Breaking Changes

Dropped python 3.6 support
Upgrade to Stable-Baselines3 (SB3) >= 1.4.0
Upgrade to sb3-contrib >= 1.4.0

New Features

Added mujoco hyperparameters
Added MuJoCo pre-trained agents
Added script to parse best hyperparameters of an optuna study
Added TRPO support
Added ARS support and pre-trained agents

Bug fixes

Documentation

Replace front image

Other

Release 1.3.0 (2021-10-23)

rliable plots and bug fixes

WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommended you to upgrade to Python >= 3.7.

Breaking Changes

Upgrade to panda-gym 1.1.1
Upgrade to Stable-Baselines3 (SB3) >= 1.3.0
Upgrade to sb3-contrib >= 1.3.0

New Features

Added support for using rliable for performance comparison

Bug fixes

Fix training with Dict obs and channel last images

Documentation

Other

Updated docker image
constrained gym version: gym>=0.17,<0.20
Better hyperparameters for A2C/PPO on Pendulum

Release 1.2.0 (2021-09-08)

Breaking Changes

Upgrade to Stable-Baselines3 (SB3) >= 1.2.0
Upgrade to sb3-contrib >= 1.2.0

New Features

Added support for Python 3.10

Bug fixes

Fix --load-last-checkpoint (@SammyRamone)
Fix TypeError for gym.Env class entry points in ExperimentManager (@schuderer)
Fix usage of callbacks during hyperparameter optimization (@SammyRamone)

Documentation

Other

Added python 3.9 to Github CI
Increased DQN replay buffer size for Atari games (@nikhilrayaprolu)

Release 1.1.0 (2021-07-01)

Breaking Changes

Upgrade to Stable-Baselines3 (SB3) >= 1.1.0
Upgrade to sb3-contrib >= 1.1.0
Add timeout handling (cf SB3 doc)
HER is now a replay buffer class and no more an algorithm
Removed PlotNoiseRatioCallback
Removed PlotActionWrapper
Changed 'lr' key in Optuna param dict to 'learning_rate' so the dict can be directly passed to SB3 methods (@jkterry)

New Features

Add support for recording videos of best models and checkpoints (@mcres)
Add support for recording videos of training experiments (@mcres)
Add support for dictionary observations
Added experimental parallel training (with utils.callbacks.ParallelTrainCallback)
Added support for using multiple envs for evaluation
Added --load-last-checkpoint option for the enjoy script
Save Optuna study object at the end of hyperparameter optimization and plot the results (plotly package required)
Allow to pass multiple folders to scripts/plot_train.py
Flag to save logs and optimal policies from each training run (@jkterry)

Bug fixes

Fixed video rendering for PyBullet envs on Linux
Fixed get_latest_run_id() so it works in Windows too (@NicolasHaeffner)
Fixed video record when using HER replay buffer

Documentation

Updated README (dict obs are now supported)

Other

Added is_bullet() to ExperimentManager
Simplify close() for the enjoy script
Updated docker image to include latest black version
Updated TD3 Walker2D model (thanks @modanesh)
Fixed typo in plot title (@scottemmons)
Minimum cloudpickle version added to requirements.txt (@amy12xx)
Fixed atari-py version (ROM missing in newest release)
Updated SAC and TD3 search spaces
Cleanup eval_freq documentation and variable name changes (@jkterry)
Add clarifying print statement when printing saved hyperparameters during optimization (@jkterry)
Clarify n_evaluations help text (@jkterry)
Simplified hyperparameters files making use of defaults
Added new TQC+HER agents
Add panda-gym environments (@qgallouedec)

Release 1.0 (2021-03-17)

Breaking Changes

Upgrade to SB3 >= 1.0
Upgrade to sb3-contrib >= 1.0

New Features

Added 100+ trained agents + benchmark file
Add support for loading saved model under python 3.8+ (no retraining possible)
Added Robotics pre-trained agents (@sgillen)

Bug fixes

Bug fixes for HER handling action noise
Fixed double reset bug with HER and enjoy script

Documentation

Added doc about plotting scripts

Other

Updated HER hyperparameters

Pre-Release 0.11.1 (2021-02-27)

Breaking Changes

Removed LinearNormalActionNoise
Evaluation is now deterministic by default, except for Atari games
sb3_contrib is now required
TimeFeatureWrapper was moved to the contrib repo
Replaced old plot_train.py script with updated plot_training_success.py
Renamed n_episodes_rollout to train_freq tuple to match latest version of SB3

New Features

Added option to choose which VecEnv class to use for multiprocessing
Added hyperparameter optimization support for TQC
Added support for QR-DQN from SB3 contrib

Bug fixes

Improved detection of Atari games
Fix potential bug in plotting script when there is not enough timesteps
Fixed a bug when using HER + DQN/TQC for hyperparam optimization

Documentation

Improved documentation (@cboettig)

Other

Refactored train script, now uses a ExperimentManager class
Replaced make_env with SB3 built-in make_vec_env
Add more type hints (utils/utils.py done)
Use f-strings when possible
Changed PPO atari hyperparameters (removed vf clipping)
Changed A2C atari hyperparameters (eps value of the optimizer)
Updated benchmark script
Updated hyperparameter optim search space (commented gSDE for A2C/PPO)
Updated DQN hyperparameters for CartPole
Do not wrap channel-first image env (now natively supported by SB3)
Removed hack to log success rate
Simplify plot script

Pre-Release 0.10.0 (2020-10-28)

Breaking Changes

New Features

Added support for HER
Added low-pass filter wrappers in utils/wrappers.py
Added TQC support, implementation from sb3-contrib

Bug fixes

Fixed TimeFeatureWrapper inferring max timesteps
Fixed flatten_dict_observations in utils/utils.py for recent Gym versions (@ManifoldFR)
VecNormalize now takes gamma hyperparameter into account
Fix loading of VecNormalize when continuing training or using trained agent

Documentation

Other

Added tests for the wrappers
Updated plotting script

Release 0.8.0 (2020-08-04)

Breaking Changes

New Features

Distributed optimization (@SammyRamone)
Added --load-checkpoints to load particular checkpoints
Added --num-threads to enjoy script
Added DQN support
Added saving of command line args (@SammyRamone)
Added DDPG support
Added version
Added RMSpropTFLike support

Bug fixes

Fixed optuna warning (@SammyRamone)
Fixed --save-freq which was not taking parallel env into account
Set buffer_size to 1 when testing an Off-Policy model (e.g. SAC/DQN) to avoid memory allocation issue
Fixed seed at load time for enjoy.py
Non-deterministic eval when doing hyperparameter optimization on atari games
Use 'maximize' for hyperparameter optimization (@SammyRamone)
Fixed a bug where reward where not normalized when doing hyperparameter optimization (@caburu)
Removed nminibatches from ppo.yml for MountainCar-v0 and Acrobot-v1. (@blurLake)
Fixed --save-replay-buffer to be compatible with latest SB3 version
Close environment at the end of training
Updated DQN hyperparameters on simpler gym env (due to an update in the implementation)

Documentation

Other

Reformat enjoy.py, test_enjoy.py, test_hyperparams_opt.py, test_train.py, train.py, callbacks.py, hyperparams_opt.py, utils.py, wrappers.py (@salmannotkhan)
Reformat record_video.py (@salmannotkhan)
Added codestyle check make lint using flake8
Reformat benchmark.py (@salmannotkhan)
Added github ci
Fixes most linter warnings
Now using black and isort for auto-formatting
Updated plots

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Release 2.5.0a1 (WIP)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 2.4.0 (2024-11-18)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 2.3.0 (2024-03-31)

Breaking Changes

Other

Release 2.2.1 (2023-11-17)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 2.1.0 (2023-08-17)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 2.0.0 (2023-06-22)

Breaking Changes

New Features

Bug fixes

Release 1.8.0 (2023-04-07)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 1.7.0 (2023-01-10)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 1.6.3 (2022-10-13)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 1.6.2.post2 (2022-10-10)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 1.6.1 (2022-09-30)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 1.6.0 (2022-08-05)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 1.5.0 (2022-03-25)

Breaking Changes

New Features

Bug fixes

Documentation

Other

Release 1.4.0 (2022-01-19)