RL Baselines3 Zoo is a training framework for Reinforcement Learning (RL), using Stable Baselines3.
It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.
We are looking for contributors to complete the collection!
Goals of this repository:
- Provide a simple interface to train and enjoy RL agents
- Benchmark the different Reinforcement Learning algorithms
- Provide tuned hyperparameters for each environment and RL algorithm
- Have fun with the trained agents!
This is the SB3 version of the original SB2 rl-zoo.
Note: although SB3 and the RL Zoo are compatible with NumPy>=2.0, you will need NumPy<2 to run agents on PyBullet envs (see issue).
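The version constraint in the note above can be checked before launching a PyBullet run. The following is a minimal sketch, not part of the RL Zoo: the helper names `numpy_supports_pybullet` and `installed_numpy_version` are hypothetical, and the check only looks at the major version number.

```python
from importlib import metadata
from typing import Optional


def numpy_supports_pybullet(version: str) -> bool:
    """Return True when the NumPy major version is below 2 (hypothetical helper)."""
    major = int(version.split(".")[0])
    return major < 2


def installed_numpy_version() -> Optional[str]:
    """Look up the installed NumPy version, or None if NumPy is not installed."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_numpy_version()
    if version is None:
        print("NumPy is not installed")
    elif numpy_supports_pybullet(version):
        print(f"NumPy {version}: OK for PyBullet envs")
    else:
        print(f"NumPy {version}: install 'numpy<2' to run PyBullet envs")
```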
Documentation is available online: https://rl-baselines3-zoo.readthedocs.io/
From source:
pip install -e .
As a python package:
pip install rl_zoo3
Note: you can run python -m rl_zoo3.train from any folder, and you have access to the rl_zoo3 command line interface. For instance, rl_zoo3 train is equivalent to python train.py.
apt-get install swig cmake ffmpeg
pip install -r requirements.txt
pip install -e .[plots,tests]
Please see the Stable Baselines3 documentation for alternative ways to install Stable Baselines3.
The hyperparameters for each environment are defined in hyperparameters/algo_name.yml.
If the environment exists in this file, then you can train an agent using:
python train.py --algo algo_name --env env_id
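Before training, it can help to confirm that an environment has an entry in the hyperparameter file. The sketch below is a rough check that avoids a YAML parser: it assumes that environment ids appear as unindented top-level keys in hyperparameters/algo_name.yml, which is an assumption about the file layout. The helper names are hypothetical, and the sample values in the usage lines are made up for illustration.

```python
from pathlib import Path
from typing import Set


def top_level_keys(yaml_text: str) -> Set[str]:
    """Collect unindented `key:` entries from YAML text (no full YAML parsing)."""
    keys = set()
    for line in yaml_text.splitlines():
        # Skip blank lines, comments, indented (nested) lines, and lines without a key.
        if not line or line[0] in " \t#" or ":" not in line:
            continue
        keys.add(line.split(":", 1)[0].strip())
    return keys


def env_has_hyperparameters(env_id: str, algo: str, folder: str = "hyperparameters") -> bool:
    """Check whether `env_id` is a top-level key in `<folder>/<algo>.yml`."""
    config_path = Path(folder) / f"{algo}.yml"
    if not config_path.is_file():
        return False
    return env_id in top_level_keys(config_path.read_text(encoding="utf-8"))


# Illustrative content only; the real files contain the tuned settings.
sample = "# example\nCartPole-v1:\n  n_envs: 8\nPendulum-v1:\n  n_envs: 4\n"
print(sorted(top_level_keys(sample)))
```

For anything beyond a quick existence check, a real YAML parser is the safer choice, since this scan ignores most of the YAML syntax.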
Evaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env):
python train.py --algo sac --env HalfCheetahBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1
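When launching several runs from a script, the command above can be assembled programmatically. This is a minimal sketch that uses only the flags shown in this README and the python -m rl_zoo3.train entry point; `build_train_command` is a hypothetical helper, not part of the RL Zoo.

```python
import sys
from typing import List, Optional


def build_train_command(
    algo: str,
    env_id: str,
    eval_freq: Optional[int] = None,
    eval_episodes: Optional[int] = None,
    n_eval_envs: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for `python -m rl_zoo3.train` (hypothetical helper)."""
    command = [sys.executable, "-m", "rl_zoo3.train", "--algo", algo, "--env", env_id]
    optional_flags = {
        "--eval-freq": eval_freq,
        "--eval-episodes": eval_episodes,
        "--n-eval-envs": n_eval_envs,
    }
    # Only append the evaluation flags that were explicitly requested.
    for flag, value in optional_flags.items():
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_train_command(
    "sac", "HalfCheetahBulletEnv-v0", eval_freq=10000, eval_episodes=10, n_eval_envs=1
)
print(" ".join(command))
```

To actually start training, the resulting list can be passed to subprocess.run(command, check=True).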
More examples are available in the documentation.
The RL Zoo has some integration with other libraries/services like Weights & Biases for experiment tracking or Hugging Face for storing/sharing trained models. You can find out more in the dedicated section of the documentation.
Please see the dedicated section of the documentation.
Note: to download the repo with the trained agents, you must use git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo in order to clone the submodule too.
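A clone made without --recursive typically leaves the submodule folder empty. The sketch below checks for that case; it assumes the trained agents live in a folder named rl-trained-agents at the repository root (the folder name used in the enjoy example), and `trained_agents_present` is a hypothetical helper.

```python
from pathlib import Path


def trained_agents_present(repo_root: str = ".", folder: str = "rl-trained-agents") -> bool:
    """Return True if the trained-agents folder exists and is not empty."""
    agents_dir = Path(repo_root) / folder
    return agents_dir.is_dir() and any(agents_dir.iterdir())


if __name__ == "__main__":
    if trained_agents_present():
        print("Trained agents found")
    else:
        print("Trained agents missing: the submodule was probably not cloned")
```

If the check fails on an existing clone, git submodule update --init --recursive fetches the missing submodule.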
If the trained agent exists, then you can see it in action using:
python enjoy.py --algo algo_name --env env_id
For example, enjoy A2C on Breakout for 5000 timesteps:
python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder rl-trained-agents/ -n 5000
Final performance of the trained agents can be found in benchmark.md. To compute them, simply run python -m rl_zoo3.benchmark.
A list and videos of the trained agents can be found on our Hugging Face page: https://huggingface.co/sb3
NOTE: this is not a quantitative benchmark as it corresponds to only one run (cf issue #38). This benchmark is meant to check algorithm (maximal) performance, find potential bugs and also allow users to have access to pretrained agents.
7 Atari games from the OpenAI benchmark (NoFrameskip-v4 versions).
RL Algo | BeamRider | Breakout | Enduro | Pong | Qbert | Seaquest | SpaceInvaders |
---|---|---|---|---|---|---|---|
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
DQN | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
QR-DQN | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Additional Atari Games (to be completed):
RL Algo | MsPacman | Asteroids | RoadRunner |
---|---|---|---|
A2C | ✔️ | ✔️ | ✔️ |
PPO | ✔️ | ✔️ | ✔️ |
DQN | ✔️ | ✔️ | ✔️ |
QR-DQN | ✔️ | ✔️ | ✔️ |
RL Algo | CartPole-v1 | MountainCar-v0 | Acrobot-v1 | Pendulum-v1 | MountainCarContinuous-v0 |
---|---|---|---|---|---|
ARS | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
DQN | ✔️ | ✔️ | ✔️ | N/A | N/A |
QR-DQN | ✔️ | ✔️ | ✔️ | N/A | N/A |
DDPG | N/A | N/A | N/A | ✔️ | ✔️ |
SAC | N/A | N/A | N/A | ✔️ | ✔️ |
TD3 | N/A | N/A | N/A | ✔️ | ✔️ |
TQC | N/A | N/A | N/A | ✔️ | ✔️ |
TRPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
RL Algo | BipedalWalker-v3 | LunarLander-v2 | LunarLanderContinuous-v2 | BipedalWalkerHardcore-v3 | CarRacing-v0 |
---|---|---|---|---|---|
ARS | ✔️ | ✔️ | | | |
A2C | ✔️ | ✔️ | ✔️ | ✔️ | |
PPO | ✔️ | ✔️ | ✔️ | ✔️ | |
DQN | N/A | ✔️ | N/A | N/A | N/A |
QR-DQN | N/A | ✔️ | N/A | N/A | N/A |
DDPG | ✔️ | N/A | ✔️ | | |
SAC | ✔️ | N/A | ✔️ | ✔️ | |
TD3 | ✔️ | N/A | ✔️ | ✔️ | |
TQC | ✔️ | N/A | ✔️ | ✔️ | |
TRPO | ✔️ | ✔️ | | | |
See https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs.
Similar to MuJoCo Envs but with a free (MuJoCo 2.1.0+ is now free!), easy-to-install simulator: pybullet. We are using the BulletEnv-v0 version.
Note: those environments are derived from Roboschool and are harder than the MuJoCo versions (see PyBullet issue).
RL Algo | Walker2D | HalfCheetah | Ant | Reacher | Hopper | Humanoid |
---|---|---|---|---|---|---|
ARS | | | | | | |
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
DDPG | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
SAC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
TD3 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
TQC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
TRPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
PyBullet Envs (Continued)
RL Algo | Minitaur | MinitaurDuck | InvertedDoublePendulum | InvertedPendulumSwingup |
---|---|---|---|---|
A2C | | | | |
PPO | | | | |
DDPG | | | | |
SAC | | | | |
TD3 | | | | |
TQC | | | | |
RL Algo | Walker2d | HalfCheetah | Ant | Swimmer | Hopper | Humanoid |
---|---|---|---|---|---|---|
ARS | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
DDPG | | | | | | |
SAC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
TD3 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
TQC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
TRPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
See https://gym.openai.com/envs/#robotics and #71
MuJoCo version: 1.50.1.0, Gym version: 0.18.0
We used the v1 environments.
RL Algo | FetchReach | FetchPickAndPlace | FetchPush | FetchSlide |
---|---|---|---|---|
HER+TQC | ✔️ | ✔️ | ✔️ | ✔️ |
See https://github.com/qgallouedec/panda-gym/.
Similar to MuJoCo Robotics Envs but with a free easy to install simulator: pybullet.
We used the v1 environments.
RL Algo | PandaReach | PandaPickAndPlace | PandaPush | PandaSlide | PandaStack |
---|---|---|---|---|---|
HER+TQC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
See https://github.com/Farama-Foundation/Minigrid. A simple, lightweight, and fast Gym implementation of the famous gridworld environments.
RL Algo | Empty-Random-5x5 | FourRooms | DoorKey-5x5 | MultiRoom-N4-S5 | Fetch-5x5-N2 | GoToDoor-5x5 | PutNear-6x6-N2 | RedBlueDoors-6x6 | LockedRoom | KeyCorridorS3R1 | Unlock | ObstructedMaze-2Dlh |
---|---|---|---|---|---|---|---|---|---|---|---|---|
A2C | | | | | | | | | | | | |
PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
DQN | | | | | | | | | | | | |
QR-DQN | | | | | | | | | | | | |
TRPO | | | | | | | | | | | | |
There are 22 environment groups (variations for each) in total.
You can train agents online using a Colab notebook.
The zoo is not meant to be executed from an interactive session (e.g. Jupyter Notebooks, IPython). However, it can be done by modifying sys.argv and adding the desired arguments.
Example
import sys
from rl_zoo3.train import train
# The first entry stands in for the program name; the rest are the usual CLI arguments.
sys.argv = ["python", "--algo", "ppo", "--env", "MountainCar-v0"]
train()
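Overwriting sys.argv in a notebook leaves the modified value in place for later cells. One option is to wrap the change in a context manager that restores the original value afterwards. The sketch below is an illustration under stated assumptions: `patched_argv` is a hypothetical helper, and `fake_train` is a stand-in that parses sys.argv with argparse so the example runs without rl_zoo3 installed.

```python
import argparse
import sys
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def patched_argv(args: List[str]) -> Iterator[None]:
    """Temporarily replace sys.argv, restoring the original value on exit."""
    original = sys.argv
    sys.argv = ["python", *args]
    try:
        yield
    finally:
        sys.argv = original


def fake_train() -> argparse.Namespace:
    """Hypothetical stand-in for rl_zoo3.train.train: reads sys.argv like a CLI."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--algo", required=True)
    parser.add_argument("--env", required=True)
    return parser.parse_args()


with patched_argv(["--algo", "ppo", "--env", "MountainCar-v0"]):
    parsed = fake_train()  # in a real session, call train() from rl_zoo3.train here

print(parsed.algo, parsed.env)
```

The try/finally ensures sys.argv is restored even if the wrapped call raises an exception.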
To run tests, first install pytest, then:
make pytest
Same for type checking with pytype:
make type
To cite this repository in publications:
@misc{rl-zoo3,
author = {Raffin, Antonin},
title = {RL Baselines3 Zoo},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/DLR-RM/rl-baselines3-zoo}},
}
If you trained an agent that is not present in the RL Zoo, please submit a Pull Request (containing the hyperparameters and the score too).
We would like to thank our contributors: @iandanforth, @tatsubori, @Shade5, @mcres, @ernestum, @qgallouedec