For the IROS 2022 Safe Robot Learning Competition, check out branch `beta-iros-competition`.
Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-based control, and model-free and model-based reinforcement learning (RL).
These environments include (and evaluate) symbolic safety constraints and implement input, parameter, and dynamics disturbances to test the robustness and generalizability of control approaches. [PDF]
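As a rough illustration of what "symbolic a priori dynamics" means here, the following standalone CasADi sketch builds a continuous-time cart-pole model as a symbolic function (the state ordering, parameter values, and function name are illustrative assumptions, not the package's internal model):

```python
import casadi as cs

# Illustrative cart-pole constants (assumed values, not the package's defaults).
M_CART, M_POLE, LENGTH, G = 1.0, 0.1, 0.5, 9.81

# Symbolic state [x, x_dot, theta, theta_dot] and input (horizontal force on the cart).
X = cs.MX.sym('X', 4)
U = cs.MX.sym('U', 1)
x_dot, theta, theta_dot = X[1], X[2], X[3]

# Frictionless cart-pole equations of motion (classic textbook/Gym form).
sin_t, cos_t = cs.sin(theta), cs.cos(theta)
total_mass = M_CART + M_POLE
temp = (U[0] + M_POLE * LENGTH * theta_dot**2 * sin_t) / total_mass
theta_acc = (G * sin_t - cos_t * temp) / (LENGTH * (4.0 / 3.0 - M_POLE * cos_t**2 / total_mass))
x_acc = temp - M_POLE * LENGTH * theta_acc * cos_t / total_mass

# Symbolic dynamics X_dot = f(X, U): usable for linearization, constraint checks, or MPC.
X_dot = cs.vertcat(x_dot, x_acc, theta_dot, theta_acc)
f_dyn = cs.Function('f_dyn', [X, U], [X_dot], ['X', 'U'], ['X_dot'])

print(f_dyn(X=[0.0, 0.0, 0.1, 0.0], U=[0.0]))  # evaluate the symbolic model numerically
```

Having such an expression available alongside the simulator is what lets model-based controllers and safety filters reason about the dynamics analytically.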
```bibtex
@article{brunke2021safe,
  title   = {Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning},
  author  = {Lukas Brunke and Melissa Greeff and Adam W. Hall and Zhaocong Yuan and Siqi Zhou and Jacopo Panerati and Angela P. Schoellig},
  journal = {Annual Review of Control, Robotics, and Autonomous Systems},
  year    = {2021},
  url     = {https://arxiv.org/abs/2108.06266}}
```
```bash
git clone https://github.com/utiasDSL/safe-control-gym.git
cd safe-control-gym
```
Create and access a Python 3.8 environment using `conda`

```bash
conda create -n safe python=3.8.10
conda activate safe
```
Install the `safe-control-gym` repository

```bash
pip install --upgrade pip
pip install -e .
```
Create and access a Python 3.8 virtual environment using `pyenv` and `venv`

```bash
pyenv install 3.8.10
pyenv local 3.8.10
python3 -m venv safe
source safe/bin/activate
pip install --upgrade pip
pip install poetry
poetry install
```
You may need to separately install `gmp`, a dependency of `pycddlib`:

```bash
conda install -c anaconda gmp
```

or

```bash
sudo apt-get install libgmp-dev
```
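To sanity-check the installation (an optional step, not part of the original instructions), you can try importing the package and `pycddlib` from the newly created environment; `pycddlib` is imported under the module name `cdd`:

```python
# Quick sanity check: both imports should succeed inside the 'safe' environment.
import cdd               # pycddlib, the dependency that needs gmp
import safe_control_gym  # the package installed above (pip install -e . or poetry)

print('safe-control-gym imported from:', safe_control_gym.__file__)
```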
See this notebook, where `safe-control-gym` is pre-installed.

Overview of `safe-control-gym`'s API:
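The environments follow a Gym-style interaction pattern. The sketch below shows the general shape of a rollout; the location of the `make` factory and the exact `reset()`/`step()` return signatures are assumptions on our part, so refer to the scripts in `examples/` (below) for authoritative usage.

```python
# A minimal, hypothetical rollout; see examples/ for the real entry points.
from safe_control_gym.utils.registration import make  # assumed factory location

env = make('cartpole')                   # task id used elsewhere in this README
obs, info = env.reset()
for _ in range(250):
    action = env.action_space.sample()   # open-loop, random actions
    obs, reward, done, info = env.step(action)  # info may include constraint evaluations
    if done:
        obs, info = env.reset()
env.close()
```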
```bibtex
@misc{yuan2021safecontrolgym,
  title         = {safe-control-gym: a Unified Benchmark Suite for Safe Learning-based Control and Reinforcement Learning},
  author        = {Zhaocong Yuan and Adam W. Hall and Siqi Zhou and Lukas Brunke and Melissa Greeff and Jacopo Panerati and Angela P. Schoellig},
  year          = {2021},
  eprint        = {2109.06325},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO}}
```
We compare the sample efficiency of `safe-control-gym` with the original [OpenAI Cartpole][1] and [PyBullet Gym's Inverted Pendulum][2], as well as [gym-pybullet-drones][3].
We use the default physics simulation integration step of each project and report performance for open-loop, random action inputs.
Note that the Bullet engine frequency reported for `safe-control-gym` is typically much finer-grained, for improved fidelity.
The `safe-control-gym` quadrotor environment is not as lightweight as [gym-pybullet-drones][3], but it provides a speed-up of the same order of magnitude along with several additional safety features and symbolic models.
Environment | GUI | Control Freq. | PyBullet Freq. | Constraints & Disturbances^ | Speed-Up^^ |
---|---|---|---|---|---|
Gym cartpole | True | 50Hz | N/A | No | 1.16x |
InvPenPyBulletEnv | False | 60Hz | 60Hz | No | 158.29x |
cartpole | True | 50Hz | 50Hz | No | 0.85x |
cartpole | False | 50Hz | 1000Hz | No | 24.73x |
cartpole | False | 50Hz | 1000Hz | Yes | 22.39x |
gym-pyb-drones | True | 48Hz | 240Hz | No | 2.43x |
gym-pyb-drones | False | 50Hz | 1000Hz | No | 21.50x |
quadrotor | True | 60Hz | 240Hz | No | 0.74x |
quadrotor | False | 50Hz | 1000Hz | No | 9.28x |
quadrotor | False | 50Hz | 1000Hz | Yes | 7.62x |
^ Whether the environment includes a default set of constraints and disturbances
^^ Speed-up = Elapsed Simulation Time / Elapsed Wall Clock Time; on a 2.30GHz Quad-Core i7-1068NG7 with 32GB 3733MHz LPDDR4X; no GPU
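For reference, the speed-up values in the table follow directly from this definition; a minimal (hypothetical) way to measure it for any Gym-style environment with a fixed control frequency, using the same random-action protocol as above, is:

```python
import time

def measure_speedup(env, ctrl_freq_hz, n_steps=5000):
    """Speed-up = elapsed simulation time / elapsed wall-clock time (as defined above)."""
    env.reset()
    start = time.time()
    for _ in range(n_steps):
        _, _, done, _ = env.step(env.action_space.sample())  # open-loop, random actions
        if done:
            env.reset()
    wall_clock = time.time() - start
    sim_time = n_steps / ctrl_freq_hz  # each control step advances 1 / ctrl_freq_hz seconds
    return sim_time / wall_clock
```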
Familiarize yourself with the APIs and environments using the scripts in `examples/`

```bash
$ cd ./examples/                                  # Navigate to the examples folder
$ python3 tracking.py --overrides ./tracking.yaml # PID trajectory tracking with the 2D quadcopter
$ python3 verbose_api.py --task cartpole --overrides verbose_api.yaml # Printout of the extended safe-control-gym APIs
```
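The `--overrides` argument points to a YAML file whose entries take precedence over a script's default configuration. As a rough illustration of that pattern only (a generic sketch with made-up keys, not the package's actual configuration loader):

```python
import yaml  # PyYAML

# Hypothetical defaults; the real default configs live alongside the example scripts.
defaults = {'task': 'quadrotor', 'ctrl_freq': 50, 'episode_len_sec': 5}

# Entries in the overrides file win over the defaults.
with open('./tracking.yaml', 'r') as f:
    overrides = yaml.safe_load(f) or {}

config = {**defaults, **overrides}
print(config)
```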
Re-create the Results in "Safe Learning in Robotics" [arXiv link]
To stay in touch, get involved or ask questions, please open an issue on GitHub or contact us via e-mail ({jacopo.panerati, zhaocong.yuan, adam.hall, siqi.zhou, lukas.brunke, melissa.greeff}@robotics.utias.utoronto.ca).
Figure 6—Robust GP-MPC [1]
```bash
$ cd ../experiments/annual_reviews/figure6/ # Navigate to the experiment folder
$ chmod +x create_fig6.sh                   # Make the script executable, if needed
$ ./create_fig6.sh                          # Run the script (ca. 2')
```
This will use the models in `safe-control-gym/experiments/figure6/trained_gp_model/` to generate Figure 6.
To also re-train the GP models from scratch (ca. 30' on a laptop):

```bash
$ chmod +x create_trained_gp_model.sh # Make the script executable, if needed
$ ./create_trained_gp_model.sh        # Run the script (ca. 30')
```

Note: this will back up and overwrite `safe-control-gym/experiments/figure6/trained_gp_model/`.
Figure 7—Safe RL Exploration [2]
```bash
$ cd ../figure7/          # Navigate to the experiment folder
$ chmod +x create_fig7.sh # Make the script executable, if needed
$ ./create_fig7.sh        # Run the script (ca. 5'')
```
This will use the data in `safe-control-gym/experiments/figure7/safe_exp_results.zip` to generate Figure 7.
To also re-train all the controllers/agents (warning: >24 hrs on a laptop; if necessary, run each of the loops in the Bash script separately: PPO, PPO with reward shaping, and the Safe Explorer):
```bash
$ chmod +x create_safe_exp_results.sh # Make the script executable, if needed
$ ./create_safe_exp_results.sh        # Run the script (>24hrs)
```
Note: this script will (over)write the results in `safe-control-gym/experiments/figure7/safe_exp_results/`; if you do not run the re-training to completion, delete the partial results with `rm -r -f ./safe_exp_results/` before running `./create_fig7.sh` again.
Figure 8—Model Predictive Safety Certification [3]
(required) Obtain MOSEK's license (free for academia).
Once you have received (via e-mail) and downloaded the license to your own `~/Downloads` folder, install it by executing
```bash
$ mkdir ~/mosek                     # Create MOSEK license folder in your home '~'
$ mv ~/Downloads/mosek.lic ~/mosek/ # Move the downloaded MOSEK license to '~/mosek/'
```
Then run
```bash
$ cd ../figure8/          # Navigate to the experiment folder
$ chmod +x create_fig8.sh # Make the script executable, if needed
$ ./create_fig8.sh        # Run the script (ca. 1')
```
This will use the unsafe (pre-trained) PPO controller/agent in folder `safe-control-gym/experiments/figure8/unsafe_ppo_model/` to generate Figure 8.
To also re-train the unsafe PPO controller/agent (ca. 2' on a laptop):

```bash
$ chmod +x create_unsafe_ppo_model.sh # Make the script executable, if needed
$ ./create_unsafe_ppo_model.sh        # Run the script (ca. 2')
```

Note: this script will (over)write the model in `safe-control-gym/experiments/figure8/unsafe_ppo_model/`.
- [1] Hewing L, Kabzan J, Zeilinger MN. 2020. Cautious model predictive control using Gaussian process regression. IEEE Transactions on Control Systems Technology 28:2736–2743
- [2] Dalal G, Dvijotham K, Vecerik M, Hester T, Paduraru C, Tassa Y. 2018. Safe exploration in continuous action spaces. arXiv:1801.08757 [cs.AI]
- [3] Wabersich KP, Zeilinger MN. 2018. Linear model predictive safety certification for learning-based control. In 2018 IEEE Conference on Decision and Control (CDC), pp. 7130–7135
- `gym-pybullet-drones`: single and multi-quadrotor environments
- `gym-marl-reconnaissance`: multi-agent heterogeneous (UAV/UGV) environments
- `stable-baselines3`: PyTorch reinforcement learning algorithms
- `bullet3`: multi-physics simulation engine
- `gym`: OpenAI reinforcement learning toolkit
- `safety-gym`: environments for safe exploration in RL
- `realworldrl_suite`: real-world RL challenge framework
- `casadi`: symbolic framework for numeric optimization
- Publish to PyPI
- Create resource list with papers, projects, blog posts (Cat's, etc.) using `safe-control-gym`
University of Toronto's Dynamic Systems Lab / Vector Institute for Artificial Intelligence