The OmniSafe MuJoCo Velocity Benchmark evaluates the performance of OmniSafe's SafeRL algorithms on six environments from the Safety-Gymnasium task suite. For each supported algorithm and environment, we provide the default hyperparameters used during the benchmark, as well as scripts to reproduce the results. We also provide performance comparisons and code-level details against other open-source implementations and classic papers. Our package includes graphs and raw data that can be used for research purposes, along with training log details. Finally, we offer hints on fine-tuning each algorithm for optimal results.
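As a quick illustration of how a single benchmark run might be launched, the sketch below trains one algorithm on one velocity environment through OmniSafe's Python API. The algorithm name (`PPOLag`) and environment ID (`SafetyAntVelocity-v1`) are assumptions chosen for the example; the benchmark's own scripts may configure additional options.

```python
# Minimal sketch, assuming OmniSafe's high-level Agent API and the
# Safety-Gymnasium velocity environment IDs; the scripts shipped with
# the benchmark may set further options.
import omnisafe

env_id = 'SafetyAntVelocity-v1'  # one of the six velocity tasks (assumed ID)

# Train PPO-Lag with the benchmark's default hyperparameters.
agent = omnisafe.Agent('PPOLag', env_id)
agent.learn()
```

The algorithms covered by the benchmark are listed below; in OmniSafe they are generally registered under their abbreviations (e.g. `PPO`, `TRPOLag`, `CPO`), but check the algorithm registry before substituting a name.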
First-Order
- [NIPS 1999] Policy Gradient (PG)
- Proximal Policy Optimization (PPO)
- The Lagrange version of PPO (PPO-Lag)
- [IJCAI 2022] Penalized Proximal Policy Optimization for Safe Reinforcement Learning (P3O)
- [NeurIPS 2020] First Order Constrained Optimization in Policy Space (FOCOPS)
- [NeurIPS 2022] Constrained Update Projection Approach to Safe Policy Optimization (CUP)
Second-Order
- [NeurIPS 2001] A Natural Policy Gradient (NaturalPG)
- [ICML 2015] Trust Region Policy Optimization (TRPO)
- The Lagrange version of TRPO (TRPO-Lag)
- [ICML 2017] Constrained Policy Optimization (CPO)
- [ICLR 2020] Projection-Based Constrained Policy Optimization (PCPO)
- [ICLR 2019] Reward Constrained Policy Optimization (RCPO)
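When fine-tuning any of the algorithms listed above, the benchmark's default hyperparameters can be overridden at launch time. The sketch below shows one plausible way to do this through the `custom_cfgs` argument; the specific keys and values are illustrative assumptions, not the benchmark's tuned settings.

```python
# Sketch of overriding a few default hyperparameters, assuming OmniSafe's
# nested custom_cfgs dictionary; the groups below (train_cfgs, algo_cfgs,
# logger_cfgs) mirror common OmniSafe config sections, and the values are
# examples only.
import omnisafe

custom_cfgs = {
    'train_cfgs': {
        'total_steps': 1_000_000,   # shorter budget than a full benchmark run
    },
    'algo_cfgs': {
        'steps_per_epoch': 20_000,  # example rollout length per epoch
    },
    'logger_cfgs': {
        'use_wandb': False,         # keep logs local
    },
}

agent = omnisafe.Agent('CPO', 'SafetyWalker2dVelocity-v1', custom_cfgs=custom_cfgs)
agent.learn()
```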
More details can be found in the On-Policy Experiment documentation.
More details can be found in the Off-Policy Experiment documentation.
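For inspecting the training logs and raw data mentioned above, a trained OmniSafe agent exposes plotting and evaluation helpers. The sketch below assumes the `plot` and `evaluate` methods are available on `omnisafe.Agent` after `learn()` has finished; the method arguments shown are illustrative.

```python
# Sketch of post-training inspection, assuming the Agent helpers exposed
# by OmniSafe once learn() completes; arguments are examples only.
import omnisafe

agent = omnisafe.Agent('TRPOLag', 'SafetyHopperVelocity-v1')
agent.learn()

agent.plot(smooth=1)            # plot the logged training curves
agent.evaluate(num_episodes=5)  # roll out the saved policy for evaluation
```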