This is the official GitHub repository for the following paper:
- Dohyeong Kim, Kyungjae Lee, and Songhwai Oh, "Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints," in Proc. of Neural Information Processing Systems (NeurIPS), Dec. 2023.
Because `stable-baselines3` causes the `torch` installation to be incorrect, we recommend installing `stable-baselines3` and `sb3-contrib` first, and then `torch`.
- python 3.8 or greater
- stable-baselines3==1.8.0
- sb3-contrib==1.8.0
- torch==1.12.1
- wandb (Optional, just for logging)
- scipy
- qpsolvers==1.9.0
- opencv-python
- tensorflow-gpu==2.5.0 (Optional, for `OffTRC`, `CPO`, and `WCSAC`)
- tensorflow-probability==0.12.2 (Optional, for `OffTRC`, `CPO`, and `WCSAC`)
- tqdm (Optional, for `CVPO`)
- tensorboardX>=2.4 (Optional, for `CVPO`)
- cpprb==10.1.1 (Optional, for `CVPO`)
- mpi4py (Optional, for `WCSAC`)
- numpy==1.22
- Install `mujoco-py`:
  - Refer to the `mujoco-py` installation instructions.
- Install `safety-gym`:
  - The official repository has some issues, so we recommend installing it as follows.

        cd {sdac}/installation/safety-gym
        pip install -e .

- Install `WCSAC`:
  - The official repository supports only `tensorflow 1.XX`, so to use `tensorflow 2.XX`, we recommend installing it as follows.

        cd {sdac}/installation/WCSAC
        pip install -e .

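Because the installation order above matters, it may help to confirm afterwards that the pinned versions actually ended up in the environment. The following is a minimal sketch, not part of this repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and only a few of the pins from the requirement list are included (optional packages and `numpy` are left out for brevity).

```python
from importlib import metadata

# Pins copied from the requirement list above (subset; optional packages omitted).
PINNED = {
    "stable-baselines3": "1.8.0",
    "sb3-contrib": "1.8.0",
    "torch": "1.12.1",
    "qpsolvers": "1.9.0",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu113" that some torch wheels carry.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

If the script prints nothing, every listed pin matches. A `torch` mismatch after installing `stable-baselines3` would suggest reinstalling `torch` last, as recommended above.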
Safety Gym
- `Safexp-PointGoal1-v0`
- `Safexp-CarGoal1-v0`
- `Safexp-PointButton3-v0` (defined in `safety_gym/utils/register.py`)
- `Safexp-CarButton3-v0` (defined in `safety_gym/utils/register.py`)

Locomotion
- `MITCheetah-v0` and `MITCheetah-v1` (defined in `locomotion/utils/register.py`)
- `Laikago-v0` and `Laikago-v1` (defined in `locomotion/utils/register.py`)
- `Cassie-v0` and `Cassie-v1` (defined in `locomotion/utils/register.py`)

- `SDAC`
  - The constraint conservativeness $\alpha$ can be set by modifying the `--cost_alpha {float_number}` argument in each shell file.
  - Train:

        cd {sdac}/safety_gym/sdac
        bash train/{env_name}.sh # env_name: point_goal, point_button, car_goal, car_button.

  - Test:

        cd {sdac}/safety_gym/sdac
        bash test/{env_name}.sh # env_name: point_goal, point_button, car_goal, car_button.

- `OffTRC` and `CPO`
  - The constraint conservativeness $\alpha$ for `OffTRC` can be set by modifying the `--cost_alpha {float_number}` argument in each shell file (for `CPO`, $\alpha$ should be fixed at $1.0$).
  - The source code is from https://github.com/rllab-snu/Off-Policy-TRC.
  - Train:

        cd {sdac}/safety_gym/offtrc
        bash train/{algo_name}_{env_name}.sh # algo_name: offtrc, cpo.

  - Test:

        cd {sdac}/safety_gym/offtrc
        bash test/{algo_name}_{env_name}.sh # algo_name: offtrc, cpo.

- `CVPO`
  - The source code is from https://github.com/liuzuxin/cvpo-safe-rl.
  - See `{sdac}/safety_gym/cvpo/README.md` for detailed configuration information.
  - Train:

        cd {sdac}/safety_gym/cvpo
        bash train/{env_name}.sh

- `WCSAC`
  - The source code is from https://github.com/AlgTUDelft/WCSAC.
  - The constraint conservativeness $\alpha$ can be set by modifying the `--cl {float_number}` argument in each shell file.
  - Train:

        cd {sdac}/safety_gym/wcsac
        bash train/{env_name}.sh

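When sweeping over several values of $\alpha$, editing the `--cost_alpha` (or, for `WCSAC`, `--cl`) argument by hand in every shell file can get tedious. The sketch below shows one way to rewrite the value programmatically. It is not part of this repository: the function names are hypothetical, and it assumes the flag appears in the script as `--cost_alpha <number>` or `--cost_alpha=<number>`, which should be checked against the actual shell files.

```python
import re
from pathlib import Path


def set_flag_value(script_text, flag, value):
    """Replace the number that follows `flag` in the text of a shell script.

    Returns a (new_text, replacement_count) tuple.
    """
    # Matches e.g. "--cost_alpha 0.25" or "--cost_alpha=0.25".
    number = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
    pattern = re.compile(r"(" + re.escape(flag) + r"[ =]+)" + number)
    return pattern.subn(lambda m: m.group(1) + repr(float(value)), script_text)


def update_script(path, flag, value):
    """Rewrite `flag` inside the file at `path`; fail loudly if it is absent."""
    path = Path(path)
    new_text, count = set_flag_value(path.read_text(), flag, value)
    if count == 0:
        raise ValueError(f"{flag} not found in {path}")
    path.write_text(new_text)
    return count
```

For example, `update_script("train/point_goal.sh", "--cost_alpha", 0.5)` would set $\alpha = 0.5$ in that file, assuming the script passes the flag in one of the two forms above.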
- `SDAC`, `WCSAC`, and `OffTRC` (Locomotion)
  - Train:

        cd {sdac}/locomotion/{algo_name} # algo_name: sdac, wcsac, offtrc
        bash train/{env_name}.sh # env_name: cheetah, laikago, cassie

  - Test:

        cd {sdac}/locomotion/{algo_name} # algo_name: sdac, wcsac, offtrc
        bash test/{env_name}.sh # env_name: cheetah, laikago, cassie

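The train and test commands above follow one pattern: change into the algorithm directory, then run `bash {mode}/{env_name}.sh`. To run several environments in sequence, a small driver along the following lines could be used. This is a sketch under stated assumptions, not a script shipped with the repository: `script_jobs` and `run_jobs` are hypothetical names, and the helper covers only the `{mode}/{env_name}.sh` layout (the `OffTRC` and `CPO` Safety Gym scripts use `{algo_name}_{env_name}.sh` instead).

```python
import subprocess
from pathlib import Path

# Environment names taken from the comments in the commands above.
SAFETY_GYM_ENVS = ["point_goal", "point_button", "car_goal", "car_button"]
LOCOMOTION_ENVS = ["cheetah", "laikago", "cassie"]


def script_jobs(repo_root, suite, algo, envs, mode="train"):
    """Return (working_dir, command) pairs, one per environment."""
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test'")
    workdir = Path(repo_root) / suite / algo
    return [(workdir, ["bash", f"{mode}/{env}.sh"]) for env in envs]


def run_jobs(jobs, dry_run=True):
    """Print each job; execute it sequentially only when dry_run is False."""
    for workdir, command in jobs:
        print(f"(cd {workdir} && {' '.join(command)})")
        if not dry_run:
            subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_jobs(script_jobs("{sdac}", "locomotion", "sdac", LOCOMOTION_ENVS))` with `{sdac}` replaced by the repository path only prints the commands. Passing `dry_run=False` executes them one after another and stops at the first failure.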
All algorithms write log files using `{sdac}/safety_gym/utils/logger.py`.
To plot graphs from the log files, run `visualize.py` in each algorithm directory.
For example, for `WCSAC`:
cd {sdac}/safety_gym/wcsac
python integrate.py
python visualize.py
For `SDAC`, `CPO`, `OffTRC`, and `CVPO`:
cd {sdac}/safety_gym/{algo_name}
python visualize.py
After running the script, the figure file is saved in the `imgs` folder.
In `visualize.py`, you can modify the path where the logs are saved.
Distributed under the MIT License. See `LICENSE` for more information.