PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL

middleyuan/safe-control-gym

 
 

Reducing Maximization Bias and Risk in Hyperparameter Optimization for Reinforcement Learning and Learning-Based Control

This is the code for the paper entitled Reducing Maximization Bias and Risk in Hyperparameter Optimization for Reinforcement Learning and Learning-Based Control. The implementation is adapted from Safe-Control-Gym.

Install on Ubuntu

Create a conda environment

Create and activate a Python 3.10 environment using conda:

conda create -n pr-env python=3.10
conda activate pr-env

Install

pip install --upgrade pip
pip install -e .

Note

You may need to separately install gmp, a dependency of pycddlib:

conda install -c anaconda gmp

or

sudo apt-get install libgmp-dev

To perform hyperparameter optimization, you may need a MySQL database:

sudo apt-get install mysql-server

To set up the database, run the following commands sequentially:

sudo mysql
CREATE USER optuna@"%";
CREATE DATABASE {algo}_hpo;
GRANT ALL ON {algo}_hpo.* TO optuna@"%";
exit

Replace {algo} with gp_mpc, ppo, sac, or ddpg, depending on which of the scripts below you want to run.
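
If you plan to run all four algorithms, the statements above can be generated in one pass. The following is a minimal sketch, not part of the repository: the function name setup_statements and the idea of emitting every database at once are assumptions for illustration, while the SQL text itself mirrors the commands above.

```python
# Sketch: emit the MySQL setup statements for every algorithm listed above.
# The helper name and the "all databases at once" workflow are hypothetical.
ALGOS = ("gp_mpc", "ppo", "sac", "ddpg")  # names taken from the README


def setup_statements(algos=ALGOS, user='optuna@"%"'):
    """Return SQL that creates the user once and one <algo>_hpo database per algorithm."""
    statements = [f"CREATE USER {user};"]
    for algo in algos:
        db = f"{algo}_hpo"
        statements.append(f"CREATE DATABASE {db};")
        statements.append(f"GRANT ALL ON {db}.* TO {user};")
    return statements


if __name__ == "__main__":
    # Print one statement per line so the output can be reviewed or piped to mysql.
    print("\n".join(setup_statements()))
```

Saved under a file name of your choice, the output could be reviewed and then piped into sudo mysql. Note that CREATE USER fails if the optuna user already exists, so drop that first statement when re-running against an existing setup.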

Toy Examples

The results for the toy examples in the paper can be reproduced with the notebook toy_example.ipynb.

Reinforcement Learning

To run hyperparameter optimization (HPO) for DDPG, run:

bash experiments/comparisons/rl/main.sh hostx TPESampler ddpg cartpole stab False

To run hyperparameter optimization (HPO) for PPO, run:

bash experiments/comparisons/rl/main.sh hostx TPESampler ppo cartpole stab False

To run hyperparameter optimization (HPO) for SAC, run:

bash experiments/comparisons/rl/main.sh hostx TPESampler sac cartpole stab False
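
The three commands above differ only in the algorithm argument. As a minimal sketch (not part of the repository), the command lines can be built programmatically; the parameter names host, sampler, system, task, and last_arg are hypothetical labels for the positional arguments, whose meaning is defined by main.sh rather than by this example.

```python
import shlex

SCRIPT = "experiments/comparisons/rl/main.sh"  # path from the README
RL_ALGOS = ("ddpg", "ppo", "sac")


def hpo_command(algo, host="hostx", sampler="TPESampler",
                system="cartpole", task="stab", last_arg="False"):
    """Build the argv list for one RL HPO run.

    The keyword names are illustrative labels only; the defaults are the
    literal values used in the README commands.
    """
    return ["bash", SCRIPT, host, sampler, algo, system, task, last_arg]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for algo in RL_ALGOS:
        print(shlex.join(hpo_command(algo)))
```

To launch a run from Python, the same list could be passed to subprocess.run(hpo_command("ppo"), check=True) from the repository root.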

Learning-Based Control

To run hyperparameter optimization (HPO) for GP-MPC, run:

bash experiments/comparisons/gpmpc/main.sh hostx TPESampler cartpole stab False

Note

You may need to adjust the path to conda.sh in the sub-scripts called by main.sh, such as rl_hpo_strategy_eval.sh.
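
To find the value to put in those sub-scripts, the sketch below locates conda.sh for the local installation. It assumes a standard conda layout, where conda.sh sits under etc/profile.d in the directory reported by conda info --base; the helper names are hypothetical and not part of the repository.

```python
import shutil
import subprocess
from pathlib import Path


def conda_sh_path(conda_base):
    """Return where a standard conda install keeps conda.sh under its base directory."""
    return Path(conda_base) / "etc" / "profile.d" / "conda.sh"


def find_conda_base():
    """Ask conda for its base directory; return None if conda is unavailable."""
    if shutil.which("conda") is None:
        return None
    result = subprocess.run(["conda", "info", "--base"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    base = find_conda_base()
    print(conda_sh_path(base) if base else "conda not found on PATH")
```

The printed path is the one to substitute wherever a sub-script sources conda.sh.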
