Reducing Maximization Bias and Risk in Hyperparameter Optimization for Reinforcement Learning and Learning-Based Control
This is the code for the paper entitled "Reducing Maximization Bias and Risk in Hyperparameter Optimization for Reinforcement Learning and Learning-Based Control". The implementation is adapted from Safe-Control-Gym.
Create and activate a Python 3.10 environment using `conda`, then install the package:

```bash
conda create -n pr-env python=3.10
conda activate pr-env
pip install --upgrade pip
pip install -e .
```
You may need to separately install `gmp`, a dependency of `pycddlib`:

```bash
conda install -c anaconda gmp
```
or

```bash
sudo apt-get install libgmp-dev
```
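To check that the installation succeeded, you can try importing the relevant modules from Python. A minimal sanity check, assuming the package keeps Safe-Control-Gym's module name `safe_control_gym` (pycddlib's import name is `cdd`):

```python
# Minimal sanity check for the editable install and the gmp-backed dependency.
# Assumes the package retains Safe-Control-Gym's module name, safe_control_gym.
import cdd               # pycddlib; may fail to import if gmp/libgmp is missing
import safe_control_gym  # installed above with `pip install -e .`

print("safe_control_gym located at:", safe_control_gym.__file__)
```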
To perform hyperparameter optimization, you may need a MySQL database:

```bash
sudo apt-get install mysql-server
```
To set it up, run the following commands sequentially:

```bash
sudo mysql
CREATE USER optuna@"%";
CREATE DATABASE {algo}_hpo;
GRANT ALL ON {algo}_hpo.* TO optuna@"%";
exit
```
You may replace `{algo}` with `gp_mpc`, `ppo`, `sac`, or `ddpg` to run the corresponding scripts.
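For reference, the HPO scripts use this MySQL database as an Optuna storage backend. Below is a minimal sketch of creating a study against it; the study name, storage URL, and objective are illustrative assumptions rather than the repository's exact configuration, and connecting requires a MySQL driver such as PyMySQL.

```python
import optuna

# Illustrative objective; the actual objective trains and evaluates the
# chosen controller (e.g. PPO on cartpole stabilization) and returns its score.
def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)  # hypothetical hyperparameter
    return lr  # placeholder value

study = optuna.create_study(
    study_name="ppo_hpo",                                # assumed study name
    storage="mysql+pymysql://optuna@localhost/ppo_hpo",  # the {algo}_hpo database created above
    sampler=optuna.samplers.TPESampler(),                # the sampler passed to main.sh
    direction="maximize",                                # assumed optimization direction
    load_if_exists=True,
)
study.optimize(objective, n_trials=5)
```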
The results for the toy examples in the paper can be reproduced in `toy_example.ipynb`.
To run hyperparameter optimization (HPO) for DDPG, run:

```bash
bash experiments/comparisons/rl/main.sh hostx TPESampler ddpg cartpole stab False
```

To run HPO for PPO, run:

```bash
bash experiments/comparisons/rl/main.sh hostx TPESampler ppo cartpole stab False
```

To run HPO for SAC, run:

```bash
bash experiments/comparisons/rl/main.sh hostx TPESampler sac cartpole stab False
```

To run HPO for GP-MPC, run:

```bash
bash experiments/comparisons/gpmpc/main.sh hostx TPESampler cartpole stab False
```
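After an HPO run finishes, the trials stored in MySQL can be inspected with Optuna. A small sketch, assuming the study name follows the `{algo}_hpo` convention used above and a MySQL driver such as PyMySQL is installed:

```python
import optuna

study = optuna.load_study(
    study_name="ppo_hpo",                                # assumed study name
    storage="mysql+pymysql://optuna@localhost/ppo_hpo",  # database created during setup
)
print("Best value:", study.best_value)
print("Best hyperparameters:", study.best_params)
print(study.trials_dataframe().head())                   # requires pandas
```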
You may need to adjust the path of `conda.sh` in the sub-scripts called by `main.sh`, such as `rl_hpo_strategy_eval.sh`.