An OpenAI Gym toolkit for continuous control with Bayesian Actor-Critic (BAC) reinforcement learning.
Run `sample/mountain_car_v0_no_jupyter.py`
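A minimal way to launch that sample from a Python session is sketched below; the only assumption is that you run it from the repository root (it is equivalent to invoking the file with the `python` interpreter).

```python
# Run the bundled MountainCarContinuous-v0 sample from the repository root.
# Equivalent to: python sample/mountain_car_v0_no_jupyter.py
import runpy

runpy.run_path("sample/mountain_car_v0_no_jupyter.py", run_name="__main__")
```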
Notice: I am working on a CUDA-accelerated branch and will update it here as soon as possible.
- NumPy, SciPy
- OpenAI Gym (no MuJoCo environments yet)
- Pandas, matplotlib
- CUDA Toolkit 11.3 (for gpu-accelerated branch)
- CuPy for CUDA 11.3 (for gpu-accelerated branch)
- At least an Intel Core i3 (3rd Gen) CPU (~1 hour of simulation time for 500 BAC updates)
- At least 4 GB DDR3 RAM
- (GPU branch only) A dedicated NVIDIA GPU with Compute Capability > 3.0 (https://developer.nvidia.com/cuda-gpus)
We see that the agent smoothly achieves the goal. Since this is continuous control, the action space is the interval `[-1.0, 1.0]`. The agent above is more inclined to take actions close to 1.0. Running the simulation for more BAC updates would likely let the agent learn to take actions close to -1.0 once it is on the up-slope towards the goal. Currently, the simulation is processor-heavy and therefore slow; I am working on CUDA acceleration to speed up the NumPy and SciPy operations.
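For context, here is a minimal sketch of what the continuous action space means for this environment: each step the agent emits a single real-valued force in [-1.0, 1.0]. The Gaussian policy below (clipped to the action bounds) is purely illustrative; its mean and standard deviation are placeholder values, not parameters learned by the BAC agent in this repository, and it assumes the pre-0.26 Gym `reset`/`step` API.

```python
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")
print(env.action_space)  # Box(-1.0, 1.0, (1,), float32)

# Illustrative Gaussian policy: push right (action ~= +1.0) with a little
# noise, then clip to the environment's action bounds. The mean/std below
# are made-up placeholders, not values learned by this toolkit.
mean, std = 0.9, 0.1

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = np.clip(np.random.normal(mean, std, size=(1,)), -1.0, 1.0)
    obs, reward, done, info = env.step(action.astype(np.float32))
    total_reward += reward
print("episode return:", total_reward)
```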
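As a rough illustration of the planned CUDA acceleration: CuPy exposes a NumPy-compatible API, so array-heavy routines can often switch between CPU and GPU behind a single backend variable. The sketch below is a generic example of that pattern (a dense RBF kernel matrix, the kind of workload BAC's Gaussian-process machinery produces), not the actual code on the GPU branch.

```python
import numpy as np
try:
    import cupy as cp      # only needed on the GPU-accelerated branch
    xp = cp
except ImportError:
    xp = np                # fall back to NumPy on CPU-only machines

def rbf_gram(X, lengthscale=1.0):
    """Pairwise RBF (Gaussian) kernel matrix -- a representative dense
    linear-algebra workload that benefits from GPU execution."""
    sq_norms = xp.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return xp.exp(-0.5 * sq_dists / lengthscale**2)

X = xp.asarray(np.random.randn(2000, 4))  # e.g., 2000 sampled states, 4 features
K = rbf_gram(X)
print(type(K), K.shape)
```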
- Ghavamzadeh, Mohammad, Yaakov Engel, and Michal Valko. "Bayesian policy gradient and actor-critic algorithms." The Journal of Machine Learning Research 17.1 (2016): 2319-2371. (Main reference.)
- Ghavamzadeh, Mohammad, and Yaakov Engel. "Bayesian actor-critic algorithms." Proceedings of the 24th international conference on Machine learning. 2007.
- Ciosek, Kamil, et al. "Better exploration with optimistic actor-critic." arXiv preprint arXiv:1910.12807 (2019).
- Ghavamzadeh, Mohammad, et al. "Bayesian reinforcement learning: A survey." arXiv preprint arXiv:1609.04436 (2016).
- Kurenkov, Andrey, et al. "AC-Teach: A Bayesian actor-critic method for policy learning with an ensemble of suboptimal teachers." arXiv preprint arXiv:1909.04121 (2019).
- Bhatnagar, Shalabh, et al. "Natural actor–critic algorithms." Automatica 45.11 (2009): 2471-2482.