This repository provides a clean implementation of the Randomized Ensembled Double Q-learning (REDQ) [paper] algorithm, a model-free reinforcement learning algorithm that achieves high sample efficiency in continuous action space domains. The implementation is compatible with OpenAI Gym environments and includes comparison benchmarks against Soft Actor-Critic (SAC).
REDQ is an enhancement over traditional off-policy algorithms, employing three key mechanisms:
- High Update-To-Data (UTD) ratio for improved sample efficiency
- Ensemble of Q-functions to reduce variance in Q-function estimates
- In-target minimization to reduce over-estimation bias
The pseudocode of the REDQ algorithm is as follows:
The main hyperparameters of the REDQ algorithm are:
- G: Number of gradient steps per interaction
- N: Number of critic networks
- M: Size of the random subset over N critics
For the final conclusions see the report here. Here you can see a summary of the results:
In this example, we train the REDQ algorithm on the LunarLander-v2 environment with the following hyperparameters: N=5, G=5, M=2
python main.py --kwargs N=5 G=5 M=2 alpha=0.05 --exp_name REDQ_alpha0.05_N5_G5_M2 --total_timesteps 200000 --seed 1 --env LunarLander-v2
Furthermore, since Soft-Actor Critic (SAC) is subset of REDQ, we can also train the SAC algorithm with the following hyperparameters: N=2, G=1, M=2
python main.py --kwargs N=2 G=1 M=2 alpha=0.2 --exp_name SAC_alpha0.2 --total_timesteps 200000 --seed 42 --env LunarLander-v2
See the file run_experiments.sh
for more examples.
If you want to plot the results see the notebook show_plots.ipynb
.
conda env create -f environment.yml
conda activate RL
- Chen, X., et al. (2021). "Randomized ensembled double q-learning: Learning fast without a model". arXiv preprint arXiv:2101.05982.
- Haarnoja, T., et al. (2018). "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor".
This repository was part of the course ATCI at UPC Barcelona. Authors:
- Lukas Meggle
- Alberto Maté