The official code for Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization (IJCAI'24).
The training dependencies can be installed with conda using the following command. Note that since we pin the same package versions used in our training, the installed CUDA version may not be compatible with your GPU; in that case, you can manually reinstall PyTorch only.
conda env create -f environment.yml
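For example, if the pinned CUDA build does not match your driver, you could reinstall PyTorch inside the created environment along these lines; the environment name and CUDA version below are placeholders, so adjust them to environment.yml and your driver:
conda activate cpr   # placeholder: use the environment name defined in environment.yml
pip uninstall -y torch
pip install torch --index-url https://download.pytorch.org/whl/cu118   # pick the wheel index matching your CUDA driver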
To install the D4RL benchmark, run the following commands
git clone https://github.com/Farama-Foundation/D4RL.git
cd d4rl
pip install -e .
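As a quick sanity check (this assumes MuJoCo is already set up; halfcheetah-medium-v2 is just an example task, and the dataset is downloaded on first use), you can try:
python -c "import gym, d4rl; env = gym.make('halfcheetah-medium-v2'); print(env.get_dataset()['observations'].shape)"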
If you do not want to use wandb for tracking, you can run the following command in your terminal
wandb offline
Otherwise, you can fill in the wandb account settings in scripts/config.sh
export PYTHONPATH=".":$PYTHONPATH
wandb_online="False"
entity=""
if [ ${wandb_online} == "True" ]; then
export WANDB_API_KEY=""
export WANDB_MODE="online"
else
wandb offline
fi
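For example, to enable online tracking you would edit the corresponding lines (the entity and API key below are placeholders for your own wandb account):
wandb_online="True"
entity="your-wandb-entity"            # placeholder
export WANDB_API_KEY="your-api-key"   # placeholder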
Run the following script to reproduce the offline experiments
bash ./script/run_td3bc_offline.sh $task $quality $name $seed --device $device_id
Values for the arguments:
- task: halfcheetah, hopper, walker2d, all
- quality: medium, medium-replay, medium-expert, random
- name: original (paper args), corl (CORL args, recommended)
- seed: random seed
- device_id: cuda device ID
One example command is
bash ./script/run_td3bc_offline.sh halfcheetah medium corl 0 --device "cuda:0"
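To run several seeds in sequence, a plain shell loop over the same script also works (this loop is just a convenience sketch, not part of the provided scripts):
for seed in 0 1 2; do
    bash ./script/run_td3bc_offline.sh halfcheetah medium corl $seed --device "cuda:0"
done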
Notice: Online training is only possible after the corresponding offline training checkpoint is produced.
Run the following script to reproduce the online experiments
bash ./script/run_cpr_online.sh $task $quality original $seed --device $device_id
Values for the arguments:
- task: halfcheetah, hopper, walker2d, all
- quality: medium, medium-replay, medium-expert, random
- seed: random seed
- device_id: cuda device ID
One example command is
bash ./script/run_cpr_online.sh halfcheetah medium original 0 --device "cuda:0"
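Similarly, you can sweep over dataset qualities with a shell loop (again just a sketch; each quality still requires its own offline checkpoint from the previous step):
for quality in medium medium-replay medium-expert; do
    bash ./script/run_cpr_online.sh halfcheetah $quality original 0 --device "cuda:0"
done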
The logs and models are stored in the "./out" folder and can be visualized with TensorBoard:
tensorboard --logdir="./out"
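If you are training on a remote machine, standard TensorBoard flags let you choose the port and expose the dashboard on all interfaces:
tensorboard --logdir="./out" --port 6006 --bind_all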
We thank the following repositories for their help:
- OfflineRL-Lib provides the framework and implementation of most baselines.
- CORL provides finetuned hyper-parameters.
If you find this work useful for your research, please cite it with the following BibTeX entry:
@inproceedings{cpr,
  title={Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization},
  author={Rui Kong and Chenyang Wu and Chen-Xiao Gao and Zongzhang Zhang and Ming Li},
  booktitle={Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, {IJCAI} 2024},
  year={2024},
}