Skip to content

Latest commit

 

History

History
38 lines (27 loc) · 835 Bytes

README.md

File metadata and controls

38 lines (27 loc) · 835 Bytes

PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

How to run the code

Install dependencies

These are the same setup instructions as in Implicit Q-Learning.

pip install --upgrade pip

pip install -r requirements.txt

# Installs the wheel compatible with Cuda 11 and cudnn 8.
pip install --upgrade "jax[cuda]>=0.2.27" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Also, see other configurations for CUDA here.

Example training code

Locomotion

bash 2online_mujoco.sh
bash 2online_mujoco_td3.sh

AntMaze

bash 2online_antmaze.sh
bash 2online_antmaze_td3.sh

Adroit

bash 2online_adroit.sh
bash 2online_adroit_td3.sh