PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

How to run the code

Install dependencies

These are the same setup instructions as in Implicit Q-Learning.

pip install --upgrade pip

pip install -r requirements.txt

# Installs the wheel compatible with CUDA 11 and cuDNN 8.
pip install --upgrade "jax[cuda]>=0.2.27" -f https://storage.googleapis.com/jax-releases/jax_releases.html

For other CUDA configurations, see the JAX installation instructions.
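
After installing, it can help to confirm that JAX imports and to see which devices it detects. The snippet below is a minimal sketch, not part of this repository: the helper name `check_jax` is hypothetical, and it assumes `python3` is on the PATH and points at the environment where the packages were installed.

```shell
# Hypothetical helper (not part of this repository):
# prints the devices JAX can see, or a short message if JAX cannot be imported.
check_jax() {
  # jax.devices() lists the accelerators (or the CPU) visible to JAX.
  python3 -c 'import jax; print(jax.devices())' 2>/dev/null \
    || echo "JAX is not importable in this environment"
}

check_jax
```

If only CPU devices are listed on a GPU machine, the installed wheel probably does not match the local CUDA setup.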

Example training code

Locomotion

bash 2online_mujoco.sh
bash 2online_mujoco_td3.sh

AntMaze

bash 2online_antmaze.sh
bash 2online_antmaze_td3.sh

Adroit

bash 2online_adroit.sh
bash 2online_adroit_td3.sh
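
To launch several of the scripts above back to back, a small wrapper can run them in order and stop at the first failure. This is a sketch under stated assumptions: `run_all` is a hypothetical helper that does not exist in this repository, and it assumes the scripts are invoked from the repository root, as in the commands above.

```shell
# Hypothetical wrapper (not part of this repository): runs each given script
# with bash, in order, and stops at the first one that is missing or fails.
run_all() {
  for script in "$@"; do
    if [ ! -f "$script" ]; then
      echo "missing: $script" >&2
      return 1
    fi
    echo "running: $script"
    bash "$script" || return 1
  done
}

# Usage, from the repository root:
# run_all 2online_mujoco.sh 2online_antmaze.sh 2online_adroit.sh
```

Running the scripts sequentially keeps them from competing for the same GPU; pass the `_td3` scripts instead to launch those variants.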
