GitHub - ZhefeiGong/Reward-Induced_RL

REWARD-INDUCED REPRESENTATION LEARNING

An implementation of the paper, built on the starter repository provided by CLVR.

1. OVERVIEW

RL task: the agent (●) follows the target (■)
- 0 distractors (▲)
- 1 distractor (▲)

[Figure: introduction]

[Figure: reward-induced representation model]
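As a rough illustration of the task setup, the sketch below places an agent, a target and an optional number of distractors in a unit square and scores the agent by its distance to the target. The reward shape (one minus the Euclidean distance, clipped at zero), the coordinate range, and the names `Sprite`, `follow_reward` and `make_scene` are all hypothetical; the actual reward is defined in /sprites_env and may differ.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Sprite:
    """A sprite's position in a hypothetical unit-square arena."""
    x: float
    y: float


def follow_reward(agent: Sprite, target: Sprite) -> float:
    """Hypothetical reward: 1.0 when the agent sits on the target, falling
    linearly with Euclidean distance and clipped at 0 (an assumption,
    not the reward used in /sprites_env)."""
    dist = math.hypot(agent.x - target.x, agent.y - target.y)
    return max(0.0, 1.0 - dist)


def make_scene(num_distractors: int, seed: int = 0):
    """Sample an agent, a target and `num_distractors` distractors.
    Distractors are present in the scene but never enter the reward."""
    rng = random.Random(seed)
    agent = Sprite(rng.random(), rng.random())
    target = Sprite(rng.random(), rng.random())
    distractors = [Sprite(rng.random(), rng.random()) for _ in range(num_distractors)]
    return agent, target, distractors


if __name__ == "__main__":
    for n in (0, 1):  # the two settings listed above: 0 and 1 distractor
        agent, target, distractors = make_scene(n)
        print(n, len(distractors), round(follow_reward(agent, target), 3))
```

Because the distractors never affect the reward, a representation trained only to predict rewards has no incentive to encode them.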

2. Get Started

Initialize a virtual environment:

conda create --name myenv python=3.8
conda activate myenv

Install the dependencies:

pip install -r requirements.txt

3. Implementation Tasks

Re-implement the reward-induced representation learning model:

MODEL = Encoder + MLPs + LSTM + rewards_heads(MLPs)
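To make this composition concrete, here is a shape-level NumPy sketch of one possible forward pass: each conditioning frame is encoded, passed through an MLP, an LSTM consumes these embeddings and is then rolled forward over future steps, and one small MLP reward head per reward type predicts a scalar at every future step. Training would minimise the squared error between predicted and true rewards. It assumes a linear stand-in for the convolutional encoder, random untrained weights, a zero-input rollout for the future steps, and hypothetical sizes (64x64 frames, 3 conditioning frames, 5 future steps, 4 reward heads); the names `forward`, `lstm_step` and `reward_loss` are illustrative, and the repository's real model is in model.py.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not taken from model.py).
IMG, LATENT, HIDDEN = 64, 64, 64
N_COND, N_FUTURE, N_HEADS = 3, 5, 4


def init(shape):
    """Small random weights; a trained model would learn these."""
    return rng.normal(0.0, 0.05, size=shape)


# Linear stand-in for the CNN encoder: flattened frame -> latent vector.
W_enc = init((IMG * IMG, LATENT))
# Two-layer MLP applied to each encoded frame.
W_mlp1, W_mlp2 = init((LATENT, HIDDEN)), init((HIDDEN, HIDDEN))
# LSTM weights for the input, forget, cell and output gates, stacked as [i, f, g, o].
W_x, W_h = init((HIDDEN, 4 * HIDDEN)), init((HIDDEN, 4 * HIDDEN))
b_lstm = np.zeros(4 * HIDDEN)
# One reward head per reward type: hidden -> hidden -> scalar.
heads = [(init((HIDDEN, HIDDEN)), init((HIDDEN, 1))) for _ in range(N_HEADS)]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c):
    """One standard LSTM step with gates split as [i, f, g, o]."""
    z = x @ W_x + h @ W_h + b_lstm
    i, f, g, o = np.split(z, 4, axis=-1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def forward(frames):
    """frames: (batch, N_COND, IMG, IMG) -> rewards: (batch, N_FUTURE, N_HEADS)."""
    batch = frames.shape[0]
    z = frames.reshape(batch, N_COND, -1) @ W_enc   # encoder
    z = np.maximum(z @ W_mlp1, 0.0) @ W_mlp2         # MLP with ReLU
    h, c = np.zeros((batch, HIDDEN)), np.zeros((batch, HIDDEN))
    for t in range(N_COND):                          # consume the conditioning frames
        h, c = lstm_step(z[:, t], h, c)
    preds = []
    for _ in range(N_FUTURE):                        # roll forward with zero input (assumption)
        h, c = lstm_step(np.zeros_like(h), h, c)
        step = [np.maximum(h @ w1, 0.0) @ w2 for w1, w2 in heads]  # each (batch, 1)
        preds.append(np.concatenate(step, axis=-1))  # (batch, N_HEADS)
    return np.stack(preds, axis=1)


def reward_loss(pred, target):
    """Squared error summed over reward heads, averaged over batch and time."""
    return float(((pred - target) ** 2).sum(axis=-1).mean())


if __name__ == "__main__":
    frames = rng.random((2, N_COND, IMG, IMG))
    pred = forward(frames)
    print(pred.shape)  # (2, 5, 4)
```

Since the only training signal is `reward_loss`, the encoder is pushed to keep just the information needed to predict the rewards.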

Replicate the experiment that shows the representational ability of the model:

EXP = Encoder(reward-induced) + Detached Decoder
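The idea behind the detached decoder is that reconstruction gradients never reach the encoder: the encoder stays frozen after reward-induced pre-training and only the decoder is fitted, so the reconstructions show what the representation already contains. The NumPy sketch below illustrates this with a hypothetical frozen linear encoder and a linear decoder solved by least squares, a stand-in for the gradient-trained decoder used in the repository; the sizes and the names `encode`, `fit_detached_decoder` and `reconstruction_mse` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, PIXELS, LATENT = 200, 16 * 16, 8  # hypothetical dataset and sizes

# Frozen "pre-trained" encoder: its weights are never updated below.
W_enc = rng.normal(size=(PIXELS, LATENT))


def encode(images):
    """Flattened images (N, PIXELS) -> latent codes (N, LATENT)."""
    return images @ W_enc


def fit_detached_decoder(images):
    """Fit a linear decoder latent -> pixels on top of the frozen codes.

    Only W_dec is solved for; W_enc is read but never modified, which
    mirrors detaching the encoder output before the reconstruction loss.
    """
    codes = encode(images)
    W_dec, *_ = np.linalg.lstsq(codes, images, rcond=None)
    return W_dec


def reconstruction_mse(images, W_dec):
    recon = encode(images) @ W_dec
    return float(((recon - images) ** 2).mean())


if __name__ == "__main__":
    images = rng.random((N, PIXELS))
    W_before = W_enc.copy()
    W_dec = fit_detached_decoder(images)
    print(W_dec.shape)                      # (8, 256)
    print(np.array_equal(W_before, W_enc))  # True: the encoder is untouched
```

With a reward-induced encoder, the expectation is that image content not needed to predict the rewards is missing from the reconstructions.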

Implement PPO to solve the downstream task (agent following target) using the reward-induced representation.
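For reference, the core of PPO is the clipped surrogate objective, commonly paired with generalised advantage estimation (GAE). The NumPy sketch below shows both; the hyperparameters (clip range 0.2, gamma 0.99, lambda 0.95) are common defaults rather than values taken from ppo.py, and the function names are hypothetical.

```python
import numpy as np


def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalised advantage estimation over one rollout of length T.

    rewards, values, dones: arrays of shape (T,). last_value bootstraps the
    value of the state after the final step; dones[t] = 1 means the episode
    ended after step t, so nothing is bootstrapped across that boundary.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = advantages + values  # regression targets for the value function
    return advantages, returns


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective (to be minimised)."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(-np.minimum(unclipped, clipped).mean())


if __name__ == "__main__":
    adv, ret = compute_gae(np.array([1.0, 1.0]), np.array([0.0, 0.0]), 0.0,
                           np.array([0.0, 1.0]), gamma=1.0, lam=1.0)
    print(adv)  # [2. 1.]
```

In a full PPO update this policy loss is usually combined with a value-function loss and an entropy bonus, and optimised for a few epochs over minibatches of each rollout.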

Build several representation models as baselines and train each of them on the downstream task with PPO.
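One way to organise the baselines is by where the policy's features come from and whether PPO may update the encoder. The sketch below encodes one plausible reading of the ppotrain_*.sh script names listed under Run; the `EncoderSetup` fields, the objective labels and the interpretation of each name are assumptions, so check the scripts and baseline.py for the actual settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncoderSetup:
    """How the PPO policy obtains its observation features (hypothetical)."""
    pretrained_with: Optional[str]  # pre-training objective, or None for random init
    trainable: bool                 # whether PPO gradients update the encoder
    uses_images: bool = True        # False: the policy sees the true state instead


# Assumed interpretation of the script names; cnn and image_scratch are both
# trained from scratch and presumably differ in encoder architecture.
BASELINES = {
    "cnn": EncoderSetup(None, trainable=True),
    "image_scratch": EncoderSetup(None, trainable=True),
    "image_rec": EncoderSetup("image_reconstruction", trainable=False),
    "image_rec_finetune": EncoderSetup("image_reconstruction", trainable=True),
    "reward_pred": EncoderSetup("reward_prediction", trainable=False),
    "reward_pred_finetune": EncoderSetup("reward_prediction", trainable=True),
    "oracle": EncoderSetup(None, trainable=False, uses_images=False),
}


def describe(name: str) -> str:
    """Human-readable summary of one baseline configuration."""
    s = BASELINES[name]
    if not s.uses_images:
        return f"{name}: policy on ground-truth state"
    source = s.pretrained_with or "random init"
    mode = "fine-tuned" if s.trainable else "frozen"
    return f"{name}: encoder from {source}, {mode} during PPO"


if __name__ == "__main__":
    for name in BASELINES:
        print(describe(name))
```

Keeping the frozen and fine-tuned variants side by side separates the quality of the pre-trained representation from what PPO itself can learn on top of it.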

Train all of the above representation models and verify that the reward-induced model performs better.

4. Files Summary

/presentation : presentation slides for the implementation tasks
/re_implement_paper : notes on the paper
/scripts : shell scripts to run the tasks
/sprites_datagen : the dataset
/sprites_env : the sprites environment
/src : source assets for the README
/tmp : visualization results from the training process
/weights : model weights
baseline.py : baseline models for the final PPO training
general_utils.py : general utility functions
model.py : the representation learning models to be pre-trained
ppo_train.py : trains the full task with PPO
ppo.py : PPO implementation
pre_train.py : pre-trains the representation learning models
README.md : project information
requirements.txt : environment dependencies

5. Run

Each task has its own shell script in /scripts.

Pre-train the representation models:
- pretrain_image_recon_decoder.sh
- pretrain_image_recon_model.sh
- pretrain_reward_pred_model.sh
Train the downstream task with PPO:
- ppotrain_cnn.sh
- ppotrain_image_rec_finetune.sh
- ppotrain_image_rec.sh
- ppotrain_image_scratch.sh
- ppotrain_oracle.sh
- ppotrain_reward_pred_finetune.sh
- ppotrain_reward_pred.sh

Adjust the parameters in each shell script as needed, e.g. whether and how many GPUs to use (gpus_num) or whether to log the run with wandb (is_use_wandb).
