Off Policy Adversarial Inverse Reinforcement Learning (off-policy-AIRL)

Source code to accompany Off Policy Adversarial Inverse Reinforcement Learning.

If you use this code for your research, please consider citing the paper:

@article{arnob2020off,
  title={Off-Policy Adversarial Inverse Reinforcement Learning},
  author={Arnob, Samin Yeasar},
  journal={arXiv preprint arXiv:2005.01138},
  year={2020}
}

To run Custom environments

Folders required

* inverse_rl (from https://github.com/justinjfu/inverse_rl)
* rllab
* sandbox

Libraries required

* rllab (https://github.com/openai/rllab)
* PyTorch
* Python 2 
* mjpro131 
* pip install mujoco-py==0.5.7

To run MuJoCo environments

Libraries required

* PyTorch
* Python 3
* mujoco-py==1.50.1.68

Download saved data

Expert trajectory

Compute Imitation performance:

python Train.py --seed 0 \
                --env_name "HalfCheetah-v2" \
                --learn_temperature \
                --policy_name "SAC"
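
The flags in the command above could be parsed along the following lines. This is a minimal sketch, not the repository's actual `Train.py`: the function name `build_parser`, the defaults, and the use of `choices` are assumptions. Only the flag names and the listed environment and policy values come from this README.

```python
import argparse

# Values taken from this README; the real script may accept others.
ENVS = ["HalfCheetah-v2", "Ant-v2", "Hopper-v2", "Walker2d-v2",
        "CustomAnt-v0", "PointMazeLeft-v0"]
POLICIES = ["SAC", "SAC_MCP", "SAC_MCP2"]


def build_parser():
    """Hypothetical parser mirroring the Train.py flags shown above."""
    parser = argparse.ArgumentParser(description="off-policy-AIRL imitation run (sketch)")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--env_name", choices=ENVS, default="HalfCheetah-v2")
    # store_true: the flag takes no value. Assumption: omitting it keeps
    # the SAC temperature fixed.
    parser.add_argument("--learn_temperature", action="store_true")
    parser.add_argument("--policy_name", choices=POLICIES, default="SAC")
    return parser


args = build_parser().parse_args(
    ["--seed", "0", "--env_name", "HalfCheetah-v2",
     "--learn_temperature", "--policy_name", "SAC"]
)
print(args)
```

Restricting `--env_name` and `--policy_name` with `choices` makes a misspelled value fail at startup rather than partway through training.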

The arguments are as follows:

Environment options (env_name):
- OpenAI Gym: HalfCheetah-v2, Ant-v2, Hopper-v2, Walker2d-v2
- Custom environments: CustomAnt-v0, PointMazeLeft-v0
learn_temperature:
- makes the SAC temperature parameter a learned parameter
Policy options (policy_name): SAC, SAC_MCP (k=8 primitive policies), SAC_MCP2 (k=4 primitive policies)
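
When `learn_temperature` is set, SAC typically tunes its temperature alpha by gradient descent so that the policy's entropy tracks a target value. The sketch below shows one such update in plain Python. It assumes the common formulation with loss `-log_alpha * mean(log_prob + target_entropy)`; the function name, learning rate, and target entropy are illustrative and are not taken from this repository.

```python
import math


def temperature_step(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on log(alpha) (hypothetical helper).

    Loss: L(log_alpha) = -log_alpha * mean(log_prob + target_entropy),
    so dL/dlog_alpha = -mean(log_prob + target_entropy).
    """
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_term
    return log_alpha - lr * grad


# Assumed target: -|A| for a 6-dimensional action space (a common heuristic).
target_entropy = -6.0
log_alpha = 0.0  # alpha = 1.0

# Batch of log pi(a|s) values from a policy whose entropy estimate
# (-mean log_prob = 8) is above the target, so alpha should shrink.
log_alpha = temperature_step(log_alpha, [-8.0, -7.5, -8.5], target_entropy)
alpha = math.exp(log_alpha)
print(alpha < 1.0)
```

Optimizing `log_alpha` rather than `alpha` keeps the temperature positive without an explicit constraint.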

Compute Transfer Learning:

The transfer learning experiments run on the custom environments from https://github.com/justinjfu/inverse_rl/tree/master/inverse_rl

python ReTrain.py --seed 0 \
                --env_name "DisabledAnt-v0" \
                --learn_temperature \
                --policy_name "SAC" \
                --initial_state "random" \
                --initial_runs "policy_sample" \
                --load_gating_func \
                --learn_actor

The arguments are as follows:

Environment options (env_name):
- Custom environments: DisabledAnt-v0, PointMazeRight-v0
learn_temperature:
- makes the SAC temperature parameter a learned parameter
Policy options (policy_name): SAC, SAC_MCP (k=8 primitive policies), SAC_MCP2 (k=4 primitive policies)
initial_state:
- zero: the environment always starts from the same state
- random: the environment starts from random states
```
  --initial_runs "policy_sample"\
```
load_gating_func:
- applies only to SAC_MCP and SAC_MCP2
- if set, loads the gating function from imitation training
- if not set, initializes the gating function randomly
learn_actor:
- applies only to SAC_MCP and SAC_MCP2
- if set, retrains both the policy and the gating function
- if not set, retrains only the gating function
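
The gating function mentioned above weights the k primitive policies. Assuming MCP here refers to multiplicative compositional policies with Gaussian primitives, the composite action distribution is a weighted product of the primitives. The sketch below computes its per-dimension mean and variance in plain Python. The function name and the toy numbers are illustrative, and the repository's own implementation may differ.

```python
def compose_gaussians(means, variances, weights):
    """Combine k diagonal-Gaussian primitives into one Gaussian (sketch).

    means[i][j] and variances[i][j] belong to primitive i, action dim j;
    weights[i] is the gating weight of primitive i. Raising a Gaussian to
    the power w scales its precision by w, so precisions add up as w / var.
    """
    dims = len(means[0])
    comp_mean, comp_var = [], []
    for j in range(dims):
        precision = sum(w / v[j] for w, v in zip(weights, variances))
        weighted = sum(w * m[j] / v[j]
                       for w, m, v in zip(weights, means, variances))
        comp_mean.append(weighted / precision)
        comp_var.append(1.0 / precision)
    return comp_mean, comp_var


# Two primitives, two action dimensions, equal gating weights.
means = [[0.0, 1.0], [2.0, 1.0]]
variances = [[1.0, 0.5], [1.0, 0.5]]
weights = [0.5, 0.5]

mean, var = compose_gaussians(means, variances, weights)
print(mean, var)  # [1.0, 1.0] [1.0, 0.5]
```

Under this reading, `load_gating_func` decides where the weights come from at the start of retraining, and `learn_actor` decides whether the primitives' means and variances are also updated.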


SaminYeasar/Off_Policy_Adversarial_Inverse_Reinforcement_Learning
