SaminYeasar/Off_Policy_Adversarial_Inverse_Reinforcement_Learning

Off Policy Adversarial Inverse Reinforcement Learning (off-policy-AIRL)

Source code to accompany Off Policy Adversarial Inverse Reinforcement Learning.

If you use this code for your research, please consider citing the paper:

@article{arnob2020off,
  title={Off-Policy Adversarial Inverse Reinforcement Learning},
  author={Arnob, Samin Yeasar},
  journal={arXiv preprint arXiv:2005.01138},
  year={2020}
}

To run custom environments

Required folders

* inverse_rl (from https://github.com/justinjfu/inverse_rl)
* rllab
* sandbox

Required libraries

* rllab (https://github.com/openai/rllab)
* PyTorch
* Python 2
* mjpro131
* mujoco-py 0.5.7 (pip install mujoco-py==0.5.7)

To run MuJoCo environments

Required libraries

* PyTorch
* Python 3
* mujoco-py==1.50.1.68 

Download saved data


Compute Imitation performance:

python Train.py --seed 0 \
                --env_name "HalfCheetah-v2" \
                --learn_temperature \
                --policy_name "SAC"
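
As a rough illustration of how the flags in the command above might be consumed, here is a minimal argparse sketch. It is not the repository's actual parser: the function name `build_parser`, the defaults, and the `choices` restriction are assumptions made for this example. Only the flag names and the listed policy names come from this README.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown for Train.py."""
    parser = argparse.ArgumentParser(description="Sketch of the Train.py flags")
    # Defaults below are assumptions, chosen to match the example command.
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--env_name", type=str, default="HalfCheetah-v2")
    # Boolean switch: present means the SAC temperature is learned.
    parser.add_argument("--learn_temperature", action="store_true")
    parser.add_argument(
        "--policy_name",
        type=str,
        default="SAC",
        choices=["SAC", "SAC_MCP", "SAC_MCP2"],
    )
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["--seed", "0", "--env_name", "HalfCheetah-v2",
     "--learn_temperature", "--policy_name", "SAC"]
)
print(args)
```

Passing an explicit list to `parse_args` keeps the sketch runnable without a command line. A real script would call `parse_args()` with no arguments.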

The arguments are as follows:

  • Environment options:
    • OpenAI Gym: HalfCheetah-v2, Ant-v2, Hopper-v2, Walker2d-v2
    • Custom environments: CustomAnt-v0, PointMazeLeft-v0
  • learn_temperature:
    • makes the SAC temperature parameter a learned parameter
  • Policy options: SAC, SAC_MCP (k=8 primitive policies), SAC_MCP2 (k=4 primitive policies)
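
To make the learn_temperature option more concrete, the sketch below shows one gradient step of the standard SAC temperature objective, J(alpha) = E[-alpha * (log pi(a|s) + target_entropy)], in plain Python. It follows the usual SAC formulation rather than this repository's code, which may differ. The function name `temperature_step`, the learning rate, and the choice to optimize log(alpha) are assumptions for illustration.

```python
import math


def temperature_step(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the SAC temperature objective.

    log_alpha:      current value of log(alpha); optimizing the log keeps alpha > 0
    log_probs:      log pi(a|s) for a batch of sampled actions
    target_entropy: desired policy entropy (a hyperparameter)
    """
    alpha = math.exp(log_alpha)
    # Batch mean of (log pi + target entropy); positive when entropy is too low.
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    # dJ/d(log_alpha) = -alpha * mean(log pi + target_entropy)
    grad = -alpha * mean_term
    return log_alpha - lr * grad


# Entropy below target (log-probs too high): temperature goes up.
raised = temperature_step(0.0, [-1.0, -1.0], target_entropy=2.0)
# Entropy above target: temperature goes down.
lowered = temperature_step(0.0, [-3.0], target_entropy=2.0)
print(math.exp(raised), math.exp(lowered))
```

When the flag is omitted, the temperature would presumably stay at a fixed value. This README does not state that value.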


Compute Transfer Learning:

The transfer learning experiment runs on the custom environments from https://github.com/justinjfu/inverse_rl/tree/master/inverse_rl:

python ReTrain.py --seed 0 \
                --env_name "DisabledAnt-v0" \
                --learn_temperature \
                --policy_name "SAC" \
                --initial_state "random" \
                --initial_runs "policy_sample" \
                --load_gating_func \
                --learn_actor

The arguments are as follows:

  • Environment options:
    • Custom environments: DisabledAnt-v0, PointMazeRight-v0
  • learn_temperature:
    • makes the SAC temperature parameter a learned parameter
  • Policy options: SAC, SAC_MCP (k=8 primitive policies), SAC_MCP2 (k=4 primitive policies)
  • initial_state
    • zero: the environment always starts from the same state
    • random: the environment starts from random states
  • initial_runs
    • set to "policy_sample" in the command above
  • load_gating_func
    • applicable only to SAC_MCP and SAC_MCP2
    • if set, loads the gating function from imitation training
    • if not set, the gating function is randomly initialized
  • learn_actor
    • applicable only to SAC_MCP and SAC_MCP2
    • if set, retrains both the policy and the gating function
    • if not set, retrains only the gating function
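
The interaction between load_gating_func and learn_actor described above can be summarized in a small helper. This is only a sketch of the documented behavior, not code from ReTrain.py: the function `retrain_plan`, its return format, and the component labels are hypothetical.

```python
# Policies for which the two flags apply, per the list above.
MCP_POLICIES = ("SAC_MCP", "SAC_MCP2")


def retrain_plan(policy_name, load_gating_func=False, learn_actor=False):
    """Summarize what the two flags imply for a transfer-learning run."""
    if policy_name not in MCP_POLICIES:
        # The flags are documented as applying only to the MCP variants.
        return {"applicable": False}
    return {
        "applicable": True,
        # Gating function starts from imitation training or from random weights.
        "gating_init": "imitation" if load_gating_func else "random",
        # Either the policy and gating function are retrained, or the gating function alone.
        "trainable": ["policy", "gating"] if learn_actor else ["gating"],
    }


print(retrain_plan("SAC_MCP", load_gating_func=True, learn_actor=False))
print(retrain_plan("SAC", load_gating_func=True, learn_actor=True))
```

For plain SAC the helper reports the flags as not applicable, which matches the note that both options concern only SAC_MCP and SAC_MCP2.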
