Imitation Learning via Reinforcement Learning (ILRL)

This project uses policy gradient methods such as PPO and TRPO, together with a Generative Adversarial Network-style discriminator, to achieve imitation learning on discrete gym environments.

The methodology used here is explained in the Generative Adversarial Imitation Learning (GAIL) [paper].

Gist of it:

Given an expert policy as input, the GAIL algorithm uses a policy gradient method such as PPO (as in this project) to achieve imitation learning; the learned policy often matches, and sometimes exceeds, the performance of the input expert policy. A minimal sketch of the surrogate reward it optimizes is shown below.
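Concretely, the discriminator's output stands in for the environment reward during policy updates. One common form of that surrogate reward, as a sketch (this assumes the discriminator outputs the probability that a pair came from the expert; conventions vary between GAIL implementations):

```python
import numpy as np

def gail_reward(d_expert_prob, eps=1e-8):
    """Surrogate reward from the discriminator's output.

    d_expert_prob: discriminator's probability that a (state, action)
    pair came from the expert. The harder the agent is to distinguish
    from the expert, the larger the reward; eps guards against log(0).
    This is one common convention, not necessarily the repo's exact one.
    """
    return -np.log(1.0 - d_expert_prob + eps)
```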

For more information on why we chose this methodology over other algorithms, read the report: GAIL to solve Discrete environments

Overview of steps:

  1. Run the PPO algorithm - run PPO on an environment (see the actor-critic sketch after this list)
    1. Create the Actor-Critic architecture that represents the two policy networks
    2. Code the PPO algorithm
    3. Train an agent using the PPO algorithm
  2. Sample trajectories - sample trajectories that represent the Expert Policy, which we later use to train our agent for imitation learning (see the sampling sketch after this list)
    1. Restore the agent policy network weights
    2. Sample states and actions using the expert policy
    3. Save the sampled states and actions to CSV files
  3. Test the Expert Policy - check that the learned expert policy satisfies the criteria for solving the environment (render the runs if you want)
  4. Train the agent using GAIL for imitation learning - given the expert trajectories as input, use Generative Adversarial Imitation Learning to train the agent (see the discriminator sketch after this list)
    1. Create a Discriminator that differentiates between the Expert Policy and the generated policy (as in a conventional Generative Adversarial Network)
    2. Train the agent to imitate the given expert policy using the GAIL algorithm
  5. Run the Baselines implementations of PPO and TRPO to compare their performance with our implementations
  6. Observe reward plots on TensorBoard, which contains the following plots:
    1. Our PPO implementation's rewards and episode lengths
    2. Expert policy testing plot
    3. GAIL rewards and episode lengths (final agent)
    4. Baselines reward, length, and loss plots for comparison
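For step 1, a minimal actor-critic and the PPO clipped surrogate loss might look like the following sketch, using tf.keras (function names, layer sizes, and the clipping coefficient are illustrative assumptions, not the notebook's exact code):

```python
import tensorflow as tf

def build_actor_critic(obs_dim, n_actions, hidden=64):
    """Minimal actor-critic for a discrete gym environment.

    The actor head outputs action logits (softmax gives pi(a|s));
    the critic head outputs a scalar state-value estimate V(s).
    """
    obs = tf.keras.Input(shape=(obs_dim,))
    h = tf.keras.layers.Dense(hidden, activation="tanh")(obs)
    logits = tf.keras.layers.Dense(n_actions)(h)           # actor head
    h2 = tf.keras.layers.Dense(hidden, activation="tanh")(obs)
    value = tf.keras.layers.Dense(1)(h2)                   # critic head
    return tf.keras.Model(obs, [logits, value])

def ppo_clip_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """PPO's clipped surrogate objective (negated, to be minimized)."""
    ratio = tf.exp(new_logp - old_logp)                    # pi_new / pi_old
    clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -tf.reduce_mean(tf.minimum(ratio * advantage, clipped * advantage))
```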
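Step 2 then rolls out the trained expert and dumps (state, action) pairs to CSV. A sketch under the classic gym step API (the file-name scheme and the policy_fn callable are assumptions):

```python
import gym
import numpy as np

def sample_expert_trajectories(env_name, policy_fn, n_episodes, out_prefix):
    """Roll out a trained policy and save sampled states/actions to CSV.

    policy_fn maps an observation to a discrete action, e.g. argmax
    over the trained actor's logits. Uses the classic 4-tuple gym API.
    """
    env = gym.make(env_name)
    states, actions = [], []
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            act = policy_fn(obs)
            states.append(obs)
            actions.append(act)
            obs, _, done, _ = env.step(act)
    np.savetxt(out_prefix + "_states.csv", np.array(states), delimiter=",")
    np.savetxt(out_prefix + "_actions.csv", np.array(actions), delimiter=",")
```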
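For step 4, the discriminator is an ordinary binary classifier over (state, action) pairs, trained with cross-entropy exactly as in a conventional GAN. A sketch (layer sizes and the one-hot input encoding are assumptions):

```python
import tensorflow as tf

def build_discriminator(obs_dim, n_actions, hidden=64):
    """D(s, a): probability that a (state, one-hot action) pair is expert data.

    Trained with binary cross-entropy: expert pairs labeled 1,
    agent-generated pairs labeled 0. The policy is then updated by
    PPO using the surrogate reward shown earlier in place of the
    environment reward.
    """
    sa = tf.keras.Input(shape=(obs_dim + n_actions,))  # state ++ one-hot(action)
    h = tf.keras.layers.Dense(hidden, activation="tanh")(sa)
    h = tf.keras.layers.Dense(hidden, activation="tanh")(h)
    prob = tf.keras.layers.Dense(1, activation="sigmoid")(h)
    return tf.keras.Model(sa, prob)
```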

Note - Any algorithm can be used to obtain the expert policy for GAIL agent training. Likewise, other policy gradient methods such as TRPO can replace PPO inside the GAIL algorithm to obtain our imitating agent. However, performance may vary depending on the algorithm chosen.

Dependencies:

  1. Tensorflow (faster if you have GPU support enabled)
  2. OpenAI gym
  3. numpy

Instructions to run:

  1. Launch Jupyter Notebook or JupyterLab
  2. Open the GAIL.ipynb file
  3. Follow the instructions in the notebook to run the project
  4. Follow the instructions in the notebook to generate and observe the plots on Tensorboard

Results on CartPole-v0 environment:

[Plot] Our PPO implementation rewards

[Plot] GAIL learned agent rewards

[Plot] Baselines PPO rewards

[Plot] Baselines TRPO rewards

References:

  1. Generative Adversarial Imitation Learning [paper]
  2. OpenAI Baselines GAIL
  3. Tensorflow implementation of Generative Adversarial Imitation Learning (GAIL) with discrete action
  4. Simple GAIL implementation using Tensorflow
