Change adversarial algorithms to collect rollouts first #731

Draft: wants to merge 10 commits into master

taufeeque9
Collaborator

Description

This PR changes the adversarial algorithms so that, in each iteration, the rollouts are collected first, then the discriminator is trained, and finally the generator is trained. This ordering matches Algorithm 1 in the AIRL paper.
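For readers skimming the thread, the ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `train_adversarial` and its three callable parameters are hypothetical names, and passing the same rollouts to both the discriminator and the generator step is an assumption made for the sketch.

```python
from typing import Any, Callable, List

# Placeholder type for a batch of trajectories (assumption for this sketch).
Rollouts = List[Any]


def train_adversarial(
    n_iterations: int,
    collect_rollouts: Callable[[], Rollouts],
    train_discriminator: Callable[[Rollouts], None],
    train_generator: Callable[[Rollouts], None],
) -> None:
    """Run the three phases in a fixed order on every iteration.

    All three callables are hypothetical stand-ins, not functions from
    the library this PR targets.
    """
    for _ in range(n_iterations):
        # 1. Collect rollouts with the current generator policy.
        rollouts = collect_rollouts()
        # 2. Update the discriminator using those rollouts.
        train_discriminator(rollouts)
        # 3. Update the generator afterwards (reusing the same rollouts
        #    here is an assumption of this sketch).
        train_generator(rollouts)
```

The point of the sketch is only the order of the three calls inside the loop; how each phase is implemented is left to the actual trainer.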

Testing

The proposed change improves the returns obtained on many environments. The table below shows the imitation-to-expert return ratio of each algorithm on several environments. Hyperparameters were tuned separately for each environment. Each ratio was obtained by evaluating the tuned hyperparameters on five distinct seeds and averaging the ratio of the imitation policy's return to the expert's return.

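| Algo \ Env | Ant | Half Cheetah | Hopper | Swimmer | Walker |
| --- | --- | --- | --- | --- | --- |
| GAIL-PR | 0.883 | 0.868 | 1.01 | 0.986 | 0.989 |
| AIRL-PR | -0.04 | 0.993 | 1.01 | 0.926 | 0.270 |
| GAIL-Master | 0.864 | 0.981 | 1.004 | 0.945 | 0.893 |
| AIRL-Master | 0.259 | 0.447 | 1.008 | 0.663 | 0.176 |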
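As a rough illustration of how a single cell in the table could be computed, the sketch below averages the per-seed ratio of imitation return to expert return. The function name `return_ratio` and the numbers in the usage line are hypothetical and are not taken from the experiments in this PR.

```python
from statistics import mean
from typing import Sequence


def return_ratio(imitation_returns: Sequence[float], expert_return: float) -> float:
    """Average imitation-to-expert return ratio across seeds.

    `imitation_returns` holds one mean episode return per seed
    (hypothetical input format for this sketch).
    """
    if expert_return == 0:
        raise ValueError("expert_return must be non-zero")
    return mean(r / expert_return for r in imitation_returns)


# Hypothetical returns for five seeds, with a made-up expert return of 100.
ratio = return_ratio([90.0, 100.0, 110.0, 95.0, 105.0], expert_return=100.0)
```

A ratio near 1 means the imitation policy roughly matches the expert, and a value above 1 means it slightly exceeds the expert's return on those seeds.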

@ernestum
Collaborator

Thanks a lot, @taufeeque9, for adding this change. We will need it for #675!

For my understanding: does the table show comparisons to the previous version of the implementation?

@taufeeque9
Collaborator Author

The table compares this PR against the current version of the algorithms on the master branch, which hasn't been updated since I last computed the results. The `-PR` suffix indicates the modified algorithm implemented in this PR, and `-Master` indicates the algorithm currently implemented on the master branch.
