Change adversarial algorithms to collect rollouts first #731

taufeeque9 · 2023-06-17T18:26:25Z

Description

This PR changes the adversarial algorithms so that, at each iteration, rollouts are collected first, then the discriminator is trained, and finally the generator is trained. This ordering matches Algorithm 1 in the AIRL paper.
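A minimal Python sketch of the per-iteration ordering described above, assuming the three steps can be passed in as generic callables. `train_adversarial`, `collect_rollouts`, `train_discriminator` and `train_generator` are hypothetical names chosen for illustration. They are not the library's actual API.

```python
from typing import Callable, List, TypeVar

Rollouts = TypeVar("Rollouts")


def train_adversarial(
    n_iterations: int,
    collect_rollouts: Callable[[], Rollouts],
    train_discriminator: Callable[[Rollouts], None],
    train_generator: Callable[[Rollouts], None],
) -> None:
    """Hypothetical loop: rollouts -> discriminator -> generator, per iteration."""
    for _ in range(n_iterations):
        # 1. Sample fresh trajectories with the current generator policy.
        rollouts = collect_rollouts()
        # 2. Fit the discriminator to separate expert data from these rollouts.
        train_discriminator(rollouts)
        # 3. Update the generator on the same rollouts, using rewards
        #    derived from the freshly updated discriminator.
        train_generator(rollouts)


# Usage: stub each step and record the order in which they run.
calls: List[str] = []


def collect() -> List[int]:
    calls.append("collect")
    return [0, 1, 2]  # placeholder "rollouts"


train_adversarial(
    n_iterations=2,
    collect_rollouts=collect,
    train_discriminator=lambda r: calls.append("disc"),
    train_generator=lambda r: calls.append("gen"),
)
print(calls)
```

In this sketch both updates receive the same batch, so the generator is always trained against a discriminator that has already seen its most recent samples.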

Testing

The proposed change improves returns in eight of the ten algorithm/environment pairs tested. The two exceptions are GAIL on Half Cheetah and AIRL on Ant, which score lower than on master. The table below shows the ratio of the imitation policy's return to the expert's return for each algorithm and environment. Hyperparameters were tuned separately for each environment, and each ratio is the average over five distinct seeds evaluated with the tuned hyperparameters.

Algorithm \ Environment	Ant	Half Cheetah	Hopper	Swimmer	Walker
GAIL-PR	0.883	0.868	1.01	0.986	0.989
AIRL-PR	-0.04	0.993	1.01	0.926	0.270
GAIL-Master	0.864	0.981	1.004	0.945	0.893
AIRL-Master	0.259	0.447	1.008	0.663	0.176
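A small Python sketch of how such a ratio could be computed, assuming it is the mean of per-seed ratios against a single fixed expert return. With a fixed expert return this equals the mean imitation return divided by the expert return. `return_ratio` and the sample returns are hypothetical and are not the data behind the table. A negative entry such as AIRL-PR on Ant simply means the imitation policy's average return was negative.

```python
from statistics import mean
from typing import Sequence


def return_ratio(imitation_returns: Sequence[float], expert_return: float) -> float:
    """Assumed definition: mean over seeds of imitation_return / expert_return."""
    return mean(r / expert_return for r in imitation_returns)


# Hypothetical per-seed returns for five seeds against an expert return of 100.
ratio = return_ratio([90.0, 95.0, 100.0, 105.0, 110.0], expert_return=100.0)
print(round(ratio, 3))
```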

ernestum · 2023-06-19T08:45:51Z

Thanks a lot @taufeeque9 for adding this change. We will need it for #675!

For my understanding: does the table show comparisons to the previous version of the implementation?

taufeeque9 · 2023-06-19T09:40:19Z

The table compares against the current implementation on the master branch, which hasn't changed since I computed the results. The -PR suffix denotes the modified algorithm in this PR, and -Master denotes the implementation currently on the master branch.

taufeeque9 added 10 commits September 26, 2022 19:02

Add high level changes to the algorithm (d4159f5)
Add minor changes (8623e36)
Add hacky workaround to implement reference paper's adversarial algo (34b52ff)
Merge branch 'master' into adversarial-mod (3af2f23)
Merge branch 'master' into adversarial-mod (2eedf92)
Merge branch 'master' into adversarial-mod (d3ebd3d)
Fix bug and add support for Off Policy RL (e783e2f)
Merge branch 'master' into adversarial-mod (8881008)
Add changes (4008b62)
Add changes for onpolicy & offpolicy (9e4fdd4)

taufeeque9 mentioned this pull request Aug 10, 2023

Adversarial algorithm matching original paper's implementation #770 (draft)
