Examples or resources for autodiff with 2 networks? #880

Open
Cobrand opened this issue Oct 30, 2023 · 0 comments

Comments

Cobrand commented Oct 30, 2023

I'm looking at this example (I'm trying to implement PPO as well), along with others, and I can't figure out how to set up autodiff across two inter-dependent networks.

In the case of PPO, the formula is basically `loss = pg_loss - args.ent_coef * entropy_loss + v_loss * args.vf_coef` (https://github.com/vwxyzjn/ppo-implementation-details/blob/main/ppo.py#L297).

Entropy loss and value loss I can handle just fine, but pg_loss depends on the ratio of post-softmax probabilities, which comes from the policy network, AND on the advantages, which depend on the value network. (Python impl: `pg_loss = torch.max(-mb_advantages * ratio, -mb_advantages * torch.clamp(ratio, 1 - args.clip_coef, 1 + args.clip_coef)).mean()`)

I have two separate networks for those. Is there any way to have autodiff work with respect to only the value network on one side, and only the policy network on the other? I know TensorFlow can do that somehow, but I don't know about this lib. For reference, the PyTorch pattern I'm trying to reproduce is sketched below.
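
To make that concrete, here is a minimal PyTorch sketch of the stop-gradient pattern (the toy networks, shapes, and coefficients are placeholder assumptions, not this lib's API): detaching the advantages cuts the graph so pg_loss only backpropagates into the policy network, while v_loss only reaches the value network.

```python
import torch

# Hypothetical toy networks standing in for the real policy/value models.
policy_net = torch.nn.Linear(4, 2)
value_net = torch.nn.Linear(4, 1)

obs = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
old_logprobs = torch.randn(8)
returns = torch.randn(8)

values = value_net(obs).squeeze(-1)
# The advantages depend on the value network, but .detach() cuts the
# graph here, so pg_loss sends no gradient into value_net.
advantages = (returns - values).detach()

dist = torch.distributions.Categorical(logits=policy_net(obs))
ratio = (dist.log_prob(actions) - old_logprobs).exp()

# Clipped surrogate objective: gradients flow only through `ratio`,
# i.e. only into policy_net.
pg_loss = torch.max(
    -advantages * ratio,
    -advantages * torch.clamp(ratio, 0.8, 1.2),
).mean()

# Value loss touches only value_net; entropy touches only policy_net.
v_loss = ((values - returns) ** 2).mean()
entropy_loss = dist.entropy().mean()

loss = pg_loss - 0.01 * entropy_loss + 0.5 * v_loss
loss.backward()  # each network only receives gradients from its own terms
```

What I'm looking for is the equivalent of that `.detach()` / stop-gradient in this lib.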

Thanks.
