Examples or resources for autodiff with 2 networks? #880

Cobrand · 2023-10-30T18:52:04Z

I'm looking at this example, along with others, because I'm trying to implement PPO as well, and I can't figure out how to use autodiff with two interdependent networks.

In the case of PPO, the formula is basically loss = pg_loss - args.ent_coef * entropy_loss + v_loss * args.vf_coef (https://github.com/vwxyzjn/ppo-implementation-details/blob/main/ppo.py#L297).

I can handle the entropy loss and the value loss just fine, but pg_loss depends both on the ratio of post-softmax probabilities, which comes from the policy network, and on the advantages, which depend on the value network. (Python impl: pg_loss = torch.max(-mb_advantages * ratio, -mb_advantages * torch.clamp(ratio, 1 - args.clip_coef, 1 + args.clip_coef)).mean() )

I have 2 separate networks for those. Is there any way to have autodiff differentiate only with respect to the value network on one side, and only with respect to the policy network on the other? I know that TensorFlow can do that somehow, but I don't know about this library.
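To make the question concrete, here is a minimal PyTorch sketch of the separation being asked about. It is an illustration under assumptions, not this library's API: policy_net, value_net, the tensor sizes, and the random placeholder data are all hypothetical, and the advantages are simplified to returns minus values. The idea is to detach the advantages so that pg_loss only differentiates through the policy network, while v_loss only differentiates through the value network.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two networks (sizes are arbitrary).
policy_net = nn.Linear(4, 3)  # observation -> action logits
value_net = nn.Linear(4, 1)   # observation -> state value

# Placeholder batch; a real rollout would supply these.
obs = torch.randn(8, 4)
actions = torch.randint(0, 3, (8,))
returns = torch.randn(8)
old_logprob = torch.full((8,), -1.0)  # log-probs recorded during the rollout
clip_coef, ent_coef, vf_coef = 0.2, 0.01, 0.5

values = value_net(obs).squeeze(-1)
# detach() cuts the graph here: the advantages become constants, so
# pg_loss cannot send gradients back into value_net.
advantages = (returns - values).detach()

dist = torch.distributions.Categorical(logits=policy_net(obs))
ratio = (dist.log_prob(actions) - old_logprob).exp()
pg_loss = torch.max(
    -advantages * ratio,
    -advantages * torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef),
).mean()
entropy_loss = dist.entropy().mean()
v_loss = 0.5 * ((values - returns) ** 2).mean()

# pg_loss has no path to value_net, so each entry here is None.
grads_wrt_value = torch.autograd.grad(
    pg_loss, list(value_net.parameters()), allow_unused=True, retain_graph=True
)

# One backward pass over the combined loss: policy_net receives gradients
# from pg_loss and entropy_loss, value_net only from v_loss.
loss = pg_loss - ent_coef * entropy_loss + vf_coef * v_loss
loss.backward()
```

With this layout a single combined loss still works, because each term only reaches the parameters it is connected to in the graph. Whether this library offers an equivalent of detach() is exactly the open question.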

Thanks.
