Hi, just a small question about the choice of making the KL term decay in importance.
In the paper you describe that high values of the KL divergence coefficient overly constrain the policy, while low values let it forget useful behavior from the BC model. Thus you set the coefficient to decay gradually over training.
I'm just wondering: can't this lead to the worst of both worlds, where the coefficient only passes through a good range for a brief period? Did you try setting it to a constant in-between value instead of decaying it?
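To make the two options concrete, here is a minimal sketch of the schedules being compared. All function names and numeric values are illustrative assumptions, not the paper's actual hyperparameters:

```python
def decaying_kl_coeff(step: int, total_steps: int,
                      start: float = 1.0, end: float = 0.01) -> float:
    """Paper's approach (as I understand it): the KL coefficient decays
    from a high value (strongly anchored to the BC policy) to a low one.
    Exponential decay is an assumption; the paper may use another shape."""
    frac = min(step / total_steps, 1.0)
    return start * (end / start) ** frac

def constant_kl_coeff(step: int, total_steps: int,
                      value: float = 0.1) -> float:
    """The ablation I'm asking about: a fixed in-between coefficient."""
    return value
```

Either coefficient would then weight the KL penalty in the RL objective, e.g. something like `loss = rl_loss + coeff * kl(policy, bc_policy)`.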
The thinking is that you get rotations at the beginning of training that encourage more general refinement later in training. I'm not sure whether the ablation you proposed has been run.