
Rationale for KL decay #44

Open
Rolv-Arild opened this issue May 5, 2024 · 2 comments

Comments

Rolv-Arild commented May 5, 2024

Hi, just a small question about the choice of making the KL term decay in importance.

In the paper you describe that high values of the KL divergence coefficient constrain the policy too much, while low values let it forget useful behavior from the BC model, so you set the coefficient to decay gradually.
I'm wondering whether this can lead to the worst of both worlds, where the coefficient only passes through a good range for a brief period. Did you try setting it to a constant intermediate value instead of decaying it? A rough sketch of the ablation I have in mind is below.
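For illustration, here is a minimal sketch of what I mean. The schedule, coefficient values, and helper names (`kl_coeff`, `rl_loss_with_kl`) are my own assumptions for the example, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F


def kl_coeff(step, start=1.0, decay=0.999):
    """Hypothetical exponentially decaying KL coefficient (values assumed)."""
    return start * decay ** step


def rl_loss_with_kl(policy_logits, prior_logits, ppo_loss, step, constant_coeff=None):
    """Add a KL(policy || frozen BC prior) penalty to the RL loss.

    Passing constant_coeff uses a fixed intermediate value instead of the
    decaying schedule -- the ablation asked about above.
    """
    log_p = F.log_softmax(policy_logits, dim=-1)  # current policy
    log_q = F.log_softmax(prior_logits, dim=-1)   # frozen BC model
    kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1).mean()
    coeff = constant_coeff if constant_coeff is not None else kl_coeff(step)
    return ppo_loss + coeff * kl
```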

brandonhoughton (Collaborator) commented

The thinking is that you get rotations at the beginning of training that encourage more general refinement later in training. I'm not sure whether the ablation you proposed has been run.

Rolv-Arild (Author) commented

Did you try any other variations? Lower starting coefficient? Faster decay?
