Hi, just a small question about the choice of making the KL term decay in importance.
In the paper you describe that high values of the KL divergence coefficient overly constrain the policy, while low values let it forget useful behavior from the BC model. Thus you set the coefficient to decay gradually over training.
I'm just wondering: can't this lead to the worst of both worlds, where the coefficient only passes through a good range for a brief period? Did you try setting it to a constant in-between value instead of decaying it?
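To make the two options concrete, here is a minimal sketch of the schedules being compared. All function names and numeric values are illustrative assumptions, not the paper's actual hyperparameters:

```python
def decaying_kl_coeff(step: int, total_steps: int,
                      start: float = 1.0, end: float = 0.01) -> float:
    """Paper's approach (as I understand it): the KL coefficient decays
    from a high value (strongly anchored to the BC policy) to a low one.
    Exponential decay is an assumption; the paper may use another shape."""
    frac = min(step / total_steps, 1.0)
    return start * (end / start) ** frac

def constant_kl_coeff(step: int, total_steps: int,
                      value: float = 0.1) -> float:
    """The ablation I'm asking about: a fixed in-between coefficient."""
    return value
```

Either coefficient would then weight the KL penalty in the RL objective, e.g. something like `loss = rl_loss + coeff * kl(policy, bc_policy)`.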
The thinking is that you get rotations at the beginning of training that encourage more general refinement later in training. I'm not sure whether the ablation you proposed has been run.