How to liberate GPT-2 from the reference model? #14

yananchen1989 · 2021-08-03T10:46:11Z

Hi,

We know that the PPO objective includes a KL term that constrains how far the active GPT-2 (the policy that generates responses and receives reward feedback) can drift from the original GPT-2 reference model.
How can I tune the parameters to loosen this constraint? I want the active GPT-2 to be able to deviate much further from the original reference GPT-2, because in my experiments the rewards do not improve as expected, possibly due to this constraint.
I am new to PPO and would appreciate any suggestions.
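For context, here is a minimal sketch of how a KL penalty of this kind typically enters PPO fine-tuning of a language model. It assumes the common setup where the per-token log-probability gap between the policy and the frozen reference model is subtracted from the reward, scaled by a coefficient. The names `kl_penalized_rewards`, `AdaptiveKLController`, `kl_coef`, `target`, and `horizon` are illustrative and not necessarily this library's API; the adaptive controller follows the proportional scheme described by Ziegler et al. (2019). Under these assumptions, lowering the coefficient (or raising the KL target when the coefficient is adapted) is what weakens the constraint.

```python
def kl_penalized_rewards(score, logprobs, ref_logprobs, kl_coef):
    """Illustrative per-token rewards for one response (hypothetical helper).

    Each token is penalized by kl_coef * (log p_policy - log p_ref), a sampled
    estimate of the KL divergence from the reference model. The scalar score
    from the reward model is added to the final token only.
    """
    rewards = [-kl_coef * (lp - rlp) for lp, rlp in zip(logprobs, ref_logprobs)]
    rewards[-1] += score
    return rewards


class AdaptiveKLController:
    """Hypothetical proportional controller that nudges kl_coef toward a target KL.

    If the observed KL exceeds the target, the coefficient grows (a stronger
    constraint); if it falls below the target, the coefficient shrinks.
    """

    def __init__(self, init_kl_coef, target, horizon):
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon

    def update(self, current_kl, n_steps):
        # Relative error, clipped to [-0.2, 0.2] so a single batch cannot swing the coefficient too far.
        error = max(min(current_kl / self.target - 1.0, 0.2), -0.2)
        self.value *= 1.0 + error * n_steps / self.horizon


if __name__ == "__main__":
    # Same response scored with a strong and a weak KL coefficient.
    policy_lp = [-1.0, -2.0, -0.5]
    ref_lp = [-1.5, -2.5, -1.0]
    print(kl_penalized_rewards(1.0, policy_lp, ref_lp, kl_coef=0.2))
    print(kl_penalized_rewards(1.0, policy_lp, ref_lp, kl_coef=0.02))
```

In this sketch, setting `kl_coef` to 0 removes the penalty entirely, so the policy is driven by the reward alone. In practice that tends to let the model drift into degenerate text that games the reward model, which is why the coefficient is usually reduced rather than dropped.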

Thanks.
