v1.2.0 / 2023-08-28

Latest

p-ferreira released this 28 Aug 16:30

What's changed

Adds Direct Preference Optimization (DPO) style rewards by @opentaco on #99
Changes print format on exception catch by @camfairchild on #135
Brings back netuid and wandb to logged config by @p-ferreira on #137
Adds DPO penalty update by @Eugene-hu on #138
Adds original reward output to wandb logs by @isabella618033 on #139
Reweights reward models by @Eugene-hu on #140
Updates stale documentation by @steffencruz on #129

Contributors

steffencruz, camfairchild, and 4 other contributors
