add online-rpo-bwd-kl #1

shengyangs · 2024-10-17T14:52:57Z

What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

Please update CHANGELOG.md under the next version with the high-level changes in this PR.

# Add a code snippet demonstrating how to use this

Pre checks:

Make sure you have read and followed the Contributor guidelines.
Did you write any new necessary tests?
Did you add or update any necessary documentation? Make sure to also update the NeMo Framework User Guide, which contains the tutorials.

shengyangs · 2024-10-17T14:53:25Z

add online-rpo-bwd-kl

8f51112

github-actions bot added Algorithms Utils labels Oct 17, 2024

abukharin3 approved these changes Oct 17, 2024

abukharin3 merged commit df96cd5 into abukharin3:pref-study-trtllm Oct 17, 2024
1 check passed