
AlignProp Support for direct reward finetuning #7312

Closed
parthos86 opened this issue Mar 14, 2024 · 3 comments

Comments

@parthos86 commented Mar 14, 2024

Is your feature request related to a problem? Please describe.
No. AlignProp makes reward finetuning much faster than DDPO (about 25x) because it backpropagates gradients directly from the reward function through the sampling process.

Describe the solution you'd like.
A similar integration to DDPO.
https://github.com/huggingface/diffusers/blob/main/docs/source/en/training/ddpo.md

Describe alternatives you've considered.
There is currently an implementation, but it is not well supported and is not part of the diffusers pipeline.
https://github.com/mihirp1998/AlignProp/
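
To illustrate the direct-backprop idea described above: this is a toy sketch only, not the AlignProp implementation (which differentiates through an actual diffusion sampler with a learned reward model in PyTorch). Here a scalar "sampler" with a few differentiable steps stands in for the denoising chain, and the reward gradient is computed analytically by the chain rule rather than estimated from reward samples as in DDPO-style policy gradients. All names (`sample`, `reward`, `direct_grad`) and the dynamics are made up for illustration.

```python
# Toy: 3-step differentiable "sampler" x_{t+1} = x_t + theta,
# reward r(x) = -(x - 4)^2. Direct reward finetuning backpropagates
# dr/dtheta through every sampling step instead of estimating the
# gradient from sampled rewards (the REINFORCE/DDPO approach).

def sample(theta, steps=3, x0=0.0):
    x = x0
    for _ in range(steps):
        x = x + theta          # each "denoising" step depends on theta
    return x

def reward(x, target=4.0):
    return -(x - target) ** 2

def direct_grad(theta, steps=3, target=4.0):
    # Chain rule: dr/dtheta = (dr/dx) * (dx/dtheta), and here
    # dx/dtheta = steps because theta enters every step additively.
    x = sample(theta, steps)
    dr_dx = -2.0 * (x - target)
    return dr_dx * steps

theta = 0.0
lr = 0.01
for _ in range(200):
    theta += lr * direct_grad(theta)   # gradient ascent on the reward
# theta converges to 4/3, where sample(theta) = 4 and the reward is maximal
```

The exact, low-variance gradient at each step is what makes this style of finetuning converge in far fewer reward evaluations than a sampling-based policy-gradient estimate of the same quantity.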

@mihirp1998

@parthos86 parthos86 changed the title AlignProp Support for direct reward finetuning https://github.com/mihirp1998/AlignProp/ AlignProp Support for direct reward finetuning Mar 14, 2024
@sayakpaul (Member)
It's more of a trl thing, really. Cc: @lvwerra @younesbelkada. diffusers is not a training-focused library.

@mihirp1998

I would be happy to do the integration, although I would need some reference for what format/structure to follow.

@sayakpaul (Member)

I am going to close this issue as it belongs in the trl repo.
