parthos86 changed the title from "AlignProp Support for direct reward finetuning https://github.com/mihirp1998/AlignProp/" to "AlignProp Support for direct reward finetuning" on Mar 14, 2024.
Is your feature request related to a problem? Please describe.
No. AlignProp makes reward finetuning much faster than DDPO (about 25x) because it backpropagates gradients directly from a differentiable reward function.
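To illustrate what "backpropagating the gradients directly from the reward function" means, here is a toy sketch. It is not AlignProp's or diffusers' code. It assumes a hypothetical scalar "denoiser" `eps_theta(x) = theta * x`, a deterministic sampler, and a differentiable reward. The reward gradient is pushed back through every sampling step, and `theta` is updated by plain gradient ascent. DDPO, by contrast, does not differentiate the reward and relies on policy-gradient estimates instead.

```python
# Toy illustration only: names, model, and reward are hypothetical, not from AlignProp or diffusers.


def sample(theta: float, x_T: float, num_steps: int) -> list[float]:
    """Run a deterministic toy sampler and keep every x_t for the backward pass."""
    xs = [x_T]
    for _ in range(num_steps):
        x_t = xs[-1]
        eps = theta * x_t  # toy "denoiser" prediction eps_theta(x_t)
        xs.append(x_t - eps)  # x_{t-1} = x_t - eps_theta(x_t)
    return xs


def reward(x_0: float, target: float) -> float:
    """Differentiable toy reward: highest when the final sample hits `target`."""
    return -((x_0 - target) ** 2)


def reward_grad(theta: float, x_T: float, num_steps: int, target: float) -> float:
    """d reward / d theta, obtained by backpropagating through every sampling step."""
    xs = sample(theta, x_T, num_steps)
    g = -2.0 * (xs[-1] - target)  # d reward / d x_0
    grad_theta = 0.0
    # Visit the steps in reverse: each step adds d x_{t-1}/d theta = -x_t,
    # then the gradient flows to x_t through d x_{t-1}/d x_t = 1 - theta.
    for x_t in reversed(xs[:-1]):
        grad_theta += g * (-x_t)
        g *= 1.0 - theta
    return grad_theta


def finetune(theta: float, x_T: float, num_steps: int, target: float,
             lr: float, iters: int) -> float:
    """Plain gradient ascent on the reward with respect to theta."""
    for _ in range(iters):
        theta += lr * reward_grad(theta, x_T, num_steps, target)
    return theta


if __name__ == "__main__":
    theta = finetune(0.0, x_T=1.0, num_steps=10, target=0.5, lr=0.005, iters=500)
    x_0 = sample(theta, 1.0, 10)[-1]
    print(f"theta={theta:.4f}  x_0={x_0:.4f}  reward={reward(x_0, 0.5):.2e}")
```

In a real pipeline the sampler would be a UNet evaluated over many timesteps, so every intermediate must be kept (or recomputed) for the backward pass; the toy keeps them in `xs` for the same reason.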
Describe the solution you'd like.
An integration similar to the existing DDPO one:
https://github.com/huggingface/diffusers/blob/main/docs/source/en/training/ddpo.md
Describe alternatives you've considered.
There is an existing implementation, but it is not well supported and is not integrated into the diffusers pipeline:
https://github.com/mihirp1998/AlignProp/
@mihirp1998