The two values you mentioned are the mean and standard deviation of the raw reward values on the test set. They are used to normalize the reward so that, on the test set, it has a mean of 0 and a standard deviation of 1. Negative rewards are therefore expected: they are simply scores below the test-set average. When testing Stable Diffusion v1.4 on our metric set, 62.4% of the rewards fall within [-1, 1] and 98.2% fall within [-2, 2].
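To make the normalization concrete, here is a minimal sketch of the idea, not the repository's actual code. The function names `normalize_rewards` and `fraction_within` and the raw scores are made up for illustration; in the real model, the mean and std are fixed constants computed once on the test set.

```python
from statistics import fmean, pstdev


def normalize_rewards(raw_rewards, mean, std):
    """Shift and scale raw scores: (r - mean) / std (hypothetical helper)."""
    return [(r - mean) / std for r in raw_rewards]


def fraction_within(rewards, bound):
    """Share of rewards that lie in the closed interval [-bound, bound]."""
    return sum(1 for r in rewards if -bound <= r <= bound) / len(rewards)


# Made-up raw scores standing in for the model's outputs on a test set.
raw = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]

# Assumption: statistics are computed once on the test set and then reused.
mean = fmean(raw)
std = pstdev(raw)

normalized = normalize_rewards(raw, mean, std)

print("mean after normalization:", fmean(normalized))
print("std after normalization:", pstdev(normalized))
print("share within [-1, 1]:", fraction_within(normalized, 1.0))
```

Because the normalized scores are centered at 0, any raw score below the test-set mean becomes negative, which is why negative values show up at inference time.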
Hi, @tongyx361
Thanks for your contribution. I would like to understand why the mean and std in the reward model are set to the following values:
ImageReward/ImageReward/ImageReward.py
Line 80 in c0b9080
In addition, I see negative reward values during inference, which confuses me. What is the range of rewards during training?