
Why are the mean and std in the Reward Model set to 0.16717362830052426 and 1.0333394966054072? #7

Closed
liming-ai opened this issue Apr 23, 2023 · 1 comment

Comments


liming-ai commented Apr 23, 2023

Hi, @tongyx361

Thanks for your contribution. I want to figure out why the mean and std in the Reward Model are set to the following values:

self.mean = 0.16717362830052426
self.std = 1.0333394966054072
In addition, there are negative reward values during inference, which confuses me: what is the range of rewards during training?

xujz18 (Member) commented Apr 24, 2023

The two values you mentioned are the mean and std of the reward values on the test set. They are used to normalize the reward to have a mean of 0 and a standard deviation of 1, so that the average reward on the test set is 0. When testing Stable Diffusion v1.4 on our metric set, the observed reward falls within [-1, 1] 62.4% of the time and within [-2, 2] 98.2% of the time.
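The normalization described above can be sketched as follows. This is a minimal illustration, not the repository's exact code: only the two constants come from this thread, and the function name is an assumption.

```python
# Test-set statistics of the raw reward (values from this thread).
MEAN = 0.16717362830052426  # mean of raw rewards on the test set
STD = 1.0333394966054072    # std of raw rewards on the test set

def normalize_reward(raw_reward: float) -> float:
    """Shift and scale a raw reward so that test-set rewards
    have mean 0 and standard deviation 1."""
    return (raw_reward - MEAN) / STD

# A raw reward equal to the test-set mean maps to 0; values below
# the mean become negative, which explains the negative rewards
# seen at inference time.
print(normalize_reward(MEAN))        # 0.0
print(normalize_reward(MEAN + STD))  # 1.0
```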
