Hello, I had a few questions about the results on the anthropic_hh dataset:
It seems that none of the models you tested performs better than chance. Is this the correct interpretation?
Your prompting format shows the model one of the two options and has it predict whether that option was the better one. Naively, this seems much, much harder than showing the model both options and having it pick the better of the pair. Is there a reason you don't use a prompting format that does this?
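To make the comparison concrete, here is a rough sketch of the two formats as I understand them. The function names and the prompt wording are placeholders of my own, not the repository's actual templates.

```python
def single_option_prompt(context: str, response: str) -> str:
    """Pointwise format: the model sees one response and judges it in isolation.

    Hypothetical wording, used only to illustrate the format.
    """
    return (
        f"{context}\n\n"
        f"Response: {response}\n\n"
        "Is this the better response? Answer Yes or No:"
    )


def pairwise_prompt(context: str, response_a: str, response_b: str) -> str:
    """Pairwise format: the model sees both responses and picks one.

    Hypothetical wording, used only to illustrate the format.
    """
    return (
        f"{context}\n\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n\n"
        "Which response is better? Answer A or B:"
    )


if __name__ == "__main__":
    ctx = "Human: How do I boil an egg?"
    print(single_option_prompt(ctx, "Put it in boiling water for 8 minutes."))
    print()
    print(pairwise_prompt(ctx, "Put it in boiling water for 8 minutes.", "I don't know."))
```

In the pairwise version the model can compare the two candidates directly, whereas in the single-option version it has to judge one response without ever seeing the alternative.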