
The process of RLHF and reward modeling #55

Open

joshhu opened this issue Mar 14, 2024 · 1 comment

Comments
joshhu commented Mar 14, 2024

If this model was SFT'd from llama2: judging from the llama2 paper, llama2 itself does not appear to have gone through RLHF (llama2-chat did). Has Taiwan llama2 been trained with RLHF? If not, alignment for Traditional Chinese could be done with RLHF rather than SFT. As for the comparison dataset, one option is to generate it with ChatGPT. Has this been tried? Thanks.
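For context on the reward-modeling step the question refers to: RLHF pipelines typically first fit a reward model on pairwise comparisons with a Bradley-Terry style objective, minimizing `-log(sigmoid(r_chosen - r_rejected))`. Below is a minimal sketch of that loss in plain Python; it is a generic illustration, not code from this project, and the function name is hypothetical.

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in reward modeling:
    -log(sigmoid(r_chosen - r_rejected)).
    r_chosen / r_rejected are scalar reward-model scores for the
    preferred and rejected responses to the same prompt."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The wider the margin in favor of the chosen response, the lower the loss.
print(pairwise_reward_loss(2.0, 0.0))  # ≈ 0.127
print(pairwise_reward_loss(0.0, 2.0))  # ≈ 2.127
```

Comparison pairs generated by ChatGPT (as suggested above) or collected from human votes both plug into this same loss.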

adamlin120 (Collaborator) commented May 16, 2024

Good question. We saw that academia and industry lack preference-rating data, so we built TW Chatbot Arena, which has so far collected over a thousand pairwise comparison examples.
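Arena-style pairwise votes like these are commonly aggregated into model rankings with Elo-style updates. The sketch below shows one such update step; it is a generic illustration under standard Elo assumptions (K=32, scale 400), not the actual TW Chatbot Arena scoring code.

```python
def elo_update(r_a: float, r_b: float, winner: str, k: float = 32.0):
    """One Elo rating update from a single pairwise vote.
    winner is "a" or "b"; returns the two updated ratings."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if winner == "a" else 0.0
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Two equally rated models; model "a" wins the vote.
print(elo_update(1000.0, 1000.0, "a"))  # (1016.0, 984.0)
```

Running every collected comparison through updates like this yields a leaderboard; the same comparison data can also train a reward model directly.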
