
The process of RLHF and reward modeling #55

Open

joshhu opened this issue Mar 14, 2024 · 1 comment

Comments
joshhu commented Mar 14, 2024

If this model was SFT'd from llama2: judging from the llama2 paper, llama2 itself does not appear to have gone through RLHF (llama2-chat did). Has Taiwan llama2 been trained with RLHF? If not, alignment for Traditional Chinese could be done with RLHF rather than SFT. As for the comparison dataset, one option is to generate it with ChatGPT. Has this been tried? Thanks.
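For context on the reward-modeling step the question refers to: RLHF pipelines typically first fit a reward model on pairwise comparisons with a Bradley-Terry style objective, minimizing `-log(sigmoid(r_chosen - r_rejected))`. Below is a minimal sketch of that loss in plain Python; it is a generic illustration, not code from this project, and the function name is hypothetical.

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in reward modeling:
    -log(sigmoid(r_chosen - r_rejected)).
    r_chosen / r_rejected are scalar reward-model scores for the
    preferred and rejected responses to the same prompt."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The wider the margin in favor of the chosen response, the lower the loss.
print(pairwise_reward_loss(2.0, 0.0))  # ≈ 0.127
print(pairwise_reward_loss(0.0, 2.0))  # ≈ 2.127
```

Comparison pairs generated by ChatGPT (as suggested above) or collected from human votes both plug into this same loss.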

adamlin120 (Collaborator) commented May 16, 2024

Good question. We saw that academia and industry lack preference-rating data, so we built TW Chatbot Arena, which has so far collected over a thousand pairwise comparison examples.
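Arena-style pairwise votes like these are commonly aggregated into model rankings with Elo-style updates. The sketch below shows one such update step; it is a generic illustration under standard Elo assumptions (K=32, scale 400), not the actual TW Chatbot Arena scoring code.

```python
def elo_update(r_a: float, r_b: float, winner: str, k: float = 32.0):
    """One Elo rating update from a single pairwise vote.
    winner is "a" or "b"; returns the two updated ratings."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if winner == "a" else 0.0
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Two equally rated models; model "a" wins the vote.
print(elo_update(1000.0, 1000.0, "a"))  # (1016.0, 984.0)
```

Running every collected comparison through updates like this yields a leaderboard; the same comparison data can also train a reward model directly.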
