Hey!
Thank you for your amazing job!
I'm curious: is it possible to use RLHF feedback after a response to make small incremental adjustments in a tuning process? For example, if the user decides to fine-tune after an incorrect answer, could the model spend 60 seconds in a fine-tuning phase, save a checkpoint to disk, and then move on to the next question?
I believe llama.cpp is only for inference, not training. Check out chatllama, but you will likely need some high-end GPUs to do RLHF. Alternatively, look at accelerate + trl for performing RLHF on models that fit on consumer GPUs.
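For reference, here is a rough sketch of the kind of per-answer PPO update that trl supports. This is not llama.cpp code and not anyone's method from this thread; it assumes the classic trl PPO API (PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead, respond_to_batch), and the model name, reward value, and checkpoint path are placeholders.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer
from trl.core import respond_to_batch

model_name = "gpt2"  # placeholder; swap in whatever HF causal LM you actually use
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)  # frozen reference for the KL penalty
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One "question -> answer -> human reward -> PPO step" round.
query_tensor = tokenizer.encode("What is 2 + 2?", return_tensors="pt")
response_tensor = respond_to_batch(model, query_tensor)

# The human feedback, e.g. -1.0 for a wrong answer, +1.0 for a good one.
reward = [torch.tensor(-1.0)]
ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)

# Persist the adjusted weights before moving on to the next question.
model.save_pretrained("rlhf-checkpoint")
```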
@gjmulder I would like to continue this line of thought and run such a process on CPU only. Even if it is super slow, I think it's possible to spend a small time budget (60 seconds) to improve the weights a bit and close the self-improvement loop, as in Gödel machines.
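To make the 60-second budget concrete, here is a rough sketch of what I have in mind. It is my own example, it uses plain supervised fine-tuning on a corrected answer rather than full RLHF, and the model name, prompt, and checkpoint path are placeholders; on CPU even this will be slow for anything bigger than a small model.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder small model; a 7B model on CPU would manage far fewer steps
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def tune_for_budget(prompt: str, corrected_answer: str, budget_s: float = 60.0) -> None:
    """Take gradient steps on the corrected answer until the wall-clock budget runs out."""
    batch = tokenizer(prompt + corrected_answer, return_tensors="pt")
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    # Save a checkpoint to disk, then the outer loop can move on to the next question.
    model.save_pretrained("incremental-checkpoint")
    tokenizer.save_pretrained("incremental-checkpoint")

tune_for_budget("Q: What is 2 + 2?\nA:", " 4", budget_s=60.0)
```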