Fully remove RLHF in favor of DPO (#747)
* rm RLHF tooltips

* rm rlhf ds, model, cfg

* rm deprecation warning

* Add note to Readme

* rm reward model
pascal-pfeiffer authored Jun 5, 2024
1 parent 5e4471b commit 9cafe8c
Showing 24 changed files with 1 addition and 759 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -53,6 +53,7 @@ Using CLI for fine-tuning LLMs:

## What's New

- [PR 747](https://github.com/h2oai/h2o-llmstudio/pull/747) Fully removed RLHF in favor of DPO/IPO/KTO optimization.
- [PR 599](https://github.com/h2oai/h2o-llmstudio/pull/599) Added `KTOPairLoss` for DPO modeling, allowing models to be trained with simple preference data. Data currently needs to be manually prepared by randomly matching positive and negative examples as pairs.
- [PR 592](https://github.com/h2oai/h2o-llmstudio/pull/592) Starting to deprecate RLHF in favor of DPO/IPO optimization. Training is disabled, but old experiments are still viewable. RLHF will be fully removed in a future release.
- [PR 530](https://github.com/h2oai/h2o-llmstudio/pull/530) Introduced a new problem type for DPO/IPO optimization. This optimization technique can be used as an alternative to RLHF.
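The `KTOPairLoss` entry above notes that preference data currently has to be prepared manually by randomly matching positive and negative examples into pairs. A minimal sketch of that preparation step might look like the following; the record layout (`prompt`/`answer`/`chosen`/`rejected` keys) and the helper name are assumptions for illustration, not the H2O LLM Studio data format or API.

```python
import random

def make_preference_pairs(positives, negatives, seed=42):
    """Randomly match positive and negative examples into preference pairs.

    `positives` and `negatives` are lists of {"prompt": ..., "answer": ...}
    dicts. This is a hypothetical helper sketching the manual pairing step;
    it is not part of H2O LLM Studio.
    """
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    # Pair each positive with one randomly drawn negative; if the lists have
    # unequal lengths, the surplus examples are simply dropped by zip().
    return [
        {
            "prompt": pos["prompt"],
            "chosen": pos["answer"],
            "rejected": neg["answer"],
        }
        for pos, neg in zip(positives, shuffled)
    ]
```

Because KTO-style data is unpaired, the random matching only serves to batch one positive with one negative; the fixed seed keeps the pairing reproducible across runs.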
79 changes: 0 additions & 79 deletions documentation/docs/guide/experiments/experiment-settings.md
@@ -16,7 +16,6 @@ import DSanswerColumn from '../../tooltips/experiments/_answer-column.mdx';
import DSparentIdColumn from '../../tooltips/experiments/_parent-id-column.mdx';
import DStextPromptStart from '../../tooltips/experiments/_text-prompt-start.mdx';
import DStextAnswerSeparator from '../../tooltips/experiments/_text-answer-separator.mdx';
import DSadaptiveKlControl from '../../tooltips/experiments/_adaptive-kl-control.mdx';
import DSaddEosTokentoprompt from '../../tooltips/experiments/_add-eos-token-to-prompt.mdx';
import DSaddEosTokentoanswer from '../../tooltips/experiments/_add-eos-token-to-answer.mdx';
import DSmaskPromptlabels from '../../tooltips/experiments/_mask-prompt-labels.mdx';
@@ -53,20 +52,6 @@ import TSsavecheckpoint from '../../tooltips/experiments/_save-checkpoint.mdx';
import TSevaluationepochs from '../../tooltips/experiments/_evaluation-epochs.mdx';
import TSevaluationbeforetraining from '../../tooltips/experiments/_evaluate-before-training.mdx';
import TStrainvalidationdata from '../../tooltips/experiments/_train-validation-data.mdx';
import TSuseRHLF from '../../tooltips/experiments/_use-rlhf.mdx';
import TSrewardModel from '../../tooltips/experiments/_reward-model.mdx';
import TSinitialKlCoefficient from '../../tooltips/experiments/_initial-kl-coefficient.mdx';
import TSklTarget from '../../tooltips/experiments/_kl-target.mdx';
import TSklHorizon from '../../tooltips/experiments/_kl-horizon.mdx';
import TSadvantagesGamma from '../../tooltips/experiments/_advantages-gamma.mdx';
import TSadvantagesLambda from '../../tooltips/experiments/_advantages-lambda.mdx';
import TSppoClipPolicy from '../../tooltips/experiments/_ppo-clip-policy.mdx';
import TSppoClipValue from '../../tooltips/experiments/_ppo-clip-value.mdx';
import TSscalingFactorValueLoss from '../../tooltips/experiments/_scaling-factor-value-loss.mdx';
import TSppoEpochs from '../../tooltips/experiments/_ppo-epochs.mdx';
import TSppoBatchSize from '../../tooltips/experiments/_ppo-batch-size.mdx';
import TSppoGenerateTemp from '../../tooltips/experiments/_ppo-generate-temperature.mdx';
import TSoffloadRewardModel from '../../tooltips/experiments/_offload-reward-model.mdx';
import AStokenmaskprobability from '../../tooltips/experiments/_token-mask-probability.mdx';
import ASskipParentprobability from '../../tooltips/experiments/_skip-parent-probability.mdx';
import ASrandomparentprobability from '../../tooltips/experiments/_random-parent-probability.mdx';
@@ -174,10 +159,6 @@ The settings under each category are listed and described below.

<DStextAnswerSeparator/>

### Adaptive KL control

<DSadaptiveKlControl/>

### Add EOS token to prompt

<DSaddEosTokentoprompt/>
@@ -328,66 +309,6 @@ The settings under each category are listed and described below.

<TStrainvalidationdata/>

### Use RLHF

<TSuseRHLF/>

### Reward model

<TSrewardModel/>

### Adaptive KL control

<DSadaptiveKlControl/>

### Initial KL coefficient

<TSinitialKlCoefficient/>

### KL target

<TSklTarget/>

### KL horizon

<TSklHorizon/>

### Advantages gamma

<TSadvantagesGamma/>

### Advantages lambda

<TSadvantagesLambda/>

### PPO clip policy

<TSppoClipPolicy/>

### PPO clip value

<TSppoClipValue/>

### Scaling factor value loss

<TSscalingFactorValueLoss/>

### PPO epochs

<TSppoEpochs/>

### PPO batch size

<TSppoBatchSize/>

### PPO generate temperature

<TSppoGenerateTemp/>

### Offload reward model

<TSoffloadRewardModel/>

## Augmentation settings

### Token mask probability

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_kl-horizon.mdx

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_kl-target.mdx

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_ppo-epochs.mdx

This file was deleted.

This file was deleted.

3 changes: 0 additions & 3 deletions documentation/docs/tooltips/experiments/_reward-model.mdx

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_rollout_steps.mdx

This file was deleted.

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_use-rlhf.mdx

This file was deleted.

