
[StackLLaMA] Problems running reward_modeling.py using gpt2 as base for reward model #356

Closed
samuelhoglund opened this issue May 10, 2023 · 4 comments

@samuelhoglund

Hello!

I am trying to get the reward_modeling.py script to work on a smaller scale by using gpt2 as the base for the reward model.

The only changes I made to the file from its current version in the repo were to make the data subsets smaller, setting these values instead:

    train_subset: Optional[int] = field(
        default=1000,
        metadata={"help": "The size of the subset of the training data to use"},
    )
    eval_subset: Optional[int] = field(
        default=500,
        metadata={"help": "The size of the subset of the eval data to use"},
    )

(Otherwise these are set to 100K and 50K, respectively.)

I also retrieve a modified, smaller sample of the stack-exchange dataset, consisting of one file instead of 12 or 20:

train_dataset = load_dataset("samhog/stack-exchange-mini", data_dir="data/reward", split="train[:1%]")
eval_dataset = load_dataset("samhog/stack-exchange-mini", data_dir="data/evaluation", split="train[:1%]")

However, when I run the script, training fails. This is the error with the full traceback included:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/trl/examples/stack_llama/scripts/reward_modeling.py:285 in <module> │
│                                                                              │
│   282 │   data_collator=RewardDataCollatorWithPadding(tokenizer=tokenizer, m │
│   283 )                                                                      │
│   284                                                                        │
│ ❱ 285 trainer.train(script_args.resume_from_checkpoint)                      │
│   286                                                                        │
│   287 print("Saving last checkpoint of the model")                           │
│   288 model.save_pretrained(output_name + "_peft_last_checkpoint")           │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1662 in      │
│ train                                                                        │
│                                                                              │
│   1659 │   │   inner_training_loop = find_executable_batch_size(             │
│   1660 │   │   │   self._inner_training_loop, self._train_batch_size, args.a │
│   1661 │   │   )                                                             │
│ ❱ 1662 │   │   return inner_training_loop(                                   │
│   1663 │   │   │   args=args,                                                │
│   1664 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1665 │   │   │   trial=trial,                                              │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1929 in      │
│ _inner_training_loop                                                         │
│                                                                              │
│   1926 │   │   │   │   │   with model.no_sync():                             │
│   1927 │   │   │   │   │   │   tr_loss_step = self.training_step(model, inpu │
│   1928 │   │   │   │   else:                                                 │
│ ❱ 1929 │   │   │   │   │   tr_loss_step = self.training_step(model, inputs)  │
│   1930 │   │   │   │                                                         │
│   1931 │   │   │   │   if (                                                  │
│   1932 │   │   │   │   │   args.logging_nan_inf_filter                       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2717 in      │
│ training_step                                                                │
│                                                                              │
│   2714 │   │   │   # loss gets scaled under gradient_accumulation_steps in d │
│   2715 │   │   │   loss = self.deepspeed.backward(loss)                      │
│   2716 │   │   else:                                                         │
│ ❱ 2717 │   │   │   loss.backward()                                           │
│   2718 │   │                                                                 │
│   2719 │   │   return loss.detach()                                          │
│   2720                                                                       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/torch/_tensor.py:487 in backward     │
│                                                                              │
│    484 │   │   │   │   create_graph=create_graph,                            │
│    485 │   │   │   │   inputs=inputs,                                        │
│    486 │   │   │   )                                                         │
│ ❱  487 │   │   torch.autograd.backward(                                      │
│    488 │   │   │   self, gradient, retain_graph, create_graph, inputs=inputs │
│    489 │   │   )                                                             │
│    490                                                                       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:200 in    │
│ backward                                                                     │
│                                                                              │
│   197 │   # The reason we repeat same the comment below is that              │
│   198 │   # some Python versions print out the first line of a multi-line fu │
│   199 │   # calls in the traceback and some print out the last line          │
│ ❱ 200 │   Variable._execution_engine.run_backward(  # Calls into the C++ eng │
│   201 │   │   tensors, grad_tensors_, retain_graph, create_graph, inputs,    │
│   202 │   │   allow_unreachable=True, accumulate_grad=True)  # Calls into th │
│   203                                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: one of the variables needed for gradient computation has been 
modified by an inplace operation: [CUDABoolType [1, 1, 264, 264]] is at version 
3; expected version 2 instead. Hint: the backtrace further above shows the 
operation that failed to compute its gradient. The variable in question was 
changed in there or anywhere later. Good luck!

Does anyone have any tips on how to proceed?

Thanks in advance!

@mnoukhov
Contributor

Could be related to the comment in https://github.com/lvwerra/trl/blob/main/examples/stack_llama/scripts/rl_training.py#L43

Have you tried GPT-Neo models?
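
For a quick test, swapping in a small GPT-Neo checkpoint should only require changing the model name. A minimal sketch, assuming the script still builds the reward model with AutoModelForSequenceClassification and a single scalar head (the checkpoint name below is just an example):

# Minimal sketch: load a small GPT-Neo checkpoint as the reward-model base.
# Assumes the script's AutoModelForSequenceClassification(..., num_labels=1) setup.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125m"  # example small checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

# GPT-Neo, like GPT-2, ships without a pad token, so set one before the
# padding data collator tries to batch examples.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.eos_token_id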

@dayL-W

dayL-W commented May 16, 2023

same error

@oliu-io

oliu-io commented May 25, 2023

Here's a potential workaround: #274 (comment)
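
The linked comment isn't quoted here, but one common mitigation for this kind of in-place-modification error during backward() with GPT-2-style models is to turn off gradient checkpointing (trading memory for stability). A rough sketch of the relevant TrainingArguments switch, not necessarily the exact fix from #274:

# Rough sketch (not necessarily the fix from the linked comment): disable
# gradient checkpointing, one common way to avoid "modified by an inplace
# operation" errors in backward(), at the cost of higher activation memory.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2_reward_model",   # hypothetical output path
    per_device_train_batch_size=4,
    gradient_checkpointing=False,     # the relevant switch
    remove_unused_columns=False,      # the reward collator needs its extra columns
)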

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
