-
I'm running into the same problem. Could you share how you solved it?
-
Add this to enable input gradients.
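The snippet this reply refers to is not shown above; below is a minimal sketch of what that fix usually looks like, assuming the standard LoRA + gradient-checkpointing setup from finetune_hf.py. The model path and LoRA hyperparameters are illustrative only and would come from configs/lora.yaml in the real script.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative path; in finetune_hf.py this comes from the command line.
model = AutoModelForCausalLM.from_pretrained(
    "/mnt/workspace/chatglm3-6b", trust_remote_code=True
)

# Key line: registers a forward hook on the input embeddings so their
# outputs have requires_grad=True. With only LoRA adapters trainable and
# the base weights frozen, re-entrant gradient checkpointing otherwise sees
# no input requiring grad, and backward() fails with
# "element 0 of tensors does not require grad and does not have a grad_fn".
model.enable_input_require_grads()

# Illustrative LoRA config for ChatGLM3's attention projection.
peft_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=["query_key_value"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.gradient_checkpointing_enable()
```

An alternative that also addresses the use_reentrant warning in the log is to enable checkpointing with the non-reentrant variant, e.g. `model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})` on recent transformers versions.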
-
I want to fine-tune the chatglm3-6b model with conversation-format data on the Alibaba Cloud PAI platform. I followed the official steps and everything went smoothly until I ran python finetune_hf.py data/fix /mnt/workspace/chatglm3-6b configs/lora.yaml; the program errors out after checking the checkpoint, just as fine-tuning is about to start. I'd appreciate any help, thanks! The error log is attached below.
Num examples = 1
Num Epochs = 1,000
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 1,000
Number of trainable parameters = 1,949,696
0%| | 0/1000 [00:00<?, ?it/s]/opt/conda/envs/ChatGLM3-6b-finetunning/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/envs/ChatGLM3-6b-finetunning/lib/python3.10/site-packages/torch/utils/checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
Traceback (most recent call last):
File "/mnt/workspace/ChatGLM3/finetune_demo/finetune_hf.py", line 525, in
app()
File "/mnt/workspace/ChatGLM3/finetune_demo/finetune_hf.py", line 517, in main
trainer.train()
File "/opt/conda/envs/ChatGLM3-6b-finetunning/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/opt/conda/envs/ChatGLM3-6b-finetunning/lib/python3.10/site-packages/transformers/trainer.py", line 1869, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/conda/envs/ChatGLM3-6b-finetunning/lib/python3.10/site-packages/transformers/trainer.py", line 2781, in training_step
self.accelerator.backward(loss)
File "/opt/conda/envs/ChatGLM3-6b-finetunning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1966, in backward
loss.backward(**kwargs)
File "/opt/conda/envs/ChatGLM3-6b-finetunning/lib/python3.10/site-packages/torch/_tensor.py", line 522, in backward
torch.autograd.backward(
File "/opt/conda/envs/ChatGLM3-6b-finetunning/lib/python3.10/site-packages/torch/autograd/init.py", line 266, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
0%| | 0/1000 [00:02<?, ?it/s]