Gradient checkpointing throws use_reentrant warning on PyTorch 2.1 #28536

Closed
2 of 4 tasks
rosario-purple opened this issue Jan 16, 2024 · 15 comments · Fixed by #28538 or #33208
Comments

@rosario-purple

System Info

  • transformers version: 4.36.2
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Python version: 3.10.13
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.0
  • Accelerate version: 0.25.0
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    - distributed_type: DEEPSPEED
    - mixed_precision: bf16
    - use_cpu: False
    - debug: False
    - num_processes: 8
    - machine_rank: 0
    - num_machines: 1
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - deepspeed_config: {'gradient_accumulation_steps': 1, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': True, 'zero3_save_16bit_model': False, 'zero_stage': 3}
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
  • PyTorch version (GPU?): 2.1.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): 0.7.5 (cpu)
  • Jax version: 0.4.21
  • JaxLib version: 0.4.21
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Who can help?

@ArthurZucker @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Training any text model with gradient checkpointing enabled on PyTorch 2.1 and higher produces this warning:

/scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: Warning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.

This can be resolved by manually monkey-patching the model code to pass use_reentrant=True, e.g. like so:

                hidden_states, self_attns, decoder_cache = torch.utils.checkpoint.checkpoint(
                    create_custom_forward(decoder_layer),
                    hidden_states,
                    attention_mask,
                    position_ids,
                    None,
                    is_padded_inputs,
                    use_reentrant=True,
                )

This is caused by an upstream change in PyTorch:

https://medium.com/pytorch/how-activation-checkpointing-enables-scaling-up-training-deep-learning-models-7a93ae01ff2d
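For context, the warning comes from torch.utils.checkpoint itself whenever use_reentrant is not passed. A minimal sketch (not from the issue; the layer and tensor are illustrative) showing both the warning and how passing the flag explicitly silences it:

    # Minimal sketch, assuming PyTorch >= 2.1; the layer and input are illustrative.
    import torch
    import torch.utils.checkpoint as cp

    layer = torch.nn.Linear(16, 16)
    x = torch.randn(4, 16, requires_grad=True)

    # Default call: PyTorch 2.1+ emits the use_reentrant warning here.
    out = cp.checkpoint(layer, x)
    out.sum().backward()

    # Passing the flag explicitly (True or False) silences the warning;
    # use_reentrant=False is the variant PyTorch recommends going forward.
    out = cp.checkpoint(layer, x, use_reentrant=False)
    out.sum().backward()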

Expected behavior

No warning should be emitted.

@ArthurZucker
Collaborator

Thanks for raising this! Given that we had #27020, this should be fairly easy to fix! cc @younesbelkada

@rosario-purple
Author

@ArthurZucker is this still outstanding?

@ArthurZucker
Collaborator

Will merge the PR today

@lucasjinreal

Which version fixed this? I'm using 3.47.2 and still get this error.

@huangganggui

huangganggui commented Apr 11, 2024

4.39.3 still gets this warning.

@huangganggui

> 4.39.3 still gets this warning.

In my case, model.gradient_checkpointing_enable() fixed it. Maybe you can try that, @lucasjinreal.

@ankush13r
Contributor

I'm using transformers==4.43.3 and still get this warning when using the Trainer API with gradient_checkpointing=True.

@BigDataMLexplorer

BigDataMLexplorer commented Aug 11, 2024

> I'm using transformers==4.43.3 and still get this warning when using the Trainer API with gradient_checkpointing=True.

Me too. Try calling model.gradient_checkpointing_enable() and do not specify gradient_checkpointing=True in the Hugging Face Trainer API. That solved my problem.
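A minimal sketch of that workaround (the checkpoint name and train_dataset are placeholders, not from the thread):

    # Sketch of the workaround above: enable gradient checkpointing on the model
    # directly and leave the Trainer's gradient_checkpointing flag unset.
    # "my-llama-checkpoint" and `train_dataset` are placeholders.
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    model = AutoModelForCausalLM.from_pretrained("my-llama-checkpoint")
    model.gradient_checkpointing_enable()  # instead of gradient_checkpointing=True below

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=1,
        # gradient_checkpointing=True intentionally omitted here
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()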

@ArthurZucker
Collaborator

Could you all share which model you are using? 🤗

@BigDataMLexplorer

Hi, I use Llama 3 8B.

@ankush13r
Contributor

Hello, I'm using Llama-2-7b and Mistral-7B-v0.3. Both give the same warning.

@ArthurZucker
Collaborator

Are you using a recent version of transformers? By default we do pass this flag:

def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):

so you can call something like model.gradient_checkpointing_enable({"use_reentrant": False}). But by default we already pass this flag when gradient checkpointing is used.
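For anyone landing here, the call suggested above looks like this (a sketch; model is any PreTrainedModel that supports gradient checkpointing):

    # Sketch: pass the flag through gradient_checkpointing_kwargs so the model's
    # internal torch.utils.checkpoint calls receive use_reentrant explicitly.
    model.gradient_checkpointing_enable(
        gradient_checkpointing_kwargs={"use_reentrant": False}
    )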

@ankush13r
Contributor

Thanks, that will solve my problem, since I'm using the Trainer and can pass this argument.
But I think the issue is here:
The Trainer assigns the value gradient_checkpointing_kwargs = {} (https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L2122).
So when it reaches the check in gradient_checkpointing_enable

if gradient_checkpointing_kwargs is None:

the argument is not None, it is an empty dict, so the default kwargs are never applied.
The problem would be solved by removing this if: https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L2121
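To make the mechanics concrete, a simplified illustration of the behaviour described above (not the actual transformers source), followed by a Trainer-side workaround that should work on versions where TrainingArguments exposes gradient_checkpointing_kwargs:

    # Simplified illustration of the bug described above (not the real source):
    # the Trainer hands over an empty dict, which is not None, so the default
    # kwargs inside gradient_checkpointing_enable are never applied and
    # use_reentrant is never set explicitly.
    gradient_checkpointing_kwargs = {}          # what the Trainer passes
    if gradient_checkpointing_kwargs is None:   # check inside gradient_checkpointing_enable
        gradient_checkpointing_kwargs = {"use_reentrant": True}  # default, skipped here

    # Workaround until that is fixed: supply the kwargs explicitly from the
    # Trainer side (assuming a transformers version whose TrainingArguments
    # accepts gradient_checkpointing_kwargs).
    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="out",
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
    )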

@ArthurZucker
Collaborator

Yep good catch! Do you want to open a PR for this? 🤗

@ankush13r
Contributor

Of course, it would be a pleasure to collaborate.
