IndexError: pop from an empty deque in run_r1_grpo.py #76

Open
JunMa11 opened this issue Jan 31, 2025 · 3 comments

JunMa11 commented Jan 31, 2025

Dear @philschmid,

Thank you so much for sharing the great tutorial on Mini-R1.

I followed the tutorial to run the code. My workstation has three A6000 GPUs, and I set num_processes=7.

I got the following error. Any comments are highly appreciated.

2025-01-31 10:18:34,579 - __main__ - INFO - *** Starting training 2025-01-31 10:18:34 for 3.0 epochs***
INFO:__main__:*** Starting training 2025-01-31 10:18:34 for 3.0 epochs***
Parameter Offload: Total persistent parameters: 241664 in 181 params
  0%|| 1/450 [00:12<1:37:12, 12.99s/it][rank1]: Traceback (most recent call last):
[rank1]:   File "/home/jma/Documents/prototype/run_r1_grpo.py", line 273, in <module>
[rank1]:     main()
[rank1]:   File "/home/jma/Documents/prototype/run_r1_grpo.py", line 269, in main
[rank1]:     grpo_function(model_args, script_args, training_args)
[rank1]:   File "/home/jma/Documents/prototype/run_r1_grpo.py", line 230, in grpo_function
[rank1]:     train_result = trainer.train(resume_from_checkpoint=last_checkpoint)
[rank1]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
[rank1]:     return inner_training_loop(
[rank1]:            ^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
[rank1]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
[rank1]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 444, in compute_loss
[rank1]:     per_token_logps = get_per_token_logps(model, prompt_completion_ids, num_logits_to_keep)
[rank1]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 432, in get_per_token_logps
[rank1]:     logits = model(input_ids, num_logits_to_keep=num_logits_to_keep + 1).logits  # (B, L, V)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank1]:     ret_val = func(*args, **kwargs)
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank1]:     loss = self.module(*inputs, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank1]:     return inner()
[rank1]:            ^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in inner
[rank1]:     args_result = hook(self, args)
[rank1]:                   ^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank1]:     ret_val = func(*args, **kwargs)
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 228, in _start_of_forward_hook
[rank1]:     self.get_param_coordinator().reset_step()
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank1]:     return fn(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 232, in reset_step
[rank1]:     self.construct_parameter_trace_from_module_trace()
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 216, in construct_parameter_trace_from_module_trace
[rank1]:     self.record_parameters(sub_module)
[rank1]:   File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 208, in record_parameters
[rank1]:     step_id = self.__step_id_module_fetched_for[sub_module.id].popleft()
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: IndexError: pop from an empty deque
[rank0]: Traceback (most recent call last):
[rank0]:   ... (same traceback as rank 1) ...
[rank0]: IndexError: pop from an empty deque
  0%|| 1/450 [00:54<6:48:11, 54.55s/it]
W0131 10:19:35.719000 192129 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 192383 closing signal SIGTERM
E0131 10:19:36.439000 192129 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 192384) of binary: /home/jma/anaconda3/envs/r1demo/bin/python
Traceback (most recent call last):
  File "/home/jma/anaconda3/envs/r1demo/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1157, in launch_command
    deepspeed_launcher(args)
  File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/launch.py", line 845, in deepspeed_launcher
    distrib_run.run(args)
  File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
run_r1_grpo.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
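
For context, the exception itself comes from DeepSpeed ZeRO-3's parameter-offload trace bookkeeping (partitioned_param_coordinator.py): popleft() is called on a per-module deque of recorded fetch steps that turns out to be empty. A minimal sketch that reproduces only the exception type and message, not the DeepSpeed state that triggers it:

```python
# Minimal sketch: reproduces only the exception seen above, not the
# DeepSpeed ZeRO-3 trace state that actually causes it during training.
from collections import deque

fetched_step_ids = deque()   # stands in for __step_id_module_fetched_for[sub_module.id]
fetched_step_ids.popleft()   # raises IndexError: pop from an empty deque
```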

Here are the configs:

# Model arguments
model_name_or_path: Qwen2.5-3B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2
bf16: true
tf32: true
output_dir: log/qwen-2.5-3b-r1-countdown

# Dataset arguments
dataset_id_or_path: Countdown-Tasks-3to4

# Lora Arguments
# No LoRA is used here

# Training arguments
max_steps: 450
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 5.0e-7 # 1.0e-6 as in the DeepSeek Math paper; 5e-7 from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
lr_scheduler_type: cosine
warmup_ratio: 0.03
# GRPO specific parameters
beta: 0.001 # 0.04 as in the DeepSeek Math paper; 0.001 from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
max_prompt_length: 256
max_completion_length: 1024
num_generations: 8
use_vllm: true
vllm_device: "cuda:2"
vllm_gpu_memory_utilization: 0.5

# Logging arguments
logging_strategy: steps
logging_steps: 2
report_to:
- tensorboard
save_strategy: "steps"
save_steps: 25
seed: 42
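
For reference, a minimal sketch of how this YAML maps onto TRL's GRPOConfig / GRPOTrainer (assuming the TRL version used by the tutorial, whose GRPOConfig exposes the vLLM fields referenced above). The reward function, prompt formatting, and dataset loading here are placeholders, not the tutorial's actual implementations:

```python
# Minimal sketch: maps the YAML values above onto TRL's GRPOConfig/GRPOTrainer.
# placeholder_reward stands in for the tutorial's format/equation reward functions,
# and the dataset call assumes the local path from dataset_id_or_path is loadable.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def placeholder_reward(completions, **kwargs):
    # Placeholder: the tutorial scores <think>/<answer> format and equation correctness.
    return [0.0 for _ in completions]

training_args = GRPOConfig(
    output_dir="log/qwen-2.5-3b-r1-countdown",
    learning_rate=5.0e-7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_steps=450,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    bf16=True,
    beta=0.001,
    max_prompt_length=256,
    max_completion_length=1024,
    num_generations=8,
    use_vllm=True,
    vllm_device="cuda:2",             # one full GPU reserved for generation
    vllm_gpu_memory_utilization=0.5,
)

trainer = GRPOTrainer(
    model="Qwen2.5-3B-Instruct",      # local path from model_name_or_path above
    reward_funcs=[placeholder_reward],
    args=training_args,
    train_dataset=load_dataset("Countdown-Tasks-3to4", split="train"),
)
trainer.train()
```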

philschmid (Owner) commented

> I followed the tutorial to run the code. My workstation has three A6000 GPUs and I set num_processes=7

num_processes needs to be set to the number of available GPUs minus 1, since one GPU is used for inference with vLLM to generate samples. So it should be 2 in your case, but I am not sure whether that is enough memory.
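
Spelled out as a minimal sketch (not from the tutorial; the launch command in the print is purely illustrative, and flags/config paths depend on your setup):

```python
# Minimal sketch: with vllm_device: "cuda:2" reserving one full GPU for generation,
# the trainer should be launched with one process per remaining GPU,
# i.e. num_processes = total GPUs - 1.
import torch

total_gpus = torch.cuda.device_count()   # 3 A6000s in this setup
num_processes = total_gpus - 1           # -> 2 training processes
print(f"accelerate launch --num_processes {num_processes} run_r1_grpo.py ...")
```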


zlh1992 commented Feb 10, 2025

How do I set the LoRA arguments when I have multiple GPUs?

philschmid (Owner) commented

LoRA is currently not supported on multi-GPU setups with vLLM.
