GRPO tests failing in multi-device setting on main #2774

tyler-romero · 2025-02-05T17:09:08Z

Reproduction

trl on main 
~> pytest tests/test_grpo_trainer.py
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.12.5, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/tromero/workspace/trl
configfile: pyproject.toml
plugins: anyio-4.8.0, xdist-3.6.1, rerunfailures-15.0, cov-6.0.0
collected 14 items                                                                                                                                                                                                

tests/test_grpo_trainer.py .FFFFFFFFFF...

A representative output:

______________________________________________________________________________ GRPOTrainerTester.test_training_reward_func_standard _______________________________________________________________________________

self = <tests.test_grpo_trainer.GRPOTrainerTester testMethod=test_training_reward_func_standard>

    def test_training_reward_func_standard(self):
        # Test if trainer can handle reward function with standard format
        dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only", split="train")
    
        def reward_func(completions, **kwargs):
            """Reward function that rewards longer completions."""
            return [float(len(completion)) for completion in completions]
    
        with tempfile.TemporaryDirectory() as tmp_dir:
            training_args = GRPOConfig(
                output_dir=tmp_dir,
                learning_rate=0.1,  # increase the learning rate to speed up the test
                per_device_train_batch_size=2,  # reduce the batch size to reduce memory usage
                num_generations=3,  # reduce the number of generations to reduce memory usage
                max_completion_length=32,  # reduce the completion length to reduce memory usage
                report_to="none",
            )
            trainer = GRPOTrainer(
                model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",
                reward_funcs=reward_func,
                args=training_args,
                train_dataset=dataset,
            )
    
            previous_trainable_params = {n: param.clone() for n, param in trainer.model.named_parameters()}
    
            trainer.train()
    
            self.assertIsNotNone(trainer.state.log_history[-1]["train_loss"])
    
            # Check the params have changed
            for n, param in previous_trainable_params.items():
                new_param = trainer.model.get_parameter(n)
>               self.assertFalse(torch.equal(param, new_param), f"Parameter {n} has not changed.")
E               AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.

============================================================================================= short test summary info =============================================================================================
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_0_standard_prompt_only - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_1_conversational_prompt_only - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_different_reward_model - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_multiple_mixed_reward_funcs - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_multiple_reward_funcs - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_peft - AssertionError: True is not false : Parameter base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_additional_column - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_conversational - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_standard - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_torch_compile - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_vllm - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_with_sync_ref_model - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
============================================================================== 12 failed, 2 passed, 82 warnings in 163.33s (0:02:43) ==============================================================================

System Info

~> trl env
[2025-02-05 09:08:41,976] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO 02-05 09:08:43 __init__.py:183] Automatically detected platform cuda.

Copy-paste the following information when reporting an issue:

- Platform: Linux-6.8.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.12.5
- PyTorch version: 2.5.1
- CUDA device(s): NVIDIA GeForce RTX 4090, NVIDIA GeForce RTX 4090
- Transformers version: 4.48.2
- Accelerate version: 1.3.0
- Accelerate config: not found
- Datasets version: 3.2.0
- HF Hub version: 0.28.1
- TRL version: 0.15.0.dev0
- bitsandbytes version: 0.45.1
- DeepSpeed version: 0.16.3
- Diffusers version: 0.32.2
- Liger-Kernel version: 0.5.2
- LLM-Blender version: 0.0.2
- OpenAI version: 1.61.1
- PEFT version: 0.14.0

Checklist

I have checked that my issue isn't already filed (see open issues)
I have included my system information
Any code provided is minimal, complete, and reproducible (more on MREs)
Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
Any traceback provided is complete

tyler-romero · 2025-02-05T18:21:58Z

Backtracking to find a commit that works:
Passes: 1123bd0
Fails: ed14ed9

tyler-romero · 2025-02-05T18:25:28Z

Ah, this passes, so the issue is related to using multiple GPUs:

CUDA_VISIBLE_DEVICES="0" pytest tests/test_grpo_trainer.py -k test_training_peft
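For anyone who wants to script the same check, here is a minimal sketch that does the equivalent from Python. The helper names `single_device_env` and `run_grpo_tests` are made up for illustration and are not part of trl. It only restricts which devices the child pytest process can see, and says nothing about why the multi-GPU run fails.

```python
import os
import subprocess
import sys


def single_device_env(device="0", base_env=None):
    """Return a copy of the environment that exposes only one CUDA device.

    Hypothetical helper: it mirrors `CUDA_VISIBLE_DEVICES="0" pytest ...`.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = device
    return env


def run_grpo_tests(device="0", extra_args=()):
    """Run the GRPO test file in a child process restricted to one GPU."""
    cmd = [sys.executable, "-m", "pytest", "tests/test_grpo_trainer.py", *extra_args]
    # The variable is set before the child starts, so CUDA in that process
    # is initialized with only the selected device visible.
    return subprocess.run(cmd, env=single_device_env(device)).returncode
```

For example, `run_grpo_tests("0", ["-k", "test_training_peft"])` would correspond to the command above, run from the repository root.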

github-actions bot added the 🏋 GRPO and 🐛 bug labels Feb 5, 2025

tyler-romero changed the title ~~GRPO tests failing on main~~ GRPO tests failing in multi-device setting on main Feb 5, 2025
