
GRPO tests failing in multi-device setting on main #2774

Open
5 tasks done
tyler-romero opened this issue Feb 5, 2025 · 2 comments
Labels
🐛 bug Something isn't working 🏋 GRPO Related to GRPO

Comments

@tyler-romero
Contributor

Reproduction

trl on main 
~> pytest tests/test_grpo_trainer.py
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.12.5, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/tromero/workspace/trl
configfile: pyproject.toml
plugins: anyio-4.8.0, xdist-3.6.1, rerunfailures-15.0, cov-6.0.0
collected 14 items                                                                                                                                                                                                

tests/test_grpo_trainer.py .FFFFFFFFFF...

A representative output:

______________________________________________________________________________ GRPOTrainerTester.test_training_reward_func_standard _______________________________________________________________________________

self = <tests.test_grpo_trainer.GRPOTrainerTester testMethod=test_training_reward_func_standard>

    def test_training_reward_func_standard(self):
        # Test if trainer can handle reward function with standard format
        dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only", split="train")
    
        def reward_func(completions, **kwargs):
            """Reward function that rewards longer completions."""
            return [float(len(completion)) for completion in completions]
    
        with tempfile.TemporaryDirectory() as tmp_dir:
            training_args = GRPOConfig(
                output_dir=tmp_dir,
                learning_rate=0.1,  # increase the learning rate to speed up the test
                per_device_train_batch_size=2,  # reduce the batch size to reduce memory usage
                num_generations=3,  # reduce the number of generations to reduce memory usage
                max_completion_length=32,  # reduce the completion length to reduce memory usage
                report_to="none",
            )
            trainer = GRPOTrainer(
                model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",
                reward_funcs=reward_func,
                args=training_args,
                train_dataset=dataset,
            )
    
            previous_trainable_params = {n: param.clone() for n, param in trainer.model.named_parameters()}
    
            trainer.train()
    
            self.assertIsNotNone(trainer.state.log_history[-1]["train_loss"])
    
            # Check the params have changed
            for n, param in previous_trainable_params.items():
                new_param = trainer.model.get_parameter(n)
>               self.assertFalse(torch.equal(param, new_param), f"Parameter {n} has not changed.")
E               AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
============================================================================================= short test summary info =============================================================================================
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_0_standard_prompt_only - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_1_conversational_prompt_only - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_different_reward_model - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_multiple_mixed_reward_funcs - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_multiple_reward_funcs - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_peft - AssertionError: True is not false : Parameter base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_additional_column - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_conversational - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_standard - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_torch_compile - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_vllm - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_with_sync_ref_model - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
============================================================================== 12 failed, 2 passed, 82 warnings in 163.33s (0:02:43) ==============================================================================

System Info

~> trl env
[2025-02-05 09:08:41,976] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO 02-05 09:08:43 __init__.py:183] Automatically detected platform cuda.

Copy-paste the following information when reporting an issue:

- Platform: Linux-6.8.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.12.5
- PyTorch version: 2.5.1
- CUDA device(s): NVIDIA GeForce RTX 4090, NVIDIA GeForce RTX 4090
- Transformers version: 4.48.2
- Accelerate version: 1.3.0
- Accelerate config: not found
- Datasets version: 3.2.0
- HF Hub version: 0.28.1
- TRL version: 0.15.0.dev0
- bitsandbytes version: 0.45.1
- DeepSpeed version: 0.16.3
- Diffusers version: 0.32.2
- Liger-Kernel version: 0.5.2
- LLM-Blender version: 0.0.2
- OpenAI version: 1.61.1
- PEFT version: 0.14.0

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete
github-actions bot added the 🏋 GRPO (Related to GRPO) and 🐛 bug (Something isn't working) labels on Feb 5, 2025
@tyler-romero
Contributor Author

Backtracking to find a commit that works:
Passes: 1123bd0
Fails: ed14ed9
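The known-good and known-bad commits above are enough to automate the search with `git bisect`. The sketch below assumes a local clone of trl that contains both commits. It uses `test_training_peft` as the pass/fail oracle only because it is one of the failing tests, and it keeps both GPUs visible, since the failure only appears with more than one device. `bisect_commands` and `run_bisect` are hypothetical helper names.

```python
import subprocess
from typing import List

# Endpoints reported in this issue.
GOOD_COMMIT = "1123bd0"
BAD_COMMIT = "ed14ed9"


def bisect_commands(good: str, bad: str, test_filter: str) -> List[List[str]]:
    """Return the git commands for an automated bisect between `good` and `bad`.

    `git bisect run` treats exit code 0 as "good" and most non-zero codes
    (pytest exits with 1 when tests fail) as "bad".
    """
    return [
        # `git bisect start <bad> <good>` marks both endpoints in one call.
        ["git", "bisect", "start", bad, good],
        # pytest is the oracle; CUDA_VISIBLE_DEVICES is inherited unchanged,
        # so every GPU stays visible and the multi-device failure can appear.
        ["git", "bisect", "run", "pytest", "tests/test_grpo_trainer.py", "-k", test_filter],
        # Return the working tree to the original HEAD when finished.
        ["git", "bisect", "reset"],
    ]


def run_bisect(good: str, bad: str, test_filter: str) -> None:
    """Run the bisect end to end; git prints the first bad commit it finds."""
    start, run, reset = bisect_commands(good, bad, test_filter)
    subprocess.run(start, check=True)
    try:
        # Do not raise on a non-zero exit so the reset below always runs.
        subprocess.run(run, check=False)
    finally:
        subprocess.run(reset, check=True)
```

Calling `run_bisect(GOOD_COMMIT, BAD_COMMIT, "test_training_peft")` from the repository root checks out intermediate commits, so it should only be run on a clean working tree.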

@tyler-romero
Contributor Author

Ah, this passes when only one GPU is visible, so the issue is related to using multiple GPUs:

CUDA_VISIBLE_DEVICES="0" pytest tests/test_grpo_trainer.py -k test_training_peft
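To see the device dependence side by side, the same test can be launched once per device set by varying `CUDA_VISIBLE_DEVICES` in the child process environment. This is a minimal sketch: `pytest_invocation` and `compare_device_sets` are hypothetical helpers, and the device sets `"0"` and `"0,1"` assume the two-GPU machine listed under System Info.

```python
import os
import subprocess
import sys
from typing import Dict, List, Sequence, Tuple


def pytest_invocation(devices: str, test_filter: str) -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for one pytest run restricted to `devices`."""
    env = dict(os.environ)  # copy, so the parent process environment is untouched
    # Set in the child's environment so it is in place before torch initializes CUDA.
    env["CUDA_VISIBLE_DEVICES"] = devices
    argv = [sys.executable, "-m", "pytest", "tests/test_grpo_trainer.py", "-k", test_filter]
    return argv, env


def compare_device_sets(test_filter: str, device_sets: Sequence[str] = ("0", "0,1")) -> Dict[str, int]:
    """Run the test once per device set and return pytest's exit code for each (0 = passed)."""
    results = {}
    for devices in device_sets:
        argv, env = pytest_invocation(devices, test_filter)
        results[devices] = subprocess.run(argv, env=env).returncode
    return results
```

Based on the results in this issue, `compare_device_sets("test_training_peft")` on the reporter's machine would be expected to report exit code 0 for `"0"` and 1 (tests failed) for `"0,1"`.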

tyler-romero changed the title from "GRPO tests failing on main" to "GRPO tests failing in multi-device setting on main" on Feb 5, 2025