Reproduction
trl on main
~> pytest tests/test_grpo_trainer.py
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.12.5, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/tromero/workspace/trl
configfile: pyproject.toml
plugins: anyio-4.8.0, xdist-3.6.1, rerunfailures-15.0, cov-6.0.0
collected 14 items
tests/test_grpo_trainer.py .FFFFFFFFFF...
A representative output:
______________________________________________________________________________ GRPOTrainerTester.test_training_reward_func_standard _______________________________________________________________________________
self = <tests.test_grpo_trainer.GRPOTrainerTester testMethod=test_training_reward_func_standard>
    def test_training_reward_func_standard(self):
        # Test if trainer can handle reward function with standard format
        dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only", split="train")
        def reward_func(completions, **kwargs):
            """Reward function that rewards longer completions."""
            return [float(len(completion)) for completion in completions]
        with tempfile.TemporaryDirectory() as tmp_dir:
            training_args = GRPOConfig(
                output_dir=tmp_dir,
                learning_rate=0.1,  # increase the learning rate to speed up the test
                per_device_train_batch_size=2,  # reduce the batch size to reduce memory usage
                num_generations=3,  # reduce the number of generations to reduce memory usage
                max_completion_length=32,  # reduce the completion length to reduce memory usage
                report_to="none",
            )
            trainer = GRPOTrainer(
                model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",
                reward_funcs=reward_func,
                args=training_args,
                train_dataset=dataset,
            )
            previous_trainable_params = {n: param.clone() for n, param in trainer.model.named_parameters()}
            trainer.train()
            self.assertIsNotNone(trainer.state.log_history[-1]["train_loss"])
            # Check the params have changed
            for n, param in previous_trainable_params.items():
                new_param = trainer.model.get_parameter(n)
>               self.assertFalse(torch.equal(param, new_param), f"Parameter {n} has not changed.")
E               AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
============================================================================================= short test summary info =============================================================================================
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_0_standard_prompt_only - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_1_conversational_prompt_only - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_different_reward_model - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_multiple_mixed_reward_funcs - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_multiple_reward_funcs - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_peft - AssertionError: True is not false : Parameter base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_additional_column - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_conversational - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_reward_func_standard - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_torch_compile - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_vllm - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
FAILED tests/test_grpo_trainer.py::GRPOTrainerTester::test_training_with_sync_ref_model - AssertionError: True is not false : Parameter model.embed_tokens.weight has not changed.
============================================================================== 12 failed, 2 passed, 82 warnings in 163.33s (0:02:43) ==============================================================================
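To narrow down a failure like the ones above, it may help to check whether the unchanged parameters ever received a gradient. The snippet below is a minimal sketch in plain PyTorch that mirrors the snapshot-and-compare check from the test. It uses a toy model as a stand-in for the GRPO setup, so it does not reproduce the bug itself. `Tiny` and `unchanged_parameters` are hypothetical names introduced for illustration and are not part of TRL.

```python
import torch
from torch import nn


class Tiny(nn.Module):
    """Toy stand-in model (hypothetical, not the model used in the TRL tests)."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 1)

    def forward(self, x):
        return self.fc2(self.fc1(x))


def unchanged_parameters(before, model):
    """Return the names of parameters that still equal their earlier snapshot."""
    return [
        name
        for name, old in before.items()
        if torch.equal(old, model.get_parameter(name))
    ]


torch.manual_seed(0)
model = Tiny()

# Snapshot the parameters before the update, as the failing test does.
snapshot = {n: p.detach().clone() for n, p in model.named_parameters()}

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()

# A parameter with a missing or all-zero gradient cannot move under plain SGD.
no_grad = [
    n
    for n, p in model.named_parameters()
    if p.grad is None or torch.count_nonzero(p.grad) == 0
]
optimizer.step()

print("no gradient:", no_grad)
print("unchanged:", unchanged_parameters(snapshot, model))
```

Running the same two checks around `trainer.train()` (for example, inspecting `p.grad` after a single step) would show whether the parameters are missing gradients or whether the update is being lost somewhere else.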
System Info
~> trl env
[2025-02-05 09:08:41,976] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO 02-05 09:08:43 __init__.py:183] Automatically detected platform cuda.
Copy-paste the following information when reporting an issue:
- Platform: Linux-6.8.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.12.5
- PyTorch version: 2.5.1
- CUDA device(s): NVIDIA GeForce RTX 4090, NVIDIA GeForce RTX 4090
- Transformers version: 4.48.2
- Accelerate version: 1.3.0
- Accelerate config: not found
- Datasets version: 3.2.0
- HF Hub version: 0.28.1
- TRL version: 0.15.0.dev0
- bitsandbytes version: 0.45.1
- DeepSpeed version: 0.16.3
- Diffusers version: 0.32.2
- Liger-Kernel version: 0.5.2
- LLM-Blender version: 0.0.2
- OpenAI version: 1.61.1
- PEFT version: 0.14.0
Checklist
I have checked that my issue isn't already filed (see open issues)
I have included my system information
Any code provided is minimal, complete, and reproducible (more on MREs)
Any code provided is properly formatted in code blocks (no screenshot, more on code blocks)
Any traceback provided is complete