evaluation_loop() Bug in KTOTrainer #1834

Closed
MAOJIASONG opened this issue Jul 15, 2024 · 1 comment

MAOJIASONG commented Jul 15, 2024

Hi there,

  1. I found that this code line in the evaluation_loop() doesn't work correctly, because the eval dataset normally doesn't contain the key "kl":

    target_indicies = [i for i in range(len(random_batch["kl"])) if random_batch["kl"][i] is False]

See the error below:

[rank0]: target_indicies = [i for i in range(len(random_batch["kl"])) if random_batch["kl"][i] is False]
[rank0]: KeyError: 'kl'

So I guess the original intent of self.generate_during_eval was to generate on the samples with label == False?
If that's correct, I can submit a PR to fix it, thanks.
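A minimal sketch of what I mean, assuming the eval batch carries the label column of the KTO dataset:

    # sketch only: select the eval samples whose label is False, instead of the non-existent "kl" key
    target_indicies = [
        i for i in range(len(random_batch["label"])) if random_batch["label"][i] is False
    ]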

  2. Moreover, the code that follows also has a subtle problem:

    target_batch = {
        "prompt_input_ids": itemgetter(*target_indicies)(random_batch["prompt_input_ids"]),
        "prompt_attention_mask": itemgetter(*target_indicies)(random_batch["prompt_attention_mask"]),
        "prompt": itemgetter(*target_indicies)(random_batch["prompt"]),
    }

The result of itemgetter is a tuple, which cannot be passed directly to the subsequent generate() call:

    policy_output = model.generate(
        input_ids=batch["prompt_input_ids"],
        attention_mask=batch["prompt_attention_mask"],
        max_length=self.max_length,
        do_sample=True,
        pad_token_id=self.tokenizer.pad_token_id,
    )

This leads to an error because a tuple has no shape[0]; generate() expects a batched tensor as input, not a tuple of tensors.

So I suggest updating it to the following:

    target_batch = {
        "prompt_input_ids": random_batch["prompt_input_ids"][target_indicies],
        "prompt_attention_mask": random_batch["prompt_attention_mask"][target_indicies],
        "prompt": itemgetter(*target_indicies)(random_batch["prompt"]),
    }
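For illustration, here is a minimal standalone snippet (not TRL code; the tensor and indices are made-up placeholders) showing why the two indexing styles behave differently:

    import torch
    from operator import itemgetter

    prompt_input_ids = torch.arange(12).reshape(4, 3)  # stand-in for a batch of 4 tokenized prompts
    target_indicies = [0, 2]

    as_tuple = itemgetter(*target_indicies)(prompt_input_ids)  # tuple of 1-D tensors, has no .shape
    as_batch = prompt_input_ids[target_indicies]               # 2-D batched tensor

    print(type(as_tuple))  # <class 'tuple'>
    print(as_batch.shape)  # torch.Size([2, 3])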

  3. The last one is about prediction_step():
       with torch.no_grad(), prediction_context_manager():
           loss, metrics = self.get_batch_loss_metrics(model, inputs)

       # force log the metrics
       if self.accelerator.is_main_process:
           self.store_metrics(metrics, train_eval="eval")

       if prediction_loss_only:
           return (loss.detach(), None, None)

       # logits for the chosen and rejected samples from model
       logits_dict = {
           "eval_logits/chosen": metrics["logits/chosen"],
           "eval_logits/rejected": metrics["logits/rejected"],
       }
       logits = tuple(v.unsqueeze(dim=0) for k, v in logits_dict.items() if k not in ignore_keys)
       logits = torch.stack(logits).mean(axis=1).to(self.accelerator.device)
       labels = torch.zeros(logits.shape[0], device=self.accelerator.device)

       return (loss.detach(), logits, labels)

Because the implemented get_batch_loss_metrics() doesn't return any logits, if self.compute_metrics is used in the trainer, this will raise a KeyError for "logits/chosen" and "logits/rejected".
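One possible direction, purely as a sketch rather than a final fix, would be to guard prediction_step() so it falls back to returning only the loss when the logits keys are absent:

    # hypothetical guard inside prediction_step(); "metrics" is the dict
    # returned by get_batch_loss_metrics()
    if "logits/chosen" not in metrics or "logits/rejected" not in metrics:
        return (loss.detach(), None, None)

    logits_dict = {
        "eval_logits/chosen": metrics["logits/chosen"],
        "eval_logits/rejected": metrics["logits/rejected"],
    }

Alternatively, get_batch_loss_metrics() could be extended to also store the batch logits in its metrics dict; either way the KeyError above would be avoided.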

kashif commented Jul 16, 2024

Thanks @MAOJIASONG, yes, a PR fix would be most welcome!
