[core] fix DP issue #222
Conversation
The documentation is not available anymore as the PR was closed or merged.
Left two comments, generally looks good.
A CI test (
What does this PR do?
The mini-batching PR introduced a small bug that caused DP training to crash with `accelerate`, as we tried to all-reduce tensors that did not have the same shape. The tensors with mismatched shapes were the logprobs tensors, which derive from the inputs.
The fix is to add a check that correctly pads the model inputs and the corresponding attention masks, filling them with the correct value using `accelerator.pad_across_processes`. A minimal sketch of this technique is shown below.
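Below is a minimal, illustrative sketch (not the exact TRL code) of how `accelerator.pad_across_processes` can be used to make variable-length tensors the same shape on every data-parallel process before gathering them. The tensor names and pad values are assumptions for the example; only `Accelerator`, `pad_across_processes`, `gather`, and `process_index` are real Accelerate APIs.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Each process may hold sequences of a different length along dim=1
# (hypothetical shapes, just to illustrate the mismatch).
seq_len = 8 + accelerator.process_index
input_ids = torch.randint(0, 100, (4, seq_len))
attention_mask = torch.ones_like(input_ids)

# Pad every process's tensors up to the longest sequence length seen
# across processes: pad the inputs with the pad token id (assumed 0 here)
# and the attention mask with 0, so downstream gather/all-reduce ops
# see identically shaped tensors.
input_ids = accelerator.pad_across_processes(input_ids, dim=1, pad_index=0)
attention_mask = accelerator.pad_across_processes(attention_mask, dim=1, pad_index=0)

# Shapes now match across processes, so gathering no longer crashes.
gathered_ids = accelerator.gather(input_ids)
```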
cc @lvwerra
run for gpt2 sentiment with DP=2: https://wandb.ai/distill-bloom/trl/runs/p7pzc7yk?workspace=user-younesbelkada
run for t5 sentiment with DP=2: https://wandb.ai/distill-bloom/trl/runs/nb9as16x?workspace=user-younesbelkada