Error while training on multi gpus #3273
Using `drop_last = True` is not acceptable.
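For context, here is a minimal sketch (plain Python, with a hypothetical helper not from PyTorch or this thread) of why the last batch is smaller without `drop_last`, which is what later breaks the stacking step:

```python
# Sketch: how per-batch sizes fall out of a dataset length and batch size.
# batch_sizes is a hypothetical illustration, not a PyTorch API.
def batch_sizes(n_samples, batch_size, drop_last=False):
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the smaller final batch that breaks torch.stack
    return sizes

print(batch_sizes(10, 4))                  # [4, 4, 2] -> last batch differs
print(batch_sizes(10, 4, drop_last=True))  # [4, 4]    -> uniform, but data is dropped
```

So `drop_last=True` silently discards validation samples, which is why it is unacceptable here.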
Hi, I think I have solved that recently in #3020. Which version are you on? Please try upgrading and let me know.
This issue persists in PyTorch Lightning v0.9.0.
@RahulSajnani are you using the results object or the same kind of manual reduction as shown in @nrjvarshney's code?
@awaelchli I am using the same kind of manual reduction as @nrjvarshney. The reduction is as shown here:
Yes, then it makes sense that it fails: for stacking, all tensors need to have the same shape, so if the last batch has a different size, `torch.stack` fails. With the results object you can instead write:

```python
def validation_step(self, batch, batch_idx):
    logits, softmax_logits = self(**batch)
    loss, prediction_label_count = self.loss_function(logits, batch["labels"])
    accuracy = self.compute_accuracy(logits, batch["labels"])
    result = EvalResult()
    result.log('val_accuracy', accuracy, reduce_fx=torch.mean)  # mean is the default, so this is optional
    result.log('val_loss', loss)
    return result

def validation_epoch_end(self, outputs_of_validation_steps):
    # not needed! everything is done by the result object:
    # it collects all accuracies/losses across steps, reduces them, then logs.
    pass
```

Hope this helps.
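If you do keep a manual `validation_epoch_end`, a size-weighted mean avoids both `drop_last=True` and the stacking error. A minimal sketch in plain Python (`weighted_mean` and the sample numbers are illustrative, not from this thread):

```python
# Sketch: naive torch.stack(...).mean() assumes equal batch sizes;
# weighting each per-batch metric by its batch size gives the true mean
# even when the last batch is smaller.
def weighted_mean(values, sizes):
    total = sum(sizes)
    return sum(v * s for v, s in zip(values, sizes)) / total

accs = [0.9, 0.8, 0.5]   # hypothetical per-batch accuracies
sizes = [4, 4, 2]        # last batch is smaller
print(weighted_mean(accs, sizes))  # (0.9*4 + 0.8*4 + 0.5*2) / 10 = 0.78
```

The same weighting can be applied to per-batch losses before logging the epoch-level value.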
This issue has been automatically marked as stale because it hasn't had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
I get the following error when training with multiple GPUs; it works for single-GPU training.