What's the Correct Batch Size in Distributed Data Parallel Training? #2684
Answered by rohitgr7
zhenhuahu asked this question in DDP / multi-GPU / multi-node
-
❓ Questions and Help

I'm trying to use PyTorch Lightning and 'ddp' to do single-node multi-GPU training. My code is below; I pass batch_size=2 to the DataLoader. With this setup, does each of the 8 GPUs get a batch of size 2, or do only 2 GPUs receive data (each with a batch size of 1)?

```python
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=2,
                          shuffle=True,
                          num_workers=4,
                          pin_memory=True,
                          drop_last=True)

trainer = pl.Trainer(gpus=8,
                     num_nodes=4,
                     distributed_backend='ddp')
```
Answered by rohitgr7, Jul 24, 2020
Replies: 2 comments
-
Is this what you need? https://pytorch-lightning.readthedocs.io/en/latest/multi_gpu.html#batch-size
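In short, the linked docs say that under DDP the batch_size you pass to the DataLoader is the per-GPU batch size: each GPU runs its own process with its own copy of the DataLoader (Lightning adds a DistributedSampler so every process sees a distinct shard of the data), and the effective batch size is batch_size * num_gpus * num_nodes. A minimal sketch of that arithmetic, using the numbers from the question (the variable names here are illustrative, not from the thread):

```python
# Numbers taken from the question above.
per_gpu_batch_size = 2   # the batch_size passed to DataLoader
gpus_per_node = 8        # Trainer(gpus=8)
num_nodes = 4            # Trainer(num_nodes=4)

# Under DDP every GPU runs its own process with its own DataLoader copy,
# so each GPU loads a full batch of 2 from its own shard of the dataset.
# The effective (global) batch size per optimizer step is therefore:
effective_batch_size = per_gpu_batch_size * gpus_per_node * num_nodes
print(effective_batch_size)  # 2 * 8 * 4 = 64
```

So with the settings in the question, all 8 GPUs on each node receive their own batch of 2 samples; the data is not restricted to 2 GPUs.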
Answer selected by Borda
-
Yes. That's exactly what I want. Thank you so much!