What's the Correct Batch Size in Distributed Data Parallel Training? #2684
Answered by rohitgr7
zhenhuahu asked this question in DDP / multi-GPU / multi-node
-
❓ Questions and Help

I'm trying to use PyTorch Lightning and 'ddp' to do single-node multi-GPU training. My code is below; I pass batch_size=2 to the DataLoader. With this setup, does each of the 8 GPUs get a batch of size 2, or do only 2 GPUs receive data (each with a batch size of 1)?

```python
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=2,
                          shuffle=True,
                          num_workers=4,
                          pin_memory=True,
                          drop_last=True)

trainer = pl.Trainer(gpus=8,
                     num_nodes=4,
                     distributed_backend='ddp')
```
Answered by rohitgr7, Jul 24, 2020
Replies: 2 comments
-
Is this what you need? https://pytorch-lightning.readthedocs.io/en/latest/multi_gpu.html#batch-size
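In short, the linked docs say that under DDP the batch_size you pass to the DataLoader is the per-GPU batch size: each GPU runs its own process with its own copy of the DataLoader (Lightning adds a DistributedSampler so every process sees a distinct shard of the data), and the effective batch size is batch_size * num_gpus * num_nodes. A minimal sketch of that arithmetic, using the numbers from the question (the variable names here are illustrative, not from the thread):

```python
# Numbers taken from the question above.
per_gpu_batch_size = 2   # the batch_size passed to DataLoader
gpus_per_node = 8        # Trainer(gpus=8)
num_nodes = 4            # Trainer(num_nodes=4)

# Under DDP every GPU runs its own process with its own DataLoader copy,
# so each GPU loads a full batch of 2 from its own shard of the dataset.
# The effective (global) batch size per optimizer step is therefore:
effective_batch_size = per_gpu_batch_size * gpus_per_node * num_nodes
print(effective_batch_size)  # 2 * 8 * 4 = 64
```

So with the settings in the question, all 8 GPUs on each node receive their own batch of 2 samples; the data is not restricted to 2 GPUs.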
Answer selected by Borda
-
Yes. That's exactly what I want. Thank you so much!