Distributed computing (eg, multi-GPU) support #13
I see that there is some code supporting multi-GPUs, e.g. here and here.
However, I don't see an option or flag to actually utilize distributed computing. Could you clarify?
Thank you.
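Not the repository's actual code, but for context on what such an option usually does: below is a minimal sketch of the common pattern where a `local_rank` flag switches a training script between single-GPU and `DistributedDataParallel` modes. The names `opt.local_rank`, `opt.per_gpu_batch_size`, and `wrap_for_training` are placeholders, not options confirmed to exist in this repo, and the snippet assumes `torch.distributed.init_process_group` has already been called.

```python
# Hypothetical sketch of a local_rank-gated multi-GPU setup (not this repo's code).
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def wrap_for_training(model, train_dataset, opt):
    """Wrap the model and build a DataLoader for single- or multi-GPU training."""
    if opt.local_rank >= 0:
        # Distributed run: one process per GPU, started by an external launcher;
        # assumes dist.init_process_group() was already called in this process.
        device = torch.device("cuda", opt.local_rank)
        model = DDP(model.to(device), device_ids=[opt.local_rank],
                    output_device=opt.local_rank)
        sampler = DistributedSampler(train_dataset)
    else:
        # Plain single-GPU (or CPU) run: no launcher needed.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = model.to(device)
        sampler = None
    loader = DataLoader(train_dataset, batch_size=opt.per_gpu_batch_size,
                        sampler=sampler, shuffle=(sampler is None))
    return model, loader
```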
Comments

@erensezener Did you figure this out? :)
@gizacard Would you mind providing some instructions on this? Which options should be set? Thanks
@gizacard I wanted to train using multi-GPU (4 GPUs), and for that I used the
Although I am not aiming for a Slurm job, the code here requires me to set
After setting these parameters, when I run the code, the training never starts, though without distributed training (single GPU) it works fine. Can you guide me on whether I am doing this correctly? Thanks
Something like this worked for me
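(The snippet originally posted with that comment is not preserved here.) As a rough sketch of the kind of setup `torch.distributed.launch` expects in torch 1.x: start one process per GPU, e.g. `python -m torch.distributed.launch --nproc_per_node=4 train_reader.py <usual training arguments>`, and have the script initialize the process group from the `--local_rank` argument the launcher injects. The argument handling below is illustrative, not the repository's exact code.

```python
# Minimal per-process setup expected by torch.distributed.launch (torch 1.x).
# --local_rank is supplied automatically by the launcher; the rest is illustrative.
import argparse
import torch
import torch.distributed as dist

def setup_distributed():
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=-1)
    args, _ = parser.parse_known_args()

    if args.local_rank >= 0:
        torch.cuda.set_device(args.local_rank)
        # NCCL is the usual backend for single-node multi-GPU training;
        # init_method="env://" reads the rendezvous info the launcher sets.
        dist.init_process_group(backend="nccl", init_method="env://")
        print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")
    return args.local_rank

if __name__ == "__main__":
    setup_distributed()
```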
@fabrahman Could you provide an update on this issue? I have exactly the same issue, and I found that the code freezes without any error message after executing line 194 of train_reader.py
|
@Duemoo I also encountered this problem, using multiple gpu, I found that the code freezes without any error message after executing line 194 of train_reader.py, |
@szh-max Hi I also encountered this problem and I solved it by updating torch version to torch==1.10.0 and |