Pytorch Lightning on Multi Pods on Kubernetes #7837

asahalyft · 2021-06-04T19:56:52Z

I have been going through the PyTorch Lightning examples and I saw that we can specify the number of nodes in the Trainer to enable multi-node training. However, I do not see any discussion of how to specify the hosts of those nodes.

Moreover, our ML platform runs entirely on Kubernetes, so we do not have fixed host IDs. How can we launch a distributed PyTorch Lightning job over a set of Kubernetes pods?

Is there a guideline for that?
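
To make the question concrete, here is a minimal sketch of the kind of per-pod setup I have in mind. It is not a confirmed Lightning recipe. It assumes the pods come from a StatefulSet behind a headless Service, so each hostname ends in `-<ordinal>`, and it assumes the job is configured through the `MASTER_ADDR`, `MASTER_PORT`, and `NODE_RANK` environment variables used for `torch.distributed`-style rendezvous. The function name `rendezvous_env`, the service name, and the port are hypothetical.

```python
import re


def rendezvous_env(hostname, service, num_nodes, port=29500):
    """Derive rendezvous environment variables for one pod.

    Assumption: pods are created by a StatefulSet, so hostnames look like
    "<name>-<ordinal>" (e.g. "trainer-0", "trainer-1"), and `service` is the
    headless Service that gives each pod a stable DNS name.
    """
    match = re.fullmatch(r"(?P<name>.+)-(?P<ordinal>\d+)", hostname)
    if match is None:
        raise ValueError(f"not a StatefulSet-style hostname: {hostname!r}")

    node_rank = int(match["ordinal"])
    if node_rank >= num_nodes:
        raise ValueError(f"ordinal {node_rank} is outside num_nodes={num_nodes}")

    return {
        # Pod 0 acts as the rendezvous host for every other pod.
        "MASTER_ADDR": f"{match['name']}-0.{service}",
        "MASTER_PORT": str(port),
        # The pod ordinal doubles as the node rank.
        "NODE_RANK": str(node_rank),
    }
```

Each pod would call this with `socket.gethostname()`, copy the result into `os.environ`, and then build the Trainer with the same `num_nodes`. Whether this is the recommended approach on Kubernetes is exactly what I am unsure about.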
