# Azure VM Cluster (via Dask)
## Create a Cluster using Dask Cloud Provider
The easiest way to set up a multi-node, multi-GPU cluster on Azure is to use [Dask Cloud Provider](https://cloudprovider.dask.org/en/latest/azure.html).
### 1. Install Dask Cloud Provider
Dask Cloud Provider can be installed via `conda` or `pip`. The Azure-specific capabilities will need to be installed via the `[azure]` pip extra.
```shell
$ pip install "dask-cloudprovider[azure]"
```
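To confirm that the Azure extra is actually usable, you can try importing the Azure module. The sketch below is a hypothetical helper (`has_azure_support` is not part of Dask Cloud Provider), and it assumes that a missing Azure SDK dependency surfaces as an `ImportError` when `dask_cloudprovider.azure` is imported.

```python
import importlib


def has_azure_support() -> bool:
    """Hypothetical check: True if dask_cloudprovider.azure imports cleanly."""
    try:
        # Assumption: the Azure module imports its SDK dependencies at import
        # time, so a missing [azure] extra shows up as an ImportError here.
        importlib.import_module("dask_cloudprovider.azure")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("Azure support available:", has_azure_support())
```

If this prints `False`, re-run the `pip install` command above in the same environment as your Python session.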
### 2. Configure your Azure Resources
Set up your [Azure Resource Group](https://cloudprovider.dask.org/en/latest/azure.html#resource-groups), [Virtual Network](https://cloudprovider.dask.org/en/latest/azure.html#virtual-networks), and [Security Group](https://cloudprovider.dask.org/en/latest/azure.html#security-groups) according to the [Dask Cloud Provider instructions](https://cloudprovider.dask.org/en/latest/azure.html#authentication).
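Rather than hard-coding the names of these resources, one option is to read them from the environment and fail early if any are missing. This is a minimal sketch: the environment variable names and the `load_azure_settings` helper are hypothetical, chosen only so that the returned keys match the `resource_group`, `vnet`, `security_group`, and `subscription_id` arguments used when creating the cluster.

```python
import os
from collections.abc import Mapping

# Hypothetical environment variable names mapped to AzureVMCluster kwargs.
ENV_TO_KWARG = {
    "AZURE_RESOURCE_GROUP": "resource_group",
    "AZURE_VNET": "vnet",
    "AZURE_SECURITY_GROUP": "security_group",
    "AZURE_SUBSCRIPTION_ID": "subscription_id",
}


def load_azure_settings(env: Mapping = os.environ) -> dict:
    """Return AzureVMCluster keyword arguments read from `env`.

    Raises ValueError naming every variable that is unset or empty.
    """
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise ValueError("Missing Azure settings: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}
```

The result could then be unpacked into the cluster constructor, for example `AzureVMCluster(**load_azure_settings(), location="westus2", ...)`.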
### 3. Create a Cluster
In a Python session, create the cluster with `AzureVMCluster` from the `dask_cloudprovider.azure` module. The example below creates a cluster with 2 workers in `westus2` using `Standard_NC12s_v3` VMs. The VMs should have at least 100 GB of disk space to accommodate the RAPIDS container image and related dependencies.
```python
from dask_cloudprovider.azure import AzureVMCluster

resource_group = "<RESOURCE_GROUP>"
vnet = "<VNET>"
security_group = "<SECURITY_GROUP>"
subscription_id = "<SUBSCRIPTION_ID>"

cluster = AzureVMCluster(
    resource_group=resource_group,
    vnet=vnet,
    security_group=security_group,
    subscription_id=subscription_id,
    location="westus2",
    vm_size="Standard_NC12s_v3",
    public_ingress=True,
    disk_size=100,
    n_workers=2,
    worker_class="dask_cuda.CUDAWorker",
    docker_image="{{rapids_container}}",
    docker_args="-p 8787:8787 -p 8786:8786",
)
```
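Because `dask_cuda.CUDAWorker` starts one Dask worker per GPU it can see, the number of worker processes is the number of worker VMs multiplied by the GPUs per VM. The sketch below encodes that arithmetic; the `expected_gpu_workers` helper is hypothetical, and the GPU counts for the NCv3 sizes are taken from Azure's published VM specs and should be double-checked against the Azure documentation for the size you choose.

```python
# GPUs per VM for a few NCv3-series sizes (verify against Azure's VM size docs).
GPUS_PER_VM = {
    "Standard_NC6s_v3": 1,
    "Standard_NC12s_v3": 2,
    "Standard_NC24s_v3": 4,
}


def expected_gpu_workers(n_workers: int, vm_size: str) -> int:
    """Estimate Dask workers when every worker VM runs dask_cuda.CUDAWorker.

    CUDAWorker launches one worker per visible GPU, so the total is
    (number of worker VMs) * (GPUs per VM).
    """
    if vm_size not in GPUS_PER_VM:
        raise KeyError(f"Unknown GPU count for {vm_size!r}; add it to GPUS_PER_VM")
    return n_workers * GPUS_PER_VM[vm_size]


print(expected_gpu_workers(2, "Standard_NC12s_v3"))  # 4
```

For the example cluster this gives 2 VMs × 2 GPUs = 4 workers. Once a client is connected (as in the next step), `client.wait_for_workers(expected_gpu_workers(2, "Standard_NC12s_v3"))` would block until that many workers have registered with the scheduler.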
### 4. Test RAPIDS
To test RAPIDS, create a distributed client for the cluster and query for the GPU model.
```python
from dask.distributed import Client

client = Client(cluster)


def get_gpu_model():
    import pynvml

    pynvml.nvmlInit()
    return pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(0))


client.submit(get_gpu_model).result()
```
```shell
Out[5]: b'Tesla V100-PCIE-16GB'
```
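`client.submit` runs the function on a single worker. To query every worker, `client.run(get_gpu_model)` returns a dict keyed by worker address. The sketch below summarizes such a dict by counting how many workers report each GPU model; the `summarize_gpu_models` helper and the sample addresses are hypothetical. Depending on the `pynvml` version, the device name may come back as `bytes` (as in the output above) or as `str`, so both are normalized.

```python
from collections import Counter
from collections.abc import Mapping


def summarize_gpu_models(results: Mapping) -> Counter:
    """Count GPU model names in a {worker_address: gpu_name} mapping.

    Names returned as bytes are decoded so that bytes and str forms of the
    same model are counted together.
    """
    names = (
        name.decode() if isinstance(name, bytes) else name
        for name in results.values()
    )
    return Counter(names)


# Hypothetical worker addresses, shaped like the dict client.run returns.
sample = {
    "tcp://10.0.0.4:40001": b"Tesla V100-PCIE-16GB",
    "tcp://10.0.0.4:40002": "Tesla V100-PCIE-16GB",
}
print(summarize_gpu_models(sample))  # Counter({'Tesla V100-PCIE-16GB': 2})
```

Against the live cluster, the call would be `summarize_gpu_models(client.run(get_gpu_model))`.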
### 5. Cleanup
When you are finished, close the `client` and then the `cluster`; closing the cluster shuts down the VMs it created:
```python
client.close()
cluster.close()
```
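Since idle GPU VMs keep incurring cost, another option is to let context managers handle the cleanup so it also happens if your code raises an exception. This is a minimal sketch: `run_on_azure_cluster` is a hypothetical helper, and it relies on `AzureVMCluster` and `Client` both supporting the `with` statement, which they inherit from Dask's cluster and client classes.

```python
def run_on_azure_cluster(fn, **cluster_kwargs):
    """Hypothetical helper: start a cluster, run fn on one worker, tear down.

    cluster_kwargs are passed unchanged to AzureVMCluster (for example the
    resource_group, vnet, vm_size, ... arguments from the example above).
    """
    # Imported lazily so this module can be imported without the Azure extra.
    from dask.distributed import Client
    from dask_cloudprovider.azure import AzureVMCluster

    # On exit, including on an exception, the client is closed first and then
    # the cluster, which should shut down the VMs it created.
    with AzureVMCluster(**cluster_kwargs) as cluster, Client(cluster) as client:
        return client.submit(fn).result()
```

With the arguments from step 3, `run_on_azure_cluster(get_gpu_model, resource_group=resource_group, ...)` would return the same GPU model string while guaranteeing that the cluster is closed afterwards.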
```{relatedexamples}
```