Add GraphLearn-for-PyTorch(GLT) distributed examples #7402

Merged
merged 70 commits into from
Aug 4, 2023
f6dc0ed
Add GLT dist example
HuxleyHu98 May 19, 2023
714caf8
Merge branch 'pyg-team:master' into glt2pyg
husimplicity May 19, 2023
2422afc
refine docs
HuxleyHu98 May 19, 2023
8ff83f3
Merge branch 'glt2pyg' of https://github.com/husimplicity/pytorch_geo…
HuxleyHu98 May 19, 2023
0212ce0
minor
HuxleyHu98 May 21, 2023
6a531fe
minor
HuxleyHu98 May 22, 2023
75534b5
Merge branch 'pyg-team:master' into glt2pyg
husimplicity May 22, 2023
f8c6723
Merge branch 'glt2pyg' of https://github.com/husimplicity/pytorch_geo…
HuxleyHu98 May 22, 2023
05a2a95
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 22, 2023
4bf8f7b
minor
HuxleyHu98 May 25, 2023
6e1072f
minor
HuxleyHu98 May 25, 2023
af6413d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2023
a8e7c15
add papers100m cmd example
HuxleyHu98 May 25, 2023
7d09685
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2023
a757858
minor
HuxleyHu98 May 25, 2023
3367a76
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2023
54a871b
Merge branch 'pyg-team:master' into glt2pyg
husimplicity May 30, 2023
5cafdd6
Merge branch 'master' into glt2pyg
husimplicity Jun 1, 2023
9ab42bb
Merge branch 'master' into glt2pyg
husimplicity Jun 2, 2023
8d30a69
Merge branch 'master' into glt2pyg
husimplicity Jun 6, 2023
2753ed3
Merge branch 'master' into glt2pyg
husimplicity Jun 7, 2023
ea9aa82
Merge branch 'pyg-team:master' into glt2pyg
husimplicity Jun 13, 2023
5aa23ed
Merge branch 'master' into glt2pyg
husimplicity Jun 14, 2023
16724be
Merge branch 'master' into glt2pyg
husimplicity Jun 16, 2023
5396bec
Merge branch 'master' into glt2pyg
husimplicity Jun 19, 2023
2989b42
Merge branch 'master' into glt2pyg
husimplicity Jun 20, 2023
b4c61c4
Merge branch 'master' into glt2pyg
husimplicity Jun 26, 2023
8f9983e
Merge branch 'pyg-team:master' into glt2pyg
husimplicity Jun 27, 2023
7af3ede
Merge branch 'master' into glt2pyg
husimplicity Jun 28, 2023
aef296e
adjust directory structure to dist/glt
HuxleyHu98 Jun 28, 2023
1afe595
minor
HuxleyHu98 Jun 28, 2023
72821a5
Update README.md
husimplicity Jun 29, 2023
0bd075a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 29, 2023
0aef26c
Merge branch 'master' into glt2pyg
husimplicity Jun 29, 2023
8b93d25
style
HuxleyHu98 Jun 30, 2023
74f7091
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 30, 2023
341e6bf
style
HuxleyHu98 Jun 30, 2023
39eaabd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 30, 2023
08ccf94
add changelog
HuxleyHu98 Jun 30, 2023
3522383
Merge branch 'glt2pyg' of https://github.com/husimplicity/pytorch_geo…
HuxleyHu98 Jun 30, 2023
9e2bfd4
Merge branch 'master' into glt2pyg
husimplicity Jul 3, 2023
d3181dc
Merge branch 'master' into glt2pyg
husimplicity Jul 4, 2023
9082523
Merge branch 'master' into glt2pyg
husimplicity Jul 5, 2023
2e86235
Merge branch 'master' of https://github.com/husimplicity/pytorch_geom…
HuxleyHu98 Jul 6, 2023
f8c1f5c
Merge branch 'master' into glt2pyg
husimplicity Jul 10, 2023
632789d
Merge branch 'glt2pyg' of https://github.com/husimplicity/pytorch_geo…
HuxleyHu98 Jul 10, 2023
817e7ef
Update documentations
HuxleyHu98 Jul 10, 2023
4e1ae67
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 10, 2023
6217ad4
Merge branch 'master' into glt2pyg
husimplicity Jul 13, 2023
f1f1635
minor
HuxleyHu98 Jul 14, 2023
4c66267
Merge branch 'master' into glt2pyg
husimplicity Jul 17, 2023
2d3d22e
update
HuxleyHu98 Jul 17, 2023
fcf7c46
Merge branch 'master' into glt2pyg
husimplicity Jul 18, 2023
c4cdc26
Merge branch 'master' into glt2pyg
husimplicity Jul 19, 2023
93ff1d4
Merge branch 'master' into glt2pyg
husimplicity Jul 19, 2023
df2764f
Merge branch 'master' into glt2pyg
husimplicity Jul 20, 2023
3ecdc02
Merge branch 'master' into glt2pyg
husimplicity Jul 20, 2023
d19ad8f
Merge branch 'master' into glt2pyg
husimplicity Jul 24, 2023
1d47ef2
Merge branch 'master' into glt2pyg
husimplicity Aug 1, 2023
0edc888
Merge branch 'master' into glt2pyg
husimplicity Aug 1, 2023
958a736
Merge branch 'master' into glt2pyg
husimplicity Aug 1, 2023
161e51e
Merge branch 'master' into glt2pyg
husimplicity Aug 3, 2023
48ee10f
Merge branch 'master' into glt2pyg
husimplicity Aug 3, 2023
45d1797
Merge branch 'master' into glt2pyg
husimplicity Aug 3, 2023
a6a7d06
Merge branch 'master' into glt2pyg
husimplicity Aug 3, 2023
8113ece
Merge branch 'master' into glt2pyg
husimplicity Aug 4, 2023
4cc37f3
update
rusty1s Aug 4, 2023
4d330cd
update
rusty1s Aug 4, 2023
55df5e8
update
rusty1s Aug 4, 2023
eca1a87
Merge branch 'master' into glt2pyg
rusty1s Aug 4, 2023
Update documentations
HuxleyHu98 committed Jul 10, 2023

commit 817e7ef187e815b0a1538ed235dbd2f441883e73
15 changes: 15 additions & 0 deletions examples/distributed/graphlearn_for_pytorch/README.md
@@ -84,6 +84,21 @@ SSHFS: https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-moun
Ceph: https://docs.ceph.com/en/latest/install/

### Step 1: Prepare and partition data
In distributed training (under the worker mode), each node in the cluster holds one partition of the graph.
Before training starts, we therefore split the OGBN-Products dataset into multiple partitions,
each of which corresponds to a specific training worker.

The partitioning occurs in three steps:
1. Run a partition algorithm to assign nodes to partitions.
2. Construct the graph structure of each partition based on the node assignment.
3. Split the node features and edge features based on the partition result.
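
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not GLT's partitioner: the function `partition_graph` is hypothetical, nodes are assigned at random, each edge is stored with the partition that owns its destination node, and edge features are omitted for brevity.

```python
import random


def partition_graph(num_nodes, edges, features, num_partitions, seed=0):
    """Toy three-step partitioning (hypothetical helper, not the GLT API)."""
    rng = random.Random(seed)

    # Step 1: assign every node to a partition (random assignment here).
    node_pb = [rng.randrange(num_partitions) for _ in range(num_nodes)]

    # Step 2: build each partition's edge list. Assumption: an edge lives
    # in the partition that owns its destination node.
    part_edges = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        part_edges[node_pb[dst]].append((src, dst))

    # Step 3: split node features according to the node assignment.
    part_feats = [{} for _ in range(num_partitions)]
    for node, feat in enumerate(features):
        part_feats[node_pb[node]][node] = feat

    return node_pb, part_edges, part_feats


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
features = [[float(i)] for i in range(4)]
node_pb, part_edges, part_feats = partition_graph(4, edges, features, 2)
```

Every edge and every feature row ends up in exactly one partition, which is the property the later loading step relies on.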

GLT supports caching the graph topology and frequently accessed features on the GPU to accelerate GPU sampling and feature collection.
For the feature cache, we adopt a pre-sampling-based approach to estimate the hotness of vertices, and cache the features of the hottest vertices while loading the graph.
The uncached feature data is stored in pinned memory for efficient access via unified virtual addressing (UVA).
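
As a rough illustration of the pre-sampling idea, the sketch below counts how often each vertex is reached while sampling neighbors from the training seeds, then picks the most frequently visited vertices for the cache. The helpers `estimate_hotness` and `select_cached_nodes` are hypothetical names and do not reflect GLT's actual implementation; the adjacency dict and `cache_ratio` handling are simplifying assumptions.

```python
import random
from collections import Counter


def estimate_hotness(adj, seeds, fanouts, num_rounds=1, seed=0):
    """Count visits per vertex over a few rounds of neighbor sampling."""
    rng = random.Random(seed)
    hotness = Counter()
    for _ in range(num_rounds):
        frontier = list(seeds)
        for fanout in fanouts:
            next_frontier = []
            for node in frontier:
                nbrs = adj.get(node, [])
                sampled = rng.sample(nbrs, min(fanout, len(nbrs)))
                hotness.update(sampled)  # each sampled vertex gets hotter
                next_frontier.extend(sampled)
            frontier = next_frontier
    return hotness


def select_cached_nodes(hotness, num_nodes, cache_ratio):
    """Return the ids of the hottest `cache_ratio` fraction of vertices."""
    cache_size = int(num_nodes * cache_ratio)
    ranked = sorted(range(num_nodes), key=lambda n: (-hotness[n], n))
    return ranked[:cache_size]


# Star graph: vertex 0 is the only neighbor of every seed.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
hotness = estimate_hotness(adj, seeds=[1, 2, 3], fanouts=[1])
cached = select_cached_nodes(hotness, num_nodes=4, cache_ratio=0.25)
```

In this toy graph the hub vertex is visited once per seed, so it is the one selected for the cache.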

For further information about partitioning, please refer to the [tutorial](https://github.com/alibaba/graphlearn-for-pytorch/blob/main/docs/tutorial/dist.md).

Here we split `ogbn-products` into 2 partitions.
```
python partition_ogbn_dataset.py --dataset=ogbn-products --root_dir=../../../data/ogbn-products --num_partitions=2
@@ -1,19 +1,39 @@
# IP addresses for all nodes.
# Note: The first 3 params are expected to form usernames@nodes:ports
# to access each node by ssh
nodes:
- 0.0.0.0
- 1.1.1.1
# ssh ports for each node.
ports: [22, 22]
python_bins:
- /path/to/python
- /path/to/python # path to python with GLT's env
# username for remote IPs.
usernames:
- your_username_for_node_0
- your_username_for_node_1

# path to python with GLT envs for each node
python_bins:
- /path/to/python
- /path/to/python

# dataset name, e.g. ogbn-products, ogbn-papers100M.
# Note: make sure the name of dataset_root_dir is the same as the dataset name.
dataset: ogbn-products

# in_channel and out_channel for the dataset
# The following two params are specific to datasets
# E.g.:
# For ogbn-products: in_channel=100, out_channel=47
# For ogbn-papers100M: in_channel=128, out_channel=172
in_channel: 100
out_channel: 47

# path to the pytorch_geometric directory
dst_paths:
- /path/to/pytorch_geometric
- /path/to/pytorch_geometric

# setup visible cuda devices for each node.
visible_devices:
- 0,1,2,3
- 4,5,6,7
- 0,1,2,3
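
To make the note about `usernames@nodes:ports` concrete, here is a minimal sketch that combines the per-node lists into SSH targets and checks that every per-node list has one entry per node. The config is written as a Python dict to keep the example dependency-free; `ssh_targets` and `PER_NODE_KEYS` are hypothetical names, the device lists are placeholders, and the launcher in this PR may process the file differently.

```python
# Same fields as the YAML config above, as a plain dict (placeholder values).
config = {
    "nodes": ["0.0.0.0", "1.1.1.1"],
    "ports": [22, 22],
    "usernames": ["your_username_for_node_0", "your_username_for_node_1"],
    "python_bins": ["/path/to/python", "/path/to/python"],
    "dst_paths": ["/path/to/pytorch_geometric", "/path/to/pytorch_geometric"],
    "visible_devices": ["0,1,2,3", "0,1,2,3"],
}

# Keys that must provide exactly one entry per node.
PER_NODE_KEYS = ("ports", "usernames", "python_bins", "dst_paths",
                 "visible_devices")


def ssh_targets(cfg):
    """Build 'username@node:port' strings after a basic length check."""
    num_nodes = len(cfg["nodes"])
    for key in PER_NODE_KEYS:
        if len(cfg[key]) != num_nodes:
            raise ValueError(
                f"'{key}' has {len(cfg[key])} entries, expected {num_nodes}")
    return [
        f"{user}@{host}:{port}"
        for user, host, port in zip(cfg["usernames"], cfg["nodes"],
                                    cfg["ports"])
    ]


targets = ssh_targets(config)
```

A length check of this kind catches the most common mistake when adding a node, which is extending `nodes` but forgetting one of the parallel lists.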
mananshah99 marked this conversation as resolved.
@@ -60,6 +60,9 @@ def run_training_proc(
init_method='tcp://{}:{}'.format(master_addr, training_pg_master_port))

# Create distributed neighbor loader for training
# We replace PyG's NeighborLoader with GLT's DistNeighborLoader. GLT's
# sampling parameters are quite similar to PyG's. We only need to configure
# the network and device parameters within `worker_options`.
train_idx = train_idx.split(
train_idx.size(0) // num_training_procs_per_node)[local_proc_rank]
train_loader = glt.distributed.DistNeighborLoader(
@@ -88,6 +91,7 @@ def run_training_proc(
pin_memory=True))

# Define model and optimizer.
# We directly plug in standard PyG models here.
torch.cuda.set_device(current_device)
model = GraphSAGE(
in_channels=in_channels,
@@ -99,6 +103,7 @@ def run_training_proc(
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train and test.
# We don't need to modify the PyG code for the training & test process.
f = open('dist_sage_sup.txt', 'a+')
for epoch in range(0, epochs):
model.train()
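
The seed split in this hunk, `train_idx.split(train_idx.size(0) // num_training_procs_per_node)[local_proc_rank]`, cuts the seeds into fixed-size chunks and gives each local process the chunk at its rank. The sketch below mimics that chunking on a plain list so it runs without PyTorch; `split_like_torch` is a hypothetical stand-in for `Tensor.split` with an integer chunk size. It also shows that when the seed count is not divisible by the number of processes, the trailing chunk is not assigned to any rank.

```python
def split_like_torch(items, split_size):
    """Fixed-size chunks; the last chunk may be smaller (like Tensor.split)."""
    return [items[i:i + split_size] for i in range(0, len(items), split_size)]


train_idx = list(range(10))          # 10 training seeds on this node
num_training_procs_per_node = 3

chunks = split_like_torch(
    train_idx, len(train_idx) // num_training_procs_per_node)

# Each local rank takes the chunk at its own index.
per_rank = [chunks[rank] for rank in range(num_training_procs_per_node)]

# Chunks beyond the last rank are never indexed by any process.
leftover = chunks[num_training_procs_per_node:]
```

With 10 seeds and 3 processes, each rank receives 3 seeds and the final seed stays in `leftover`. That keeps per-rank workloads equal, at the cost of skipping a few seeds per epoch.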
@@ -10,6 +10,18 @@
def partition_dataset(ogbn_dataset: str, root_dir: str, num_partitions: int,
num_nbrs: glt.NumNeighbors, chunk_size: int,
cache_ratio: float):
###########################################################################
# In distributed training (under the worker mode), each node in the cluster
# holds a partition of the graph. Thus before the training starts, we
# partition the dataset into multiple partitions, each of which corresponds
# to a specific training worker.
# The partitioning occurs in three steps:
# 1. Run a partition algorithm to assign nodes to partitions.
# 2. Construct partition graph structure based on the node assignment.
# 3. Split the node features and edge features based on the partition
# result.
###########################################################################

print(f'-- Loading {ogbn_dataset} ...')
dataset = PygNodePropPredDataset(ogbn_dataset, root_dir)
data = dataset[0]
@@ -45,7 +57,8 @@ def partition_dataset(ogbn_dataset: str, root_dir: str, num_partitions: int,
osp.join(test_idx_partitions_dir, f'partition{pidx}.pt'))

print('-- Initializing graph ...')
csr_topo = glt.data.CSRTopo(edge_index=data.edge_index, layout='COO')
csr_topo = glt.data.Topology(edge_index=data.edge_index,
input_layout='COO')
graph = glt.data.Graph(csr_topo, mode='ZERO_COPY')

print('-- Sampling hotness ...')
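
The `input_layout='COO'` argument above indicates that the edge index is passed in coordinate form (one row of source ids, one row of destination ids). For readers unfamiliar with the two layouts, the following sketch converts COO pairs into CSR arrays in plain Python. The function `coo_to_csr` is a hypothetical illustration of the layout change and is not taken from GLT.

```python
def coo_to_csr(rows, cols, num_nodes):
    """Convert COO (rows, cols) pairs to CSR (indptr, indices) arrays."""
    # Count the out-degree of every row, shifted by one position.
    indptr = [0] * (num_nodes + 1)
    for r in rows:
        indptr[r + 1] += 1

    # Prefix sum: indptr[i] becomes the start offset of row i.
    for i in range(num_nodes):
        indptr[i + 1] += indptr[i]

    # Scatter column ids into place, tracking the next free slot per row.
    indices = [0] * len(cols)
    cursor = indptr[:-1].copy()
    for r, c in zip(rows, cols):
        indices[cursor[r]] = c
        cursor[r] += 1
    return indptr, indices


# Edges (2->0), (0->1), (1->2), (0->2), given in arbitrary order.
rows = [2, 0, 1, 0]
cols = [0, 1, 2, 2]
indptr, indices = coo_to_csr(rows, cols, num_nodes=3)
```

The neighbors of node `i` are then `indices[indptr[i]:indptr[i + 1]]`, which is what makes CSR convenient for neighbor sampling.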