Graph partition based on balance_edge #309
Merged
This pull request adds balance_edge as an argument for graph partitioning in metis.py. The original METIS-based partitioning weights nodes only by node_type and does not take edges (edge_type) into account. This matters especially for link prediction with DDP, where the training samples are edges. When balance_edge=True, each node's in-degree is used as its weight in place of the node_type-based node_weight, so that the partitioning aims to balance the number of edges per partition.
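The in-degree weighting can be sketched in plain Python. The helpers below (in_degree, metis_node_weights, part_loads, edges_by_target_part) are hypothetical illustrations, not the code in metis.py, and they assume a COO edge_index of shape [2, E] (sources in row 0, targets in row 1). The key observation: summing in-degrees over a partition gives the number of edges whose target node lies in that partition, so balancing the total vertex weight handed to METIS balances that edge count. If only intra-partition edges are kept afterwards, the kept edges are a subset of it.

```python
from typing import List, Optional, Sequence


def in_degree(edge_index: Sequence[Sequence[int]], num_nodes: int) -> List[int]:
    """Count incoming edges per node for a COO edge_index [[src...], [dst...]]."""
    deg = [0] * num_nodes
    for dst in edge_index[1]:
        deg[dst] += 1
    return deg


def metis_node_weights(
    edge_index: Sequence[Sequence[int]],
    num_nodes: int,
    balance_edge: bool = False,
    default_weights: Optional[List[int]] = None,
) -> Optional[List[int]]:
    """Hypothetical helper choosing the vertex weights handed to METIS.

    balance_edge=True: weight = in-degree (nodes with no incoming edges get 0).
    Otherwise: return the caller's default weights (None = unit weights).
    """
    if balance_edge:
        return in_degree(edge_index, num_nodes)
    return default_weights


def part_loads(weights: Sequence[int], parts: Sequence[int], num_parts: int) -> List[int]:
    """Total vertex weight assigned to each part."""
    loads = [0] * num_parts
    for w, p in zip(weights, parts):
        loads[p] += w
    return loads


def edges_by_target_part(
    edge_index: Sequence[Sequence[int]], parts: Sequence[int], num_parts: int
) -> List[int]:
    """Number of edges whose target node lies in each part."""
    counts = [0] * num_parts
    for dst in edge_index[1]:
        counts[parts[dst]] += 1
    return counts


if __name__ == "__main__":
    # Toy directed graph with 4 nodes and 5 edges (src -> dst).
    edge_index = [[0, 1, 2, 3, 3], [1, 2, 0, 0, 1]]
    weights = metis_node_weights(edge_index, num_nodes=4, balance_edge=True)
    print(weights)  # [2, 2, 1, 0]
    parts = [0, 0, 1, 1]  # an example assignment, as a partitioner might return
    print(part_loads(weights, parts, 2))               # [4, 1]
    print(edges_by_target_part(edge_index, parts, 2))  # [4, 1]
```

In the toy run, the per-part weight sums and the per-part target-edge counts match, which is exactly the property that lets a node-weight balancer act as an edge balancer.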
Test results from ClusterData (cluster.py), without the balance_edge argument (same behavior as the old version) and with balance_edge=True:
Without balance_edge (default, node_type-based partitioning):
----- cluster_data[0] = Data(x=[1104230], node_type=[1104230], edge_type=[21259372], edge_index=[2, 21259372]) ----
----- cluster_data[1] = Data(x=[1018995], node_type=[1018995], edge_type=[21806130], edge_index=[2, 21806130]) ----
----- cluster_data[2] = Data(x=[1086336], node_type=[1086336], edge_type=[24065724], edge_index=[2, 24065724]) ----
----- cluster_data[3] = Data(x=[1939568], node_type=[1939568], edge_type=[26776812], edge_index=[2, 26776812]) ----
balance_edge=True:
----- cluster_data[0] = Data(x=[1249728], node_type=[1249728], edge_type=[10912882], edge_index=[2, 10912882]) ----
----- cluster_data[1] = Data(x=[1249728], node_type=[1249728], edge_type=[14286552], edge_index=[2, 14286552]) ----
----- cluster_data[2] = Data(x=[1325624], node_type=[1325624], edge_type=[40316592], edge_index=[2, 40316592]) ----
----- cluster_data[3] = Data(x=[1324049], node_type=[1324049], edge_type=[33047668], edge_index=[2, 33047668]) ----
From the results above, the edge_type sizes (edges per partition) are more balanced with balance_edge=True, which benefits DDP workloads such as link prediction. The test above uses the Taobao dataset (Taobao.py).
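To make "more balanced" measurable, a small script can parse the per-partition sizes from the logs above and report the max/mean ratio of node and edge counts, where 1.0 means a perfectly even split. This is a sketch; parse_counts and imbalance are hypothetical helpers, not part of metis.py or cluster.py, and the log lines are copied verbatim from the two runs (minus the dashes).

```python
import re
from typing import List, Tuple

# Per-partition log lines copied from the two runs above.
DEFAULT_LOG = [
    "cluster_data[0] = Data(x=[1104230], node_type=[1104230], edge_type=[21259372], edge_index=[2, 21259372])",
    "cluster_data[1] = Data(x=[1018995], node_type=[1018995], edge_type=[21806130], edge_index=[2, 21806130])",
    "cluster_data[2] = Data(x=[1086336], node_type=[1086336], edge_type=[24065724], edge_index=[2, 24065724])",
    "cluster_data[3] = Data(x=[1939568], node_type=[1939568], edge_type=[26776812], edge_index=[2, 26776812])",
]
BALANCE_EDGE_LOG = [
    "cluster_data[0] = Data(x=[1249728], node_type=[1249728], edge_type=[10912882], edge_index=[2, 10912882])",
    "cluster_data[1] = Data(x=[1249728], node_type=[1249728], edge_type=[14286552], edge_index=[2, 14286552])",
    "cluster_data[2] = Data(x=[1325624], node_type=[1325624], edge_type=[40316592], edge_index=[2, 40316592])",
    "cluster_data[3] = Data(x=[1324049], node_type=[1324049], edge_type=[33047668], edge_index=[2, 33047668])",
]

# Captures N from x=[N] and E from edge_index=[2, E].
_PATTERN = re.compile(r"x=\[(\d+)\].*edge_index=\[2, (\d+)\]")


def parse_counts(lines: List[str]) -> Tuple[List[int], List[int]]:
    """Return (node counts, edge counts) per partition from Data(...) log lines."""
    nodes, edges = [], []
    for line in lines:
        m = _PATTERN.search(line)
        if m:
            nodes.append(int(m.group(1)))
            edges.append(int(m.group(2)))
    return nodes, edges


def imbalance(counts: List[int]) -> float:
    """Largest part divided by the average part size (expects a non-empty list)."""
    return max(counts) / (sum(counts) / len(counts))


if __name__ == "__main__":
    for name, log in (("default", DEFAULT_LOG), ("balance_edge=True", BALANCE_EDGE_LOG)):
        nodes, edges = parse_counts(log)
        print(f"{name}: node imbalance {imbalance(nodes):.3f}, "
              f"edge imbalance {imbalance(edges):.3f}")
```

The script prints one line per configuration, so the node and edge balance of the two runs can be compared side by side with a single number each.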
Any comments are welcome. Thanks.