MinCut pooling for sparse adjacency matrix #9243

Open: wants to merge 33 commits into base: master


xiaohan2012 commented Apr 26, 2024

Motivation

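The current nn.dense.mincut_pool (`dense_mincut_pool`) requires the input adjacency matrix to be dense, i.e. a tensor of shape [batch_size, num_nodes, num_nodes], so its memory footprint grows quadratically with the number of nodes. This does not scale to large graphs.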
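To make the scaling concrete, here is a rough storage estimate for a batched [B, N, N] adjacency stored densely versus as a 3-D sparse COO tensor. It is a sketch under stated assumptions: float32 values, int64 indices (one per sparse dimension, as in PyTorch's COO layout), and no allocator or metadata overhead. The helper name `coo_vs_dense_bytes` is hypothetical.

```python
def coo_vs_dense_bytes(batch_size, num_nodes, density, value_bytes=4, index_bytes=8):
    """Hypothetical helper: rough bytes for a [B, N, N] adjacency, dense vs. 3-D COO.

    Counts only the value/index buffers; ignores tensor metadata and allocator overhead.
    """
    total_entries = batch_size * num_nodes * num_nodes
    dense = total_entries * value_bytes
    nnz = round(total_entries * density)  # expected number of nonzero entries
    # A 3-D COO tensor stores 3 int64 indices plus 1 value per nonzero entry.
    coo = nnz * (3 * index_bytes + value_bytes)
    return dense, coo


# Benchmark setting from this PR: 10 graphs, 8000 nodes, ~1% nonzero entries.
dense, coo = coo_vs_dense_bytes(10, 8000, 0.01)
print(f"dense: {dense / 1e9:.2f} GB, COO: {coo / 1e6:.1f} MB")  # dense: 2.56 GB, COO: 179.2 MB
```

The dense cost depends only on N, while the COO cost depends on the number of nonzeros, which is why the gap widens as graphs get larger and sparser.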

What

This PR adds a sparse counterpart of nn.dense.mincut_pool (`mincut_pool` in `torch_geometric/nn/pool/mincut.py`) that takes a sparse adjacency matrix as input.
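A minimal single-graph sketch of what a sparse MinCut pooling step can look like, assuming the same losses and normalization as the dense `dense_mincut_pool`: softmax over S, MinCut loss -Tr(SᵀAS)/Tr(SᵀDS), orthogonality loss, zeroed diagonal and symmetric degree normalization of SᵀAS. The function name `sparse_mincut_pool_single` is hypothetical. It ignores batching and node masks and is not the PR's `mincut_pool`. It shows that the adjacency only enters through `torch.sparse.mm(adj, ·)`, so nothing of size N x N is materialized.

```python
import torch

EPS = 1e-15  # guards against division by zero, as in the dense implementation


def sparse_mincut_pool_single(x, adj, s):
    """Hypothetical sketch of MinCut pooling for ONE graph with a sparse COO adjacency.

    x:   [N, F] dense node features
    adj: [N, N] sparse COO adjacency (non-negative weights)
    s:   [N, K] dense cluster-assignment scores (softmax applied here)
    No batching and no node masks.
    """
    s = torch.softmax(s, dim=-1)
    out = s.t() @ x  # pooled features, [K, F]

    # The only products touching the N x N adjacency are sparse x dense.
    adj_s = torch.sparse.mm(adj, s)  # A S, [N, K]
    out_adj = s.t() @ adj_s  # S^T A S, [K, K]

    # MinCut loss: -Tr(S^T A S) / Tr(S^T D S), with degrees D = diag(A 1).
    ones = torch.ones(adj.size(0), 1, dtype=s.dtype, device=s.device)
    deg = torch.sparse.mm(adj, ones).squeeze(-1)  # [N]
    mincut_num = torch.trace(out_adj)
    mincut_den = torch.trace(s.t() @ (deg.unsqueeze(-1) * s))
    mincut_loss = -(mincut_num / mincut_den)

    # Orthogonality loss: || S^T S / ||S^T S||_F - I_K / ||I_K||_F ||_F
    ss = s.t() @ s
    i_s = torch.eye(s.size(-1), dtype=s.dtype, device=s.device)
    ortho_loss = torch.norm(ss / torch.norm(ss) - i_s / torch.norm(i_s))

    # Drop self-loops of the coarsened graph, then normalize symmetrically.
    out_adj = out_adj.clone()
    out_adj.fill_diagonal_(0)
    d = torch.sqrt(out_adj.sum(dim=-1)) + EPS
    out_adj = out_adj / d.unsqueeze(0) / d.unsqueeze(1)

    return out, out_adj, mincut_loss, ortho_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    a = (torch.rand(6, 6) < 0.5).float()
    adj = ((a + a.t()) > 0).float().to_sparse()  # symmetric, sparse COO
    out, out_adj, mincut_loss, ortho_loss = sparse_mincut_pool_single(
        torch.randn(6, 4), adj, torch.randn(6, 2)
    )
    print(out.shape, out_adj.shape)  # torch.Size([2, 4]) torch.Size([2, 2])
```

Extending this to batches would need either a 3-D sparse tensor with a batched sparse-dense product or a block-diagonal adjacency; the sketch deliberately leaves that choice open.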

Speed comparison with dense mincut

  • Benchmarked on a batch of 10 random 8k x 8k adjacency matrices with density 0.01 (about 1% nonzero entries)
  • Device: CPU
  • OS: macOS 14.2.1 (Apple M2 chip)
The benchmark script:

```python
import torch
from contexttimer import Timer
from torch_geometric.nn.dense.mincut_pool import dense_mincut_pool
# Sparse variant added in this PR (torch_geometric/nn/pool/mincut.py)
from torch_geometric.nn.pool.mincut import mincut_pool as mincut_pool_sparse

batch_size, num_nodes, channels, num_clusters = (10, 8000, 100, 10)
density = 0.01  # fraction of nonzero adjacency entries (~1%)

# Batched node features: [batch_size, num_nodes, channels]
x = torch.randn((batch_size, num_nodes, channels))
# Random binary adjacency, stored densely and as a sparse COO tensor
adj_dense = (torch.rand((batch_size, num_nodes, num_nodes)) <= density).float()
adj_sparse = adj_dense.to_sparse()
# Batched (unnormalized) cluster-assignment scores: [batch_size, num_nodes, num_clusters]
s = torch.randn((batch_size, num_nodes, num_clusters))
# Random boolean node masks: [batch_size, num_nodes]
mask = torch.randint(0, 2, (batch_size, num_nodes), dtype=torch.bool)

with Timer() as t:
    mincut_pool_sparse(x, adj_sparse, s, mask)
print(f"Sparse MinCut pooling: {t.elapsed} s")

with Timer() as t:
    dense_mincut_pool(x, adj_dense, s, mask)
print(f"Dense MinCut pooling: {t.elapsed} s")
```
Results (one timed call each):
Sparse MinCut pooling: 0.521 s
Dense MinCut pooling: 1.244 s
The sparse version is about 2.4x faster in this setting.
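The dense path multiplies the full N x N adjacency by S, costing O(N²K) operations, while a sparse path costs O(nnz·K); much of the measured gap plausibly comes from this product. The sketch below isolates A·S for a single graph and times it with `torch.utils.benchmark`, which repeats the statement and reports the median instead of a single timed call. The helper name `time_adj_times_s` and its default sizes are hypothetical; it does not call the PR's code and its numbers are not the PR's results.

```python
import torch
import torch.utils.benchmark as benchmark


def time_adj_times_s(num_nodes=2000, num_clusters=10, density=0.01, seed=0):
    """Hypothetical helper: median seconds for A @ S with A dense vs. sparse COO (one graph)."""
    g = torch.Generator().manual_seed(seed)
    adj_dense = (torch.rand(num_nodes, num_nodes, generator=g) <= density).float()
    adj_sparse = adj_dense.to_sparse()
    s = torch.randn(num_nodes, num_clusters, generator=g)

    # Both layouts must give the same product before timing means anything.
    assert torch.allclose(adj_dense @ s, torch.sparse.mm(adj_sparse, s), atol=1e-4)

    # blocked_autorange keeps running the statement until min_run_time has elapsed.
    dense = benchmark.Timer(
        stmt="adj @ s", globals={"adj": adj_dense, "s": s}
    ).blocked_autorange(min_run_time=0.2)
    sparse = benchmark.Timer(
        stmt="torch.sparse.mm(adj, s)",
        globals={"torch": torch, "adj": adj_sparse, "s": s},
    ).blocked_autorange(min_run_time=0.2)
    return dense.median, sparse.median


if __name__ == "__main__":
    dense_t, sparse_t = time_adj_times_s()
    print(f"dense A @ S: {dense_t * 1e3:.3f} ms, sparse A @ S: {sparse_t * 1e3:.3f} ms")
```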

Remark on matrix multiplication

The implementation uses torch.sparse.mm, which restricts the input to COO matrices. As future work, multiplying a CSR matrix by a CSC matrix could be used for better performance.
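For comparison, a minimal sketch, assuming a recent PyTorch 2.x, of the same A·S product with a 2-D adjacency held in COO and in CSR layout. Note that this is CSR x dense, not the CSR x CSC product the remark proposes, and it makes no claim about which layout is faster on a given machine. The helper name `adj_times_s_coo_and_csr` is hypothetical.

```python
import torch


def adj_times_s_coo_and_csr(adj_dense, s):
    """Hypothetical helper: A @ S with A in COO and in CSR layout (2-D, one graph)."""
    adj_coo = adj_dense.to_sparse()  # COO: explicit (row, col) index pairs
    adj_csr = adj_dense.to_sparse_csr()  # CSR: compressed row pointers + column indices
    out_coo = torch.sparse.mm(adj_coo, s)
    out_csr = adj_csr @ s  # sparse CSR x dense matmul
    return out_coo, out_csr


torch.manual_seed(0)
adj = (torch.rand(500, 500) <= 0.01).float()
s = torch.randn(500, 8)
out_coo, out_csr = adj_times_s_coo_and_csr(adj, s)
print(torch.allclose(out_coo, out_csr, atol=1e-5))  # True
```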

xiaohan2012 marked this pull request as draft April 26, 2024 06:27
github-actions bot added the nn label Apr 26, 2024

codecov bot commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 98.79518% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.36%. Comparing base (0fa52cb) to head (eb68672).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| torch_geometric/nn/pool/mincut.py | 98.79% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #9243      +/-   ##
==========================================
+ Coverage   86.59%   87.36%   +0.76%     
==========================================
  Files         482      483       +1     
  Lines       31460    31538      +78     
==========================================
+ Hits        27242    27552     +310     
+ Misses       4218     3986     -232     
```


xiaohan2012 marked this pull request as ready for review November 11, 2024 03:47
xiaohan2012 marked this pull request as draft November 11, 2024 04:28
xiaohan2012 marked this pull request as ready for review November 11, 2024 08:25

xiaohan2012 (Author)

@wsad1 @EdisonLeeeee @rusty1s can I ask for a review?
