Merge pull request #89 from pyt-team/topotune
Topotune Readme
gbg141 authored Oct 10, 2024
2 parents 0e5325a + d5e2ea9 commit 7d00bc4
Showing 6 changed files with 95 additions and 51 deletions.
48 changes: 46 additions & 2 deletions README.md
@@ -145,16 +145,60 @@ We list the neural networks trained and evaluated by `TopoBenchmarkX`, organized
### Combinatorial complexes
| Model | Reference |
| --- | --- |
-| GCCN | Generalized Combinatorial Complex Neural Networks |
+| GCCN | [Generalized Combinatorial Complex Neural Networks](https://arxiv.org/pdf/2410.06530) |

## :bulb: TopoTune

-We include TopoTune, a comprehensive framework for easily defining and training new, general TDL models (GCCNs, pictured below) on any domain using any (graph) neural network ω as a backbone, as well as reproducing existing models. To train and test a GCCN, it is sufficient to specify the choice of domain, neighborhood structure, and backbone model in the configuration. We provide scripts to reproduce a broad class of GCCNs in `scripts/topotune` and reproduce iterations of existing neural networks in `scripts/topotune/existing_models`, as previously reported.
+We include TopoTune, a comprehensive framework for easily defining and training new, general TDL models on any domain using any (graph) neural network ω as a backbone. The pre-print detailing this framework is [TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks](https://arxiv.org/pdf/2410.06530). In a GCCN (pictured below), the input complex is represented as an ensemble of strictly augmented Hasse graphs, one per neighborhood of the complex. Each of these Hasse graphs is processed by a sub-model ω, and the outputs are aggregated rank-wise between layers.

<p align="center">
<img src="resources/gccn.jpg" width="700">
</p>
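As a rough illustration of the flow just described, one GCCN layer can be sketched as follows. This is a hypothetical minimal sketch, not TopoBenchmarkX code: `omega` stands for the backbone sub-model, per-graph features are simplified to scalars, and mean aggregation is assumed.

```python
from collections import defaultdict

def gccn_layer(hasse_graphs, omega):
    """hasse_graphs: list of (destination_rank, features) pairs, one per
    strictly augmented Hasse graph (i.e., one per neighborhood).
    omega: the backbone sub-model, applied independently to each graph."""
    per_rank = defaultdict(list)
    for destination_rank, features in hasse_graphs:
        # 1) Process each Hasse graph with the sub-model omega.
        per_rank[destination_rank].append(omega(features))
    # 2) Aggregate the sub-model outputs rank-wise (mean, as an example choice).
    return {rank: sum(outs) / len(outs) for rank, outs in per_rank.items()}
```

In the real model, each Hasse graph carries node features and edges derived from one neighborhood matrix of the complex, and ω is a full (graph) neural network rather than a scalar function.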

### Defining and training a GCCN
To implement and train a GCCN, run the following command with the desired choice of dataset, lifting domain (e.g., `cell`, `simplicial`), PyTorch Geometric backbone model (e.g., `GCN`, `GIN`, `GAT`, `GraphSAGE`) and its parameters (e.g., `model.backbone.GNN.num_layers=2`), neighborhood structure (routes), and other hyperparameters.


```shell
python -m topobenchmarkx \
dataset=graph/PROTEINS \
dataset.split_params.data_seed=1 \
model=cell/topotune \
model.tune_gnn=GCN \
model.backbone.GNN.num_layers=2 \
model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[2,1\],boundary\]\] \
model.backbone.layers=4 \
model.feature_encoder.out_channels=32 \
model.feature_encoder.proj_dropout=0.3 \
model.readout.readout_name=PropagateSignalDown \
logger.wandb.project=TopoTune_cell \
trainer.max_epochs=1000 \
callbacks.early_stopping.patience=50
```

To use a single augmented Hasse graph expansion, use `model={domain}/topotune_onehasse` instead of `model={domain}/topotune`.
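For instance, the PROTEINS command above would start as follows (illustrative fragment; the remaining overrides are unchanged):

```shell
python -m topobenchmarkx \
  dataset=graph/PROTEINS \
  model=cell/topotune_onehasse \
  model.tune_gnn=GCN
```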

To specify a set of neighborhoods (routes) on the complex, pass a list of neighborhoods, each specified as `\[\[{source_rank},{destination_rank}\],{neighborhood}\]` (the backslashes protect the brackets from the shell). Currently, the following options for `{neighborhood}` are supported:
- `up_laplacian`, from rank $r$ to $r$
- `down_laplacian`, from rank $r$ to $r$
- `boundary`, from rank $r$ to $r-1$
- `coboundary`, from rank $r$ to $r+1$
- `adjacency`, from rank $r$ to $r$ (stand-in for `up_adjacency`, as `down_adjacency` not yet supported in TopoBenchmarkX)
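To make the route syntax concrete, the following hypothetical helper (not part of TopoBenchmarkX) parses an unescaped routes string into `(source_rank, destination_rank, neighborhood)` triples:

```python
import ast
import re

def parse_routes(spec):
    """Parse a routes string like '[[[0,0],up_laplacian],[[2,1],boundary]]'
    into (source_rank, destination_rank, neighborhood) triples."""
    # Quote the bare neighborhood names so the spec is a valid Python literal.
    quoted = re.sub(r"[a-z_]+", lambda m: f"'{m.group(0)}'", spec)
    # Each route is [[source_rank, destination_rank], neighborhood].
    return [(src, dst, nbhd) for (src, dst), nbhd in ast.literal_eval(quoted)]
```

For example, the routes used in the training command above would parse to `(0, 0, 'up_laplacian')` and `(2, 1, 'boundary')`.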


### Using backbone models from any package
By default, backbone models are imported from `torch_geometric.nn.models`. To import and specify a backbone model from any other package, such as `torch.nn.Transformer` or `dgl.nn.GATConv`, it is sufficient to (1) make sure the package is installed in the environment and (2) specify it on the command line:

```shell
model.tune_gnn={backbone_model} \
model.backbone.GNN._target_={package}.{backbone_model}
```
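For instance, assuming DGL is installed, a hypothetical invocation could look like the fragment below. Note that `dgl.nn.GATConv`'s constructor arguments differ from those of PyTorch Geometric models, so additional backbone parameters may need to be overridden as well:

```shell
python -m topobenchmarkx \
  model=cell/topotune \
  model.tune_gnn=GATConv \
  model.backbone.GNN._target_=dgl.nn.GATConv \
  dataset=graph/PROTEINS
```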

### Reproducing experiments

We provide scripts to reproduce experiments on a broad class of GCCNs in [`scripts/topotune`](scripts/topotune) and reproduce iterations of existing neural networks in [`scripts/topotune/existing_models`](scripts/topotune/existing_models), as previously reported in the [TopoTune paper](https://arxiv.org/pdf/2410.06530).

We invite users interested in running extensive sweeps over new GCCNs to use the `--multirun` flag, as done in the scripts. This flag is a shortcut for running every possible combination of the comma-separated parameter values in a single command.
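For example, the following illustrative sweep (flags follow the same pattern as the scripts) launches 2 × 4 × 3 = 24 runs in one command:

```shell
python -m topobenchmarkx --multirun \
  model=cell/topotune,cell/topotune_onehasse \
  model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
  dataset=graph/PROTEINS \
  dataset.split_params.data_seed=1,3,5
```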

## :rocket: Liftings

14 changes: 7 additions & 7 deletions scripts/topotune/existing_models/tune_cwn.sh
@@ -2,7 +2,7 @@ python -m topobenchmarkx \
model=cell/topotune_onehasse,cell/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN.num_layers=1 \
-model.backbone.routes=\[\[\[0,1\],coincidence\],\[\[1,1\],adjacency\],\[\[2,1\],incidence\]\] \
+model.backbone.routes=\[\[\[0,1\],coboundary\],\[\[1,1\],adjacency\],\[\[2,1\],boundary\]\] \
logger.wandb.project=TopoTune_CWN \
dataset=graph/MUTAG \
optimizer.parameters.lr=0.001 \
@@ -24,7 +24,7 @@ python -m topobenchmarkx \
model=cell/topotune_onehasse,cell/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN.num_layers=1 \
-model.backbone.routes=\[\[\[0,1\],coincidence\],\[\[1,1\],adjacency\],\[\[2,1\],incidence\]\] \
+model.backbone.routes=\[\[\[0,1\],coboundary\],\[\[1,1\],adjacency\],\[\[2,1\],boundary\]\] \
logger.wandb.project=TopoTune_CWN \
dataset=graph/NCI1 \
optimizer.parameters.lr=0.001 \
@@ -45,7 +45,7 @@ python -m topobenchmarkx \
model=cell/topotune_onehasse,cell/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN.num_layers=1 \
-model.backbone.routes=\[\[\[0,1\],coincidence\],\[\[1,1\],adjacency\],\[\[2,1\],incidence\]\] \
+model.backbone.routes=\[\[\[0,1\],coboundary\],\[\[1,1\],adjacency\],\[\[2,1\],boundary\]\] \
logger.wandb.project=TopoTune_CWN \
dataset=graph/NCI109 \
optimizer.parameters.lr=0.001 \
@@ -65,7 +65,7 @@ python -m topobenchmarkx \
model=cell/topotune_onehasse,cell/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN.num_layers=1 \
-model.backbone.routes=\[\[\[0,1\],coincidence\],\[\[1,1\],adjacency\],\[\[2,1\],incidence\]\] \
+model.backbone.routes=\[\[\[0,1\],coboundary\],\[\[1,1\],adjacency\],\[\[2,1\],boundary\]\] \
logger.wandb.project=TopoTune_CWN \
dataset=graph/ZINC \
optimizer.parameters.lr=0.001 \
@@ -90,7 +90,7 @@ python -m topobenchmarkx \
model=cell/topotune_onehasse,cell/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN.num_layers=1 \
-model.backbone.routes=\[\[\[0,1\],coincidence\],\[\[1,1\],adjacency\],\[\[2,1\],incidence\]\] \
+model.backbone.routes=\[\[\[0,1\],coboundary\],\[\[1,1\],adjacency\],\[\[2,1\],boundary\]\] \
logger.wandb.project=TopoTune_CWN \
dataset=graph/cocitation_citeseer \
optimizer.parameters.lr=0.001 \
@@ -111,7 +111,7 @@ python -m topobenchmarkx \
model=cell/topotune_onehasse,cell/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN.num_layers=1 \
-model.backbone.routes=\[\[\[0,1\],coincidence\],\[\[1,1\],adjacency\],\[\[2,1\],incidence\]\] \
+model.backbone.routes=\[\[\[0,1\],coboundary\],\[\[1,1\],adjacency\],\[\[2,1\],boundary\]\] \
logger.wandb.project=TopoTune_CWN \
dataset=graph/cocitation_pubmed \
optimizer.parameters.lr=0.01 \
@@ -134,7 +134,7 @@ python -m topobenchmarkx \
model=cell/topotune_onehasse,cell/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN.num_layers=1 \
-model.backbone.routes=\[\[\[0,1\],coincidence\],\[\[1,1\],adjacency\],\[\[2,1\],incidence\]\] \
+model.backbone.routes=\[\[\[0,1\],coboundary\],\[\[1,1\],adjacency\],\[\[2,1\],boundary\]\] \
logger.wandb.project=TopoTune_CWN \
dataset=graph/PROTEINS,graph/cocitation_cora \
optimizer.parameters.lr=0.001 \
16 changes: 8 additions & 8 deletions scripts/topotune/existing_models/tune_sccn.sh
@@ -5,7 +5,7 @@ python -m topobenchmarkx \
model.feature_encoder.out_channels=128 \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN.num_layers=1 \
-model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coincidence\],\[\[1,0\],incidence\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coincidence\],\[\[2,1\],incidence\],\[\[2,2\],down_laplacian\]\] \
+model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coboundary\],\[\[1,0\],boundary\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coboundary\],\[\[2,1\],boundary\],\[\[2,2\],down_laplacian\]\] \
model.backbone.layers=3 \
dataset.split_params.data_seed=1,3,5,7,9 \
model.readout.readout_name=NoReadOut \
@@ -28,7 +28,7 @@ python -m topobenchmarkx \
model.feature_encoder.out_channels=64 \
model.backbone.GNN.num_layers=1 \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
-model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coincidence\],\[\[1,0\],incidence\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coincidence\],\[\[2,1\],incidence\],\[\[2,2\],down_laplacian\]\] \
+model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coboundary\],\[\[1,0\],boundary\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coboundary\],\[\[2,1\],boundary\],\[\[2,2\],down_laplacian\]\] \
model.backbone.layers=3 \
model.feature_encoder.proj_dropout=0.5 \
model.readout.readout_name=PropagateSignalDown \
@@ -51,7 +51,7 @@ python -m topobenchmarkx \
model.feature_encoder.out_channels=64 \
model.backbone.GNN.num_layers=1 \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
-model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coincidence\],\[\[1,0\],incidence\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coincidence\],\[\[2,1\],incidence\],\[\[2,2\],down_laplacian\]\] \
+model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coboundary\],\[\[1,0\],boundary\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coboundary\],\[\[2,1\],boundary\],\[\[2,2\],down_laplacian\]\] \
model.backbone.layers=4 \
model.readout.readout_name=NoReadOut \
transforms.graph2simplicial_lifting.signed=True \
@@ -72,7 +72,7 @@ python -m topobenchmarkx \
python -m topobenchmarkx \
model=simplicial/topotune_onehasse,simplicial/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
-model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coincidence\],\[\[1,0\],incidence\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coincidence\],\[\[2,1\],incidence\],\[\[2,2\],down_laplacian\]\] \
+model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coboundary\],\[\[1,0\],boundary\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coboundary\],\[\[2,1\],boundary\],\[\[2,2\],down_laplacian\]\] \
dataset=graph/PROTEINS \
optimizer.parameters.lr=0.01 \
model.feature_encoder.out_channels=128 \
@@ -95,7 +95,7 @@ python -m topobenchmarkx \
model=simplicial/topotune_onehasse,simplicial/topotune \
dataset=graph/ZINC \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
-model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coincidence\],\[\[1,0\],incidence\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coincidence\],\[\[2,1\],incidence\],\[\[2,2\],down_laplacian\]\] \
+model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coboundary\],\[\[1,0\],boundary\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coboundary\],\[\[2,1\],boundary\],\[\[2,2\],down_laplacian\]\] \
optimizer.parameters.lr=0.001 \
model.feature_encoder.out_channels=128 \
model.backbone.layers=4 \
@@ -117,7 +117,7 @@ python -m topobenchmarkx \
python -m topobenchmarkx \
model=simplicial/topotune_onehasse,simplicial/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
-model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coincidence\],\[\[1,0\],incidence\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coincidence\],\[\[2,1\],incidence\],\[\[2,2\],down_laplacian\]\] \
+model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coboundary\],\[\[1,0\],boundary\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coboundary\],\[\[2,1\],boundary\],\[\[2,2\],down_laplacian\]\] \
dataset=graph/cocitation_citeseer \
optimizer.parameters.lr=0.01 \
model.feature_encoder.out_channels=64 \
@@ -139,7 +139,7 @@ python -m topobenchmarkx \
model=simplicial/topotune_onehasse,simplicial/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
model.backbone.GNN._target_=topobenchmarkx.nn.backbones.graph.IdentityGCN \
-model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coincidence\],\[\[1,0\],incidence\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coincidence\],\[\[2,1\],incidence\],\[\[2,2\],down_laplacian\]\] \
+model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coboundary\],\[\[1,0\],boundary\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coboundary\],\[\[2,1\],boundary\],\[\[2,2\],down_laplacian\]\] \
dataset=graph/cocitation_cora \
optimizer.parameters.lr=0.01 \
model.feature_encoder.out_channels=32 \
@@ -160,7 +160,7 @@ python -m topobenchmarkx \
python -m topobenchmarkx \
model=simplicial/topotune_onehasse,simplicial/topotune \
model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
-model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coincidence\],\[\[1,0\],incidence\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coincidence\],\[\[2,1\],incidence\],\[\[2,2\],down_laplacian\]\] \
+model.backbone.routes=\[\[\[0,0\],up_laplacian\],\[\[0,1\],coboundary\],\[\[1,0\],boundary\],\[\[1,1\],down_laplacian\],\[\[1,1\],up_laplacian\],\[\[1,2\],coboundary\],\[\[2,1\],boundary\],\[\[2,2\],down_laplacian\]\] \
dataset=graph/cocitation_pubmed \
optimizer.parameters.lr=0.01 \
model.feature_encoder.out_channels=64 \