Assess how your model compares against state-of-the-art topological neural networks.
TopoBenchmark (TB) is a modular Python library designed to standardize benchmarking and accelerate research in Topological Deep Learning (TDL). In particular, TB allows users to train and compare the performance of all sorts of Topological Neural Networks (TNNs) across different topological domains, where by topological domain we refer to a graph, a simplicial complex, a cellular complex, or a hypergraph. For detailed information, please refer to the paper TopoBenchmark: A Framework for Benchmarking Topological Deep Learning.
The main pipeline trains and evaluates a wide range of state-of-the-art TNNs and Graph Neural Networks (GNNs) (see Neural Networks) on numerous and varied datasets and benchmark tasks (see Datasets). Through TopoTune (see TopoTune), the library provides easy access to training and testing an entire landscape of graph-based TNNs, new or existing, on any topological domain.
Additionally, the library offers the ability to transform, i.e. lift, each dataset from one topological domain to another (see Liftings), enabling for the first time an exhaustive inter-domain comparison of TNNs.
If you do not have conda on your machine, please follow their guide to install it.
First, clone the TopoBenchmark repository, then create and activate a conda environment tb with python 3.11.3.
git clone git@github.com:geometric-intelligence/topobenchmark.git
cd topobenchmark
conda create -n tb python=3.11.3
conda activate tb
Next, check the CUDA version of your machine:
/usr/local/cuda/bin/nvcc --version
and ensure that it matches the CUDA version specified in the env_setup.sh file (CUDA=cu121 by default). If it does not match, update env_setup.sh accordingly by changing both the CUDA and TORCH environment variables to compatible values as specified on this website.
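The CUDA variable takes a wheel tag such as cu121, i.e. CUDA 12.1 with the dot removed. As a convenience, here is a minimal sketch of a hypothetical helper, cuda_tag_from_nvcc, that turns the nvcc output into that tag. It assumes the output contains a line like "Cuda compilation tools, release 12.1, V12.1.105"; the matching TORCH value still has to be looked up by hand.

```python
import re


def cuda_tag_from_nvcc(nvcc_output: str) -> str:
    """Hypothetical helper: turn `nvcc --version` output into a tag like 'cu121'.

    Assumes the output contains 'release <major>.<minor>', as recent nvcc versions print.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release number in the nvcc output")
    major, minor = match.groups()
    return f"cu{major}{minor}"


sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(cuda_tag_from_nvcc(sample))  # cu121
```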
Next, set up the environment with the following command.
source env_setup.sh
This command installs the TopoBenchmark library and its dependencies.
Next, train the neural networks by running the following command:
python -m topobenchmark
Thanks to the Hydra-based implementation, one can easily override the default experiment configuration through the command line. For instance, the model and dataset can be selected as:
python -m topobenchmark model=cell/cwn dataset=graph/MUTAG
Remark: By default, our pipeline identifies the source and destination topological domains, and applies a default lifting between them if required.
The same CLI override mechanism also applies when modifying finer-grained configurations within a config group. Please refer to the official Hydra documentation for further details.
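To make the dotted-key idea concrete, here is a toy sketch of what an override such as model.backbone.GNN.num_layers=2 does to a nested configuration. The apply_override function and the sample config are hypothetical and only mimic the idea; Hydra's real override grammar (type conversion, lists, the + and ~ prefixes, config-group selection) is not modeled.

```python
def apply_override(config: dict, override: str) -> None:
    """Toy stand-in for a Hydra override: set 'a.b.c=value' in a nested dict, in place.

    Values stay strings here; Hydra itself would parse them into ints, floats, lists, etc.
    """
    key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        node = node[name]  # KeyError if an intermediate key does not exist
    if leaf not in node:
        # Mirrors Hydra's default refusal to override keys missing from the config.
        raise KeyError(f"unknown key {key!r}")
    node[leaf] = value


config = {
    "model": {"backbone": {"GNN": {"num_layers": "1"}, "layers": "2"}},
    "trainer": {"max_epochs": "100"},
}
for override in ["model.backbone.GNN.num_layers=2", "trainer.max_epochs=1000"]:
    apply_override(config, override)
print(config["model"]["backbone"]["GNN"]["num_layers"])  # 2
print(config["trainer"]["max_epochs"])  # 1000
```

Keys that are absent from the loaded configuration are rejected, which is why Hydra asks for a + prefix when you really want to add a new key.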
To reproduce Table 1 from the TopoBenchmark: A Framework for Benchmarking Topological Deep Learning paper, please run the following command:
bash scripts/reproduce.sh
Remark: We have additionally provided a public W&B (Weights & Biases) project with logs for the corresponding runs (updated on June 11, 2024).
Explore our tutorials for further details on how to add new datasets, transforms/liftings, and benchmark tasks.
We list the neural networks trained and evaluated by TopoBenchmark, organized by the topological domain over which they operate: graph, simplicial complex, cellular complex or hypergraph. Many of these neural networks were originally implemented in TopoModelX.
Cellular complexes
Model | Reference |
---|---|
CAN | Cell Attention Network |
CCCN | Inspired by A learning algorithm for computational connected cellular network, implementation adapted from Generalized Simplicial Attention Neural Networks |
CXN | Cell Complex Neural Networks |
CWN | Weisfeiler and Lehman Go Cellular: CW Networks |
Combinatorial complexes
Model | Reference |
---|---|
GCCN | TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks |
We include TopoTune, a comprehensive framework for easily defining and training new, general TDL models on any domain using any (graph) neural network ω as a backbone. The preprint detailing this framework is TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks. In a GCCN, the input complex is represented as an ensemble of strictly augmented Hasse graphs, one per neighborhood of the complex. Each of these Hasse graphs is processed by a sub-model ω, and the outputs are aggregated rank-wise between layers.
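The layer structure described above can be sketched in a few lines of NumPy. This is a toy illustration, not TopoTune's implementation: the triangle complex, the four neighborhoods, the mean-plus-linear omega function standing in for the backbone ω, and the choice of summation as the rank-wise aggregation are all assumptions made for the example. Each list of (source, target) cell pairs plays the role of one per-neighborhood graph, and omega is called once per neighborhood.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Toy cell complex: a triangle with edges e0=(0,1), e1=(1,2), e2=(0,2) and one 2-cell.
features = {
    0: rng.normal(size=(3, dim)),  # rank-0 cells (nodes)
    1: rng.normal(size=(3, dim)),  # rank-1 cells (edges)
    2: rng.normal(size=(1, dim)),  # rank-2 cell (face)
}

# neighborhood -> (source rank, target rank, directed (source cell, target cell) pairs)
neighborhoods = {
    "node-node adjacency": (0, 0, [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)]),
    "edge-to-node incidence": (1, 0, [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]),
    "edge-to-face incidence": (1, 2, [(0, 0), (1, 0), (2, 0)]),
    "face-to-edge incidence": (2, 1, [(0, 0), (0, 1), (0, 2)]),
}


def omega(x_src, pairs, num_targets, weight):
    """Toy backbone: mean of incoming source features, then a linear map and ReLU."""
    summed = np.zeros((num_targets, x_src.shape[1]))
    counts = np.zeros(num_targets)
    for s, t in pairs:
        summed[t] += x_src[s]
        counts[t] += 1
    mean = summed / np.maximum(counts, 1)[:, None]  # cells with no messages stay zero
    return np.maximum(mean @ weight, 0.0)


def gccn_layer(features, neighborhoods, weights):
    """One layer: run omega once per neighborhood, then sum the outputs per target rank."""
    out = {rank: np.zeros_like(x) for rank, x in features.items()}
    for name, (src, dst, pairs) in neighborhoods.items():
        out[dst] += omega(features[src], pairs, features[dst].shape[0], weights[name])
    return out


weights = {name: rng.normal(size=(dim, dim)) for name in neighborhoods}
hidden = gccn_layer(features, neighborhoods, weights)
print({rank: x.shape for rank, x in hidden.items()})  # {0: (3, 4), 1: (3, 4), 2: (1, 4)}
```

Because every neighborhood gets its own omega call (and, here, its own weight matrix), swapping the backbone only means replacing omega, which is the flexibility TopoTune is built around.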
To implement and train a GCCN, run the following command line with the desired choice of dataset, lifting domain (e.g., cell, simplicial), PyTorch Geometric backbone model (e.g., GCN, GIN, GAT, GraphSAGE) and its parameters (e.g., model.backbone.GNN.num_layers=2), neighborhood structure (routes), and other hyperparameters.
python -m topobenchmark \
    dataset=graph/PROTEINS \
    dataset.split_params.data_seed=1 \
    model=cell/topotune \
    model.tune_gnn=GCN \
    model.backbone.GNN.num_layers=2 \
    model.backbone.neighborhoods=\[1-up_laplacian-0,1-down_incidence-2\] \
    model.backbone.layers=4 \
    model.feature_encoder.out_channels=32 \
    model.feature_encoder.proj_dropout=0.3 \
    model.readout.readout_name=PropagateSignalDown \
    logger.wandb.project=TopoTune_cell \
    trainer.max_epochs=1000 \
    callbacks.early_stopping.patience=50
To use a single augmented Hasse graph expansion, use model={domain}/topotune_onehasse instead of model={domain}/topotune.
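One plausible reading of a single augmented Hasse graph is one graph whose nodes are the cells of every rank and whose edges are the union of all chosen neighborhoods. The sketch below builds such a graph by giving each cell a global index (cells of rank r are offset by the number of lower-rank cells). The one_hasse_graph function and its input format are hypothetical, and whether topotune_onehasse constructs its graph exactly this way is an assumption.

```python
def one_hasse_graph(num_cells, neighborhoods):
    """Hypothetical sketch: merge per-neighborhood cell pairs into one graph over all cells.

    num_cells: {rank: number of cells of that rank}
    neighborhoods: iterable of (source_rank, target_rank, [(src_idx, dst_idx), ...])
    Returns (offsets, num_nodes, edges); cell i of rank r becomes node offsets[r] + i.
    """
    offsets, num_nodes = {}, 0
    for rank in sorted(num_cells):
        offsets[rank] = num_nodes
        num_nodes += num_cells[rank]
    edges = set()
    for src, dst, pairs in neighborhoods:
        for s, t in pairs:
            edges.add((offsets[src] + s, offsets[dst] + t))
    return offsets, num_nodes, sorted(edges)


# Triangle with one 2-cell: 3 nodes, 3 edges, 1 face.
offsets, num_nodes, edges = one_hasse_graph(
    {0: 3, 1: 3, 2: 1},
    [
        (1, 0, [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]),  # edge-to-node incidence
        (2, 1, [(0, 0), (0, 1), (0, 2)]),  # face-to-edge incidence
    ],
)
print(offsets, num_nodes)  # {0: 0, 1: 3, 2: 6} 7
print(edges)
```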
To specify a set of neighborhoods on the complex, use a list of neighborhoods, each specified as a string of the form r-{neighborhood}-k, where k is the rank of the source cells and r is the number of ranks up or down that the selected {neighborhood} considers. Currently, the following options for {neighborhood} are supported:
- up_laplacian, between cells of rank $k$ through $k+r$ cells.
- down_laplacian, between cells of rank $k$ through $k-r$ cells.
- hodge_laplacian, between cells of rank $k$ through both $k-r$ and $k+r$ cells.
- up_adjacency, between cells of rank $k$ through $k+r$ cells.
- down_adjacency, between cells of rank $k$ through $k-r$ cells.
- up_incidence, from rank $k$ to $k+r$.
- down_incidence, from rank $k$ to $k-r$.
The number r can be omitted, in which case r=1 by default (e.g., up_incidence-k represents the incidence from rank $k$ to $k+1$).
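To check which ranks a neighborhood string involves, here is a small parser for the r-{neighborhood}-k format. It is a hypothetical helper, not part of TopoBenchmark's API; it encodes the directions from the list above and assumes that an omitted r means r=1. Incidences are reported as mapping rank k to rank k±r, while Laplacians and adjacencies connect rank-k cells to each other through rank k±r cells.

```python
import re

# Which way each neighborhood steps away from the source rank k.
_STEPS = {
    "up_laplacian": (+1,),
    "down_laplacian": (-1,),
    "hodge_laplacian": (-1, +1),
    "up_adjacency": (+1,),
    "down_adjacency": (-1,),
    "up_incidence": (+1,),
    "down_incidence": (-1,),
}
# Optional "r-" prefix, then the neighborhood name, then "-k".
_SPEC = re.compile(r"^(?:(\d+)-)?([a-z_]+)-(\d+)$")


def parse_neighborhood(spec: str, default_r: int = 1) -> dict:
    """Hypothetical helper: parse 'r-{neighborhood}-k' (r optional) into the ranks involved."""
    match = _SPEC.match(spec)
    if match is None or match.group(2) not in _STEPS:
        raise ValueError(f"not a valid neighborhood string: {spec!r}")
    r = int(match.group(1)) if match.group(1) is not None else default_r
    name, k = match.group(2), int(match.group(3))
    shifted = tuple(k + step * r for step in _STEPS[name])
    if min(shifted) < 0:
        raise ValueError(f"{spec!r} refers to a negative rank")
    if name.endswith("incidence"):
        # Incidences map rank-k cells to rank k+r (up) or k-r (down) cells.
        return {"name": name, "source_rank": k, "target_rank": shifted[0], "through_ranks": ()}
    # Laplacians and adjacencies connect rank-k cells to each other via rank k±r cells.
    return {"name": name, "source_rank": k, "target_rank": k, "through_ranks": shifted}


for spec in ["1-up_laplacian-0", "1-down_incidence-2", "up_incidence-1"]:
    print(spec, parse_neighborhood(spec))
```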
By default, backbone models are imported from torch_geometric.nn.models. To import and specify a backbone model from any other package, such as torch.nn.Transformer or dgl.nn.GATConv, it is sufficient to 1) make sure the package is installed and 2) specify in the command line:
model.tune_gnn={backbone_model}
model.backbone.GNN._target_={package}.{backbone_model}
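The _target_ key is the field Hydra's hydra.utils.instantiate reads to decide which class or function to build. The core of that lookup, importing a module and fetching an attribute from it, can be sketched with the standard library. resolve_target is a hypothetical, simplified stand-in: Hydra's own resolver handles more cases, and it then calls the resolved object with the remaining config keys as arguments.

```python
import importlib


def resolve_target(path: str):
    """Simplified stand-in for resolving a `_target_` such as 'torch.nn.Transformer'.

    Splits off the last component, imports the module before it, and returns the attribute.
    """
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path like package.Name, got {path!r}")
    module = importlib.import_module(module_name)  # fails if the package is not installed
    return getattr(module, attr)


# Standard-library example; 'torch.nn.Transformer' resolves the same way if torch is installed.
print(resolve_target("collections.OrderedDict"))
```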
We provide scripts to reproduce experiments on a broad class of GCCNs in scripts/topotune and to reproduce iterations of existing neural networks in scripts/topotune/existing_models, as previously reported in the TopoTune paper.
We invite users interested in running extensive sweeps on new GCCNs to use the --multirun flag, as done in the scripts. This flag is a shortcut for running every possible combination of the specified parameters in a single command.
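With Hydra's default sweeper, comma-separated values in a --multirun command are expanded into their Cartesian product. The sketch below lists the runs such a sweep would launch and prints the equivalent single command. The sweep values and both helper functions are illustrative assumptions, not settings from the provided scripts.

```python
import itertools


def expand_sweep(sweep: dict) -> list:
    """List every combination of values, one override list per run (Cartesian product)."""
    keys = list(sweep)
    return [
        [f"{key}={value}" for key, value in zip(keys, values)]
        for values in itertools.product(*(sweep[key] for key in keys))
    ]


def multirun_command(sweep: dict) -> str:
    """Build the single command that asks Hydra to sweep the same grid."""
    overrides = " ".join(f"{key}={','.join(values)}" for key, values in sweep.items())
    return f"python -m topobenchmark --multirun {overrides}"


# Illustrative grid: 2 backbones x 2 depths x 2 datasets.
sweep = {
    "model.tune_gnn": ["GCN", "GIN"],
    "model.backbone.GNN.num_layers": ["1", "2"],
    "dataset": ["graph/MUTAG", "graph/PROTEINS"],
}
runs = expand_sweep(sweep)
print(len(runs))  # 8
print(runs[0])  # ['model.tune_gnn=GCN', 'model.backbone.GNN.num_layers=1', 'dataset=graph/MUTAG']
print(multirun_command(sweep))
```

The number of runs grows multiplicatively with every swept parameter, so it is worth counting them before launching a large sweep.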
We list the liftings used in TopoBenchmark to transform datasets. Here, a lifting refers to a function that transforms a dataset defined on a topological domain (e.g., on a graph) into the same dataset but supported on a different topological domain (e.g., on a simplicial complex).
Topology Liftings
Graph to simplicial complex
Name | Description | Reference |
---|---|---|
CliqueLifting | The algorithm finds the cliques in the graph and creates simplices. Given a clique, the first simplex added is the one containing all the nodes of the clique, then the simplices composed of all possible combinations with one node missing, then two nodes missing, and so on, until all possible pairs are added. Then the method moves on to the next clique. | Simplicial Complexes |
KHopLifting | For each node in the graph, take the set of its neighbors up to distance k, together with the node itself. These sets are then treated as simplices. The dimension of each simplex depends on the degree of the nodes: for example, a node with d neighbors forms a d-simplex. | Neighborhood Complexes |
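The clique-lifting procedure can be sketched in pure Python: find the maximal cliques, then add every subset of each clique down to pairs. This is a minimal illustration, not TopoBenchmark's implementation; it assumes a simple undirected graph given as an edge list, uses a basic Bron-Kerbosch search (a real pipeline would more likely call a graph library such as networkx.find_cliques), and max_dim is a hypothetical cap on the simplex dimension.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; adj maps each node to the set of its neighbors."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_lifting(edges, max_dim=None):
    """Simplices of dimension >= 1 (sorted node tuples) spanned by the graph's maximal cliques."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    simplices = set()
    for clique in maximal_cliques(adj):
        nodes = sorted(clique)
        largest = len(nodes) if max_dim is None else min(len(nodes), max_dim + 1)
        # Whole clique first, then faces with one node missing, ... down to pairs (edges).
        for size in range(largest, 1, -1):
            simplices.update(combinations(nodes, size))
    return simplices


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(sorted(clique_lifting(edges), key=lambda s: (len(s), s)))
# [(0, 1), (0, 2), (1, 2), (2, 3), (0, 1, 2)]
```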
Graph to cell complex
Name | Description | Reference |
---|---|---|
CellCycleLifting | To lift a graph to a cell complex (CC) we proceed as follows. First, we identify a finite set of cycles (closed loops) within the graph. Second, each identified cycle in the graph is associated with a 2-cell, such that the boundary of the 2-cell is the cycle. The nodes and edges of the cell complex are inherited from the graph. | Appendix B |
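As a sketch of the two steps above, one common way to obtain a finite set of cycles is a fundamental cycle basis: build a BFS spanning forest, and close one cycle for every edge outside it. Whether CellCycleLifting selects its cycles this way is an assumption; the cell_cycle_lifting function below is hypothetical and assumes a simple undirected graph with nodes 0..num_nodes-1.

```python
from collections import deque


def cell_cycle_lifting(num_nodes, edges):
    """Hypothetical sketch: nodes and edges are kept; each fundamental cycle becomes a 2-cell."""
    adj = {v: [] for v in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # BFS spanning forest: parent pointers, depths, and the set of tree edges.
    parent, depth, tree = {}, {}, set()
    for root in range(num_nodes):
        if root in parent:
            continue
        parent[root], depth[root] = None, 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v], depth[v] = u, depth[u] + 1
                    tree.add(frozenset((u, v)))
                    queue.append(v)

    # Each non-tree edge (u, v) closes one cycle through the tree.
    cycles = []
    for u, v in edges:
        if frozenset((u, v)) in tree:
            continue
        left, right = [u], [v]
        a, b = u, v
        while a != b:  # climb towards the lowest common ancestor
            if depth[a] >= depth[b]:
                a = parent[a]
                left.append(a)
            else:
                b = parent[b]
                right.append(b)
        cycles.append(left + right[-2::-1])  # the ancestor appears only once
    return {"0-cells": list(range(num_nodes)), "1-cells": list(edges), "2-cells": cycles}


cells = cell_cycle_lifting(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
print(cells["2-cells"])  # [[1, 0, 2]]
```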
Graph to hypergraph
Name | Description | Reference |
---|---|---|
KHopLifting | For each node in the graph, the algorithm finds the set of nodes that are at most k connections away from the initial node. This set is then used to create a hyperedge. The process is repeated for all nodes in the graph. | Section 3.4 |
KNearestNeighborsLifting | For each node in the graph, the method finds the k nearest nodes by using the Euclidean distance between the vectors of features. The set of k nodes found is considered as a hyperedge. The process is repeated for all nodes in the graph. | Section 3.1 |
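Both hypergraph liftings produce one hyperedge per node, so they are easy to sketch side by side. The functions below are hypothetical illustrations, not TopoBenchmark's code. khop_hyperedges runs a breadth-first search up to depth k; knn_hyperedges assumes a node counts as its own nearest neighbor (distance zero), so each hyperedge has exactly k nodes. Neither function removes duplicate hyperedges, which a real lifting may do.

```python
from collections import deque

import numpy as np


def khop_hyperedges(adj, k):
    """One hyperedge per node: the node plus every node within k hops (breadth-first search)."""
    hyperedges = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == k:
                continue  # do not expand beyond k hops
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        hyperedges.append(frozenset(dist))
    return hyperedges


def knn_hyperedges(x, k):
    """One hyperedge per node: its k nearest nodes by Euclidean feature distance (self included)."""
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return [frozenset(row.tolist()) for row in nearest]


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
print(khop_hyperedges(path, 1))
x = np.array([[0.0], [1.0], [5.0], [6.0]])
print(knn_hyperedges(x, 2))
```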
Feature Liftings
Name | Description | Supported Domains |
---|---|---|
ProjectionSum | Projects r-cell features to (r+1)-cell structures using incidence matrices (B_{r}). | Simplicial, Cell |
ConcatenationLifting | Concatenates r-cell features to obtain (r+1)-cell features. | Simplicial |
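For the case r=0, both feature liftings can be illustrated on edges of a graph. The sketch below assumes an unsigned node-to-edge incidence matrix B, so that ProjectionSum gives each edge the sum of its endpoint features (B^T X), and it assumes that concatenation uses the endpoints in sorted node order. These are illustrative choices rather than TopoBenchmark's exact conventions, and the function names are hypothetical.

```python
import numpy as np


def incidence_matrix(num_nodes, edges):
    """Unsigned node-to-edge incidence matrix B with shape (num_nodes, num_edges)."""
    b = np.zeros((num_nodes, len(edges)))
    for j, (u, v) in enumerate(edges):
        b[u, j] = b[v, j] = 1.0
    return b


def projection_sum(x_nodes, b):
    """Each edge receives the sum of its endpoint features: B^T X."""
    return b.T @ x_nodes


def concatenation(x_nodes, edges):
    """Each edge receives its endpoint features concatenated in sorted node order."""
    rows = []
    for edge in edges:
        u, v = sorted(edge)
        rows.append(np.concatenate([x_nodes[u], x_nodes[v]]))
    return np.stack(rows)


x = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
edges = [(0, 1), (1, 2)]
print(projection_sum(x, incidence_matrix(3, edges)))  # rows [1, 1] and [2, 3]
print(concatenation(x, edges))  # rows [1, 0, 0, 1] and [0, 1, 2, 2]
```

Note the trade-off: summation keeps the feature width fixed but is order-invariant, while concatenation doubles the width and depends on the chosen node ordering.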
Data Transformations
Transform | Description | Reference |
---|---|---|
Message Passing Homophily | Higher-order homophily measure for hypergraphs | Source |
Group Homophily | Higher-order homophily measure for hypergraphs that considers groups of predefined sizes | Source |
Graph datasets
Dataset | Task | Description | Reference |
---|---|---|---|
Cora | Classification | Cocitation dataset. | Source |
Citeseer | Classification | Cocitation dataset. | Source |
Pubmed | Classification | Cocitation dataset. | Source |
MUTAG | Classification | Graph-level classification. | Source |
PROTEINS | Classification | Graph-level classification. | Source |
NCI1 | Classification | Graph-level classification. | Source |
NCI109 | Classification | Graph-level classification. | Source |
IMDB-BIN | Classification | Graph-level classification. | Source |
IMDB-MUL | Classification | Graph-level classification. | Source |
| Classification | Graph-level classification. | Source |
Amazon | Classification | Heterophilic dataset. | Source |
Minesweeper | Classification | Heterophilic dataset. | Source |
Empire | Classification | Heterophilic dataset. | Source |
Tolokers | Classification | Heterophilic dataset. | Source |
US-county-demos | Regression | Each node attribute is used in turn as the target label. | Source |
ZINC | Regression | Graph-level regression. | Source |
Hypergraph datasets
Dataset | Task | Description | Reference |
---|---|---|---|
Cora-Cocitation | Classification | Cocitation dataset. | Source |
Citeseer-Cocitation | Classification | Cocitation dataset. | Source |
PubMed-Cocitation | Classification | Cocitation dataset. | Source |
Cora-Coauthorship | Classification | Coauthorship dataset. | Source |
DBLP-Coauthorship | Classification | Coauthorship dataset. | Source |
To learn more about TopoBenchmark, we invite you to read the paper:
@article{telyatnikov2024topobenchmark,
title={TopoBenchmark: A Framework for Benchmarking Topological Deep Learning},
author={Lev Telyatnikov and Guillermo Bernardez and Marco Montagna and Pavlo Vasylenko and Ghada Zamzmi and Mustafa Hajij and Michael T Schaub and Nina Miolane and Simone Scardapane and Theodore Papamarkou},
year={2024},
eprint={2406.06642},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.06642},
}
If you find TopoBenchmark useful, we would appreciate it if you cited us!
Hierarchy of configuration files
├── configs                                  <- Hydra configs
│   ├── callbacks                            <- Callbacks configs
│   ├── dataset                              <- Dataset configs
│   │   ├── graph                            <- Graph dataset configs
│   │   ├── hypergraph                       <- Hypergraph dataset configs
│   │   └── simplicial                       <- Simplicial dataset configs
│   ├── debug                                <- Debugging configs
│   ├── evaluator                            <- Evaluator configs
│   ├── experiment                           <- Experiment configs
│   ├── extras                               <- Extra utilities configs
│   ├── hparams_search                       <- Hyperparameter search configs
│   ├── hydra                                <- Hydra configs
│   ├── local                                <- Local configs
│   ├── logger                               <- Logger configs
│   ├── loss                                 <- Loss function configs
│   ├── model                                <- Model configs
│   │   ├── cell                             <- Cell model configs
│   │   ├── graph                            <- Graph model configs
│   │   ├── hypergraph                       <- Hypergraph model configs
│   │   └── simplicial                       <- Simplicial model configs
│   ├── optimizer                            <- Optimizer configs
│   ├── paths                                <- Project paths configs
│   ├── scheduler                            <- Scheduler configs
│   ├── trainer                              <- Trainer configs
│   ├── transforms                           <- Data transformation configs
│   │   ├── data_manipulations               <- Data manipulation transforms
│   │   ├── dataset_defaults                 <- Default dataset transforms
│   │   ├── feature_liftings                 <- Feature lifting transforms
│   │   ├── liftings                         <- Lifting transforms
│   │   │   ├── graph2cell                   <- Graph to cell lifting transforms
│   │   │   ├── graph2hypergraph             <- Graph to hypergraph lifting transforms
│   │   │   ├── graph2simplicial             <- Graph to simplicial lifting transforms
│   │   │   ├── graph2cell_default.yaml      <- Default graph to cell lifting config
│   │   │   ├── graph2hypergraph_default.yaml <- Default graph to hypergraph lifting config
│   │   │   ├── graph2simplicial_default.yaml <- Default graph to simplicial lifting config
│   │   │   └── no_lifting.yaml              <- No lifting config
│   │   ├── custom_example.yaml              <- Custom example transform config
│   │   └── no_transform.yaml                <- No transform config
│   ├── wandb_sweep                          <- Weights & Biases sweep configs
│   │
│   ├── __init__.py                          <- Init file for configs module
│   └── run.yaml                             <- Main config for training
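Selecting an option of a config group on the command line, such as model=cell/cwn, follows Hydra's convention of loading the file <config dir>/<group>/<option>.yaml from this hierarchy. The hypothetical helper below only computes that path; it does not check that the file exists, and the exact file names (e.g. MUTAG.yaml) are assumptions.

```python
from pathlib import PurePosixPath


def config_file_for(override: str, root: str = "configs") -> PurePosixPath:
    """Hypothetical helper: map a config-group selection like 'model=cell/cwn' to its YAML path."""
    group, sep, option = override.partition("=")
    if not sep or "." in group:
        # Dotted keys such as model.backbone.layers=4 edit a field; they do not select a file.
        raise ValueError(f"not a config-group selection: {override!r}")
    return PurePosixPath(root, group, f"{option}.yaml")


print(config_file_for("model=cell/cwn"))  # configs/model/cell/cwn.yaml
print(config_file_for("dataset=graph/MUTAG"))  # configs/dataset/graph/MUTAG.yaml
```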
More information regarding Topological Deep Learning
Topological Graph Signal Compression
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
TopoX: a suite of Python packages for machine learning on topological domains