GitHub - geometric-intelligence/TopoBenchmark: TopoBenchmark is a Python library designed to standardize benchmarking and accelerate research in Topological Deep Learning

A Comprehensive Benchmark Suite for Topological Deep Learning

Assess how your model compares against state-of-the-art topological neural networks.

Overview • Get Started • Tutorials • Neural Networks • Liftings • Datasets • References

📌 Overview

TopoBenchmark (TB) is a modular Python library designed to standardize benchmarking and accelerate research in Topological Deep Learning (TDL). In particular, TB makes it possible to train Topological Neural Networks (TNNs) and compare their performance across topological domains, where a topological domain is a graph, a simplicial complex, a cellular complex, or a hypergraph. For detailed information, please refer to the paper TopoBenchmark: A Framework for Benchmarking Topological Deep Learning.

The main pipeline trains and evaluates a wide range of state-of-the-art TNNs and Graph Neural Networks (GNNs) (see ⚙️ Neural Networks) on numerous and varied datasets and benchmark tasks (see 📚 Datasets). Through TopoTune (see 💡 TopoTune), the library provides easy access to training and testing an entire landscape of graph-based TNNs, new or existing, on any topological domain.

Additionally, the library offers the ability to transform, i.e. lift, each dataset from one topological domain to another (see 🚀 Liftings), enabling for the first time an exhaustive inter-domain comparison of TNNs.

🧩 Get Started

Create Environment

If you do not have conda on your machine, please follow their guide to install it.

First, clone the TopoBenchmark repository, then create and activate a conda environment tb with Python 3.11.3.

git clone git@github.com:geometric-intelligence/topobenchmark.git
cd topobenchmark
conda create -n tb python=3.11.3
conda activate tb

Next, check the CUDA version of your machine:

/usr/local/cuda/bin/nvcc --version

and ensure that it matches the CUDA version specified in the env_setup.sh file (CUDA=cu121 by default). If it does not match, update env_setup.sh accordingly by changing both the CUDA and TORCH environment variables to compatible values as specified on this website.
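
As a rough aid for that comparison, the sketch below converts the text printed by nvcc --version into the cuXYZ tag format used for the CUDA variable. The helper name cuda_tag_from_nvcc is hypothetical and not part of TopoBenchmark; it only assumes that the nvcc output contains a "release X.Y" fragment.

```python
import re


def cuda_tag_from_nvcc(nvcc_output: str) -> str:
    """Convert `nvcc --version` text into a tag such as 'cu121'.

    Hypothetical helper for illustration; not part of TopoBenchmark.
    """
    # nvcc prints a line like: "Cuda compilation tools, release 12.1, V12.1.105"
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' fragment found in the nvcc output")
    major, minor = match.groups()
    return f"cu{major}{minor}"


sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(cuda_tag_from_nvcc(sample))  # cu121
```

If the printed tag differs from the CUDA value in env_setup.sh, that is the signal to update the CUDA and TORCH variables before sourcing the script.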

Next, set up the environment with the following command.

source env_setup.sh

This command installs the TopoBenchmark library and its dependencies.

Run Training Pipeline

Next, train the neural networks by running the following command:

python -m topobenchmark

Because the pipeline is configured with Hydra, the default experiment configuration can be overridden from the command line. For instance, the model and dataset can be selected as:

python -m topobenchmark model=cell/cwn dataset=graph/MUTAG

Remark: By default, our pipeline identifies the source and destination topological domains, and applies a default lifting between them if required.

The same CLI override mechanism also applies to finer-grained settings within a config group. Please refer to the official Hydra documentation for further details.
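
When overrides are generated from a script rather than typed by hand, it can help to assemble them programmatically. The following is a minimal sketch, assuming the run is launched with python -m topobenchmark as above; build_command is a hypothetical helper, not part of the library.

```python
import sys


def build_command(overrides: dict[str, object]) -> list[str]:
    """Return an argument list for `python -m topobenchmark` plus Hydra overrides.

    Hypothetical helper for illustration; not part of TopoBenchmark.
    """
    args = [sys.executable, "-m", "topobenchmark"]
    for key, value in overrides.items():
        # Each override must be a single `key=value` token without spaces.
        args.append(f"{key}={value}")
    return args


command = build_command(
    {"model": "cell/cwn", "dataset": "graph/MUTAG", "trainer.max_epochs": 50}
)
# Print everything after the interpreter path.
print(" ".join(command[1:]))
```

The resulting list can be passed to subprocess.run(command, check=True) to start the run from Python.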

🚲 Experiments Reproducibility

To reproduce Table 1 from the TopoBenchmark: A Framework for Benchmarking Topological Deep Learning paper, please run the following command:

bash scripts/reproduce.sh

Remark: We have additionally provided a public W&B (Weights & Biases) project with logs for the corresponding runs (updated on June 11, 2024).

⚓ Tutorials

Explore our tutorials for further details on how to add new datasets, transforms/liftings, and benchmark tasks.

⚙️ Neural Networks

We list the neural networks trained and evaluated by TopoBenchmark, organized by the topological domain over which they operate: graph, simplicial complex, cellular complex or hypergraph. Many of these neural networks were originally implemented in TopoModelX.

Graphs

Model	Reference
GAT	Graph Attention Networks
GIN	How Powerful are Graph Neural Networks?
GCN	Semi-Supervised Classification with Graph Convolutional Networks
GraphMLP	Graph-MLP: Node Classification without Message Passing in Graph

Simplicial complexes

Model	Reference
SAN	Simplicial Attention Neural Networks
SCCN	Efficient Representation Learning for Higher-Order Data with Simplicial Complexes
SCCNN	Convolutional Learning on Simplicial Complexes
SCN	Simplicial Complex Neural Networks

Cellular complexes

Model	Reference
CAN	Cell Attention Network
CCCN	Inspired by A learning algorithm for computational connected cellular network, implementation adapted from Generalized Simplicial Attention Neural Networks
CXN	Cell Complex Neural Networks
CWN	Weisfeiler and Lehman Go Cellular: CW Networks

Hypergraphs

Model	Reference
AllDeepSet	You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks
AllSetTransformer	You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks
EDGNN	Equivariant Hypergraph Diffusion Neural Operators
UniGNN	UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks
UniGNN2	UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks

Combinatorial complexes

Model	Reference
GCCN	TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks

💡 TopoTune

We include TopoTune, a comprehensive framework for easily defining and training new, general TDL models on any domain using any (graph) neural network ω as a backbone. The pre-print detailing this framework is TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks. In a GCCN (pictured below), the input complex is represented as an ensemble of strictly augmented Hasse graphs, one per neighborhood of the complex. Each of these Hasse graphs is processed by a sub model ω, and the outputs are rank-wise aggregated in between layers.

Defining and training a GCCN

To implement and train a GCCN, run the following command with the desired choice of dataset, lifting domain (e.g., cell, simplicial), PyTorch Geometric backbone model (e.g., GCN, GIN, GAT, GraphSAGE) and its parameters (e.g., model.backbone.GNN.num_layers=2), neighborhood structure (routes), and other hyperparameters.

python -m topobenchmark \
    dataset=graph/PROTEINS \
    dataset.split_params.data_seed=1 \
    model=cell/topotune \
    model.tune_gnn=GCN \
    model.backbone.GNN.num_layers=2 \
    model.backbone.neighborhoods=\[1-up_laplacian-0,1-down_incidence-2\] \
    model.backbone.layers=4 \
    model.feature_encoder.out_channels=32 \
    model.feature_encoder.proj_dropout=0.3 \
    model.readout.readout_name=PropagateSignalDown \
    logger.wandb.project=TopoTune_cell \
    trainer.max_epochs=1000 \
    callbacks.early_stopping.patience=50

To use a single augmented Hasse graph expansion, use model={domain}/topotune_onehasse instead of model={domain}/topotune.

To specify a set of neighborhoods on the complex, use a list of neighborhoods each specified as a string of the form r-{neighborhood}-k, where $k$ represents the source cell rank, and $r$ is the number of ranks up or down that the selected {neighborhood} considers. Currently, the following options for {neighborhood} are supported:

up_laplacian, between cells of rank $k$ through $k+r$ cells.
down_laplacian, between cells of rank $k$ through $k-r$ cells.
hodge_laplacian, between cells of rank $k$ through both $k-r$ and $k+r$ cells.
up_adjacency, between cells of rank $k$ through $k+r$ cells.
down_adjacency, between cells of rank $k$ through $k-r$ cells.
up_incidence, from rank $k$ to $k+r$.
down_incidence, from rank $k$ to $k-r$.

The number $r$ can be omitted, in which case $r=1$ by default (e.g. up_incidence-k represents the incidence from rank $k$ to $k+1$).
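
To make the string format concrete, here is a small sketch of how such a specification could be split into its three parts, including the default $r=1$. The parser below is our own illustration and is not the one used inside the library; the names Neighborhood and parse_neighborhood are hypothetical.

```python
from typing import NamedTuple

# Neighborhood types listed above.
SUPPORTED = {
    "up_laplacian", "down_laplacian", "hodge_laplacian",
    "up_adjacency", "down_adjacency", "up_incidence", "down_incidence",
}


class Neighborhood(NamedTuple):
    r: int     # number of ranks up or down
    name: str  # neighborhood type
    k: int     # source cell rank


def parse_neighborhood(spec: str) -> Neighborhood:
    """Parse 'r-{neighborhood}-k' or '{neighborhood}-k' (r defaults to 1)."""
    parts = spec.split("-")
    if len(parts) == 3:
        r, name, k = parts
    elif len(parts) == 2:
        name, k = parts
        r = "1"  # r omitted, so fall back to the default
    else:
        raise ValueError(f"unexpected neighborhood format: {spec!r}")
    if name not in SUPPORTED:
        raise ValueError(f"unsupported neighborhood: {name!r}")
    return Neighborhood(int(r), name, int(k))


for spec in ["1-up_laplacian-0", "1-down_incidence-2", "up_incidence-0"]:
    print(spec, "->", parse_neighborhood(spec))
```

For example, 1-down_incidence-2 reads as the incidence from rank 2 down to rank 1, and up_incidence-0 as the incidence from rank 0 to rank 1.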

Using backbone models from any package

By default, backbone models are imported from torch_geometric.nn.models. To import and specify a backbone model from any other package, such as torch.nn.Transformer or dgl.nn.GATConv, it is sufficient to 1) make sure the package is installed and 2) specify in the command line:

model.tune_gnn={backbone_model}
model.backbone.GNN._target_={package}.{backbone_model}
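
In Hydra, a _target_ key names a dotted import path that hydra.utils.instantiate resolves to a class or function. The sketch below mimics only that lookup step with the standard library, to show why the package must be installed and the path must be fully qualified. It is not Hydra's implementation, resolve_target is a hypothetical name, and math.sqrt merely stands in for a backbone path of the form {package}.{backbone_model}.

```python
import importlib


def resolve_target(target: str):
    """Import the object named by a dotted path such as 'package.module.Name'.

    Illustrative stand-in for the lookup Hydra performs on `_target_`.
    """
    module_path, _, attribute = target.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {target!r}")
    # Fails with ModuleNotFoundError if the package is not installed.
    module = importlib.import_module(module_path)
    # Fails with AttributeError if the name does not exist in that module.
    return getattr(module, attribute)


# Standard-library stand-in for a real backbone path.
function = resolve_target("math.sqrt")
print(function(9.0))  # 3.0
```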

Reproducing experiments

We provide scripts to reproduce experiments on a broad class of GCCNs in scripts/topotune and reproduce iterations of existing neural networks in scripts/topotune/existing_models, as previously reported in the TopoTune paper.

We invite users interested in running extensive sweeps on new GCCNs to reuse the --multirun flag as it appears in the scripts. This is a shortcut for running every possible combination of the specified parameters in a single command.
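
To see how quickly such a sweep grows, the following sketch enumerates the Cartesian product of a few parameter values, which is how Hydra's default sweeper expands comma-separated overrides such as key=a,b. The helper expand_sweep and the particular values are only an illustration; they are not taken from the provided scripts.

```python
from itertools import product


def expand_sweep(sweep: dict[str, list[str]]) -> list[list[str]]:
    """List the override sets produced by sweeping every combination."""
    keys = list(sweep)
    runs = []
    for values in product(*(sweep[key] for key in keys)):
        runs.append([f"{key}={value}" for key, value in zip(keys, values)])
    return runs


# Illustrative values only.
sweep = {
    "model.tune_gnn": ["GCN", "GIN"],
    "model.backbone.layers": ["2", "4"],
}
for overrides in expand_sweep(sweep):
    print(" ".join(overrides))
print(len(expand_sweep(sweep)), "runs in total")
```

Two parameters with two values each already give four runs; every additional swept parameter multiplies that count by its number of values.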

🚀 Liftings

We list the liftings used in TopoBenchmark to transform datasets. Here, a lifting refers to a function that transforms a dataset defined on a topological domain (e.g., on a graph) into the same dataset but supported on a different topological domain (e.g., on a simplicial complex).

Topology Liftings

Graph2Simplicial

Name	Description	Reference
CliqueLifting	The algorithm finds the cliques in the graph and creates simplices. Given a clique the first simplex added is the one containing all the nodes of the clique, then the simplices composed of all the possible combinations with one node missing, then two nodes missing, and so on, until all the possible pairs are added. Then the method moves to the next clique.	Simplicial Complexes
KHopLifting	For each node in the graph, take the set of its neighbors, up to k distance, and the node itself. These sets are then treated as simplices. The dimension of each simplex depends on the degree of the nodes. For example, a node with d neighbors forms a d-simplex.	Neighborhood Complexes
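
The CliqueLifting description can be sketched in a few lines of plain Python. This is an illustration under our own assumptions, not the library's implementation: maximal cliques are found with the basic Bron-Kerbosch recursion, nodes are treated as inherited 0-simplices, and the function names are hypothetical.

```python
from itertools import combinations


def maximal_cliques(adjacency: dict[int, set[int]]) -> list[frozenset[int]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(
                current | {node},
                candidates & adjacency[node],
                excluded & adjacency[node],
            )
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


def clique_lifting(edges: list[tuple[int, int]]) -> set[tuple[int, ...]]:
    """Return all simplices with at least two nodes induced by the cliques."""
    adjacency: dict[int, set[int]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    simplices = set()
    for clique in maximal_cliques(adjacency):
        nodes = sorted(clique)
        # Start from the full clique, then drop one node, two nodes, ...
        # until only pairs remain.
        for size in range(len(nodes), 1, -1):
            simplices.update(combinations(nodes, size))
    return simplices


# A triangle (0, 1, 2) with an extra edge (2, 3).
print(sorted(clique_lifting([(0, 1), (1, 2), (0, 2), (2, 3)]), key=len))
```

In this example the triangle contributes one 2-simplex and its three edges, while the edge (2, 3) stays a 1-simplex.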

Graph2Cell

Name	Description	Reference
CellCycleLifting	To lift a graph to a cell complex (CC) we proceed as follows. First, we identify a finite set of cycles (closed loops) within the graph. Second, each identified cycle in the graph is associated with a 2-cell, such that the boundary of the 2-cell is the cycle. The nodes and edges of the cell complex are inherited from the graph.	Appendix B
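
The two steps above can be sketched as follows. The description does not say which finite set of cycles is chosen, so this example assumes one particular choice: the fundamental cycles of a breadth-first spanning tree. The function names are hypothetical and the code is not the library's implementation.

```python
from collections import deque


def fundamental_cycles(edges: list[tuple[int, int]]) -> list[list[int]]:
    """Return one cycle per non-tree edge of a BFS spanning forest."""
    adjacency: dict[int, set[int]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    parent: dict[int, int | None] = {}
    tree_edges = set()
    for root in sorted(adjacency):
        if root in parent:
            continue
        parent[root] = None
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for neighbor in sorted(adjacency[node]):
                if neighbor not in parent:
                    parent[neighbor] = node
                    tree_edges.add(frozenset((node, neighbor)))
                    queue.append(neighbor)

    def path_to_root(node: int) -> list[int]:
        path = [node]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    cycles = []
    for u, v in edges:
        if frozenset((u, v)) in tree_edges:
            continue
        up_u, up_v = path_to_root(u), path_to_root(v)
        # Trim the shared part above the lowest common ancestor.
        while len(up_u) > 1 and len(up_v) > 1 and up_u[-2] == up_v[-2]:
            up_u.pop()
            up_v.pop()
        # u -> ... -> ancestor -> ... -> v; the edge (v, u) closes the loop.
        cycles.append(up_u + up_v[-2::-1])
    return cycles


def cell_cycle_lifting(edges: list[tuple[int, int]]) -> dict[str, list]:
    """Nodes and edges are inherited; each cycle bounds one 2-cell."""
    nodes = sorted({node for edge in edges for node in edge})
    return {"nodes": nodes, "edges": list(edges), "two_cells": fundamental_cycles(edges)}


# A square 0-1-2-3 with a pendant edge (3, 4).
print(cell_cycle_lifting([(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]))
```

The square yields a single 2-cell, and the pendant edge contributes none because it lies on no cycle.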

Graph2Hypergraph

Name	Description	Reference
KHopLifting	For each node in the graph, the algorithm finds the set of nodes that are at most k connections away from the initial node. This set is then used to create a hyperedge. The process is repeated for all nodes in the graph.	Section 3.4
KNearestNeighborsLifting	For each node in the graph, the method finds the k nearest nodes by using the Euclidean distance between the vectors of features. The set of k nodes found is considered as a hyperedge. The process is repeated for all nodes in the graph.	Section 3.1
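
As an illustration of the KHopLifting row, the sketch below builds one hyperedge per node with a breadth-first search truncated at depth k. It is a plain-Python approximation under our own assumptions, in particular that the source node belongs to its own hyperedge; k_hop_hyperedges is a hypothetical name, not the library's API.

```python
from collections import deque


def k_hop_hyperedges(edges: list[tuple[int, int]], k: int) -> dict[int, frozenset[int]]:
    """Map each node to the set of nodes at most k hops away (itself included)."""
    adjacency: dict[int, set[int]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    hyperedges = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if distance[node] == k:
                continue  # do not expand beyond k hops
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        hyperedges[source] = frozenset(distance)
    return hyperedges


# Path graph 0 - 1 - 2 - 3.
for node, hyperedge in sorted(k_hop_hyperedges([(0, 1), (1, 2), (2, 3)], k=1).items()):
    print(node, sorted(hyperedge))
```

Because every node produces a hyperedge, the lifted hypergraph has as many hyperedges as the graph has nodes, some of which may coincide.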

Feature Liftings

Name	Description	Supported Domains
ProjectionSum	Projects r-cell features of a graph to r+1-cell structures utilizing incidence matrices (B_{r}).	Simplicial, Cell
ConcatenationLifting	Concatenate r-cell features to obtain r+1-cell features.	Simplicial
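
A projection of this kind can be sketched without any tensor library. The example assumes an incidence matrix whose rows index r-cells and whose columns index (r+1)-cells, and it ignores orientation signs, so each (r+1)-cell simply receives the sum of the features of the r-cells on its boundary. Both the sign handling and the function name projection_sum are assumptions made for illustration.

```python
def projection_sum(
    incidence: list[list[int]], features: list[list[float]]
) -> list[list[float]]:
    """Sum r-cell features onto the (r+1)-cells they are incident to.

    incidence[i][j] is nonzero when r-cell i lies on the boundary of
    (r+1)-cell j; signs are ignored (an assumption of this sketch).
    """
    n_lower = len(incidence)
    n_upper = len(incidence[0]) if incidence else 0
    dim = len(features[0]) if features else 0
    lifted = [[0.0] * dim for _ in range(n_upper)]
    for i in range(n_lower):
        for j in range(n_upper):
            if incidence[i][j] != 0:
                for d in range(dim):
                    lifted[j][d] += features[i][d]
    return lifted


# Triangle with nodes 0, 1, 2 and edges (0, 1), (0, 2), (1, 2).
node_to_edge = [
    [-1, -1, 0],
    [1, 0, -1],
    [0, 1, 1],
]
node_features = [[1.0], [2.0], [4.0]]
print(projection_sum(node_to_edge, node_features))  # [[3.0], [5.0], [6.0]]
```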

Data Transformations

Transform	Description	Reference
Message Passing Homophily	Higher-order homophily measure for hypergraphs	Source
Group Homophily	Higher-order homophily measure for hypergraphs that considers groups of predefined sizes	Source

📚 Datasets

Graphs

Dataset	Task	Description	Reference
Cora	Classification	Cocitation dataset.	Source
Citeseer	Classification	Cocitation dataset.	Source
Pubmed	Classification	Cocitation dataset.	Source
MUTAG	Classification	Graph-level classification.	Source
PROTEINS	Classification	Graph-level classification.	Source
NCI1	Classification	Graph-level classification.	Source
NCI109	Classification	Graph-level classification.	Source
IMDB-BIN	Classification	Graph-level classification.	Source
IMDB-MUL	Classification	Graph-level classification.	Source
REDDIT	Classification	Graph-level classification.	Source
Amazon	Classification	Heterophilic dataset.	Source
Minesweeper	Classification	Heterophilic dataset.	Source
Empire	Classification	Heterophilic dataset.	Source
Tolokers	Classification	Heterophilic dataset.	Source
US-county-demos	Regression	In turn each node attribute is used as the target label.	Source
ZINC	Regression	Graph-level regression.	Source

Hypergraphs

Dataset	Task	Description	Reference
Cora-Cocitation	Classification	Cocitation dataset.	Source
Citeseer-Cocitation	Classification	Cocitation dataset.	Source
PubMed-Cocitation	Classification	Cocitation dataset.	Source
Cora-Coauthorship	Classification	Cocitation dataset.	Source
DBLP-Coauthorship	Classification	Cocitation dataset.	Source

🔍 References

To learn more about TopoBenchmark, we invite you to read the paper:

@article{telyatnikov2024topobenchmark,
  title={TopoBenchmark: A Framework for Benchmarking Topological Deep Learning},
  author={Lev Telyatnikov and Guillermo Bernardez and Marco Montagna and Pavlo Vasylenko and Ghada Zamzmi and Mustafa Hajij and Michael T Schaub and Nina Miolane and Simone Scardapane and Theodore Papamarkou},
  year={2024},
  eprint={2406.06642},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2406.06642},
}

If you find TopoBenchmark useful, we would appreciate it if you cited us!

🐭 Additional Details

Hierarchy of configuration files

├── configs                              <- Hydra configs
│   ├── callbacks                        <- Callbacks configs
│   ├── dataset                          <- Dataset configs
│   │   ├── graph                        <- Graph dataset configs
│   │   ├── hypergraph                   <- Hypergraph dataset configs
│   │   └── simplicial                   <- Simplicial dataset configs
│   ├── debug                            <- Debugging configs
│   ├── evaluator                        <- Evaluator configs
│   ├── experiment                       <- Experiment configs
│   ├── extras                           <- Extra utilities configs
│   ├── hparams_search                   <- Hyperparameter search configs
│   ├── hydra                            <- Hydra configs
│   ├── local                            <- Local configs
│   ├── logger                           <- Logger configs
│   ├── loss                             <- Loss function configs
│   ├── model                            <- Model configs
│   │   ├── cell                         <- Cell model configs
│   │   ├── graph                        <- Graph model configs
│   │   ├── hypergraph                   <- Hypergraph model configs
│   │   └── simplicial                   <- Simplicial model configs
│   ├── optimizer                        <- Optimizer configs
│   ├── paths                            <- Project paths configs
│   ├── scheduler                        <- Scheduler configs
│   ├── trainer                          <- Trainer configs
│   ├── transforms                       <- Data transformation configs
│   │   ├── data_manipulations           <- Data manipulation transforms
│   │   ├── dataset_defaults             <- Default dataset transforms
│   │   ├── feature_liftings             <- Feature lifting transforms
│   │   └── liftings                     <- Lifting transforms
│   │       ├── graph2cell               <- Graph to cell lifting transforms
│   │       ├── graph2hypergraph         <- Graph to hypergraph lifting transforms
│   │       ├── graph2simplicial         <- Graph to simplicial lifting transforms
│   │       ├── graph2cell_default.yaml  <- Default graph to cell lifting config
│   │       ├── graph2hypergraph_default.yaml <- Default graph to hypergraph lifting config
│   │       ├── graph2simplicial_default.yaml <- Default graph to simplicial lifting config
│   │       ├── no_lifting.yaml          <- No lifting config
│   │       ├── custom_example.yaml      <- Custom example transform config
│   │       └── no_transform.yaml        <- No transform config
│   ├── wandb_sweep                      <- Weights & Biases sweep configs
│   │
│   ├── __init__.py                      <- Init file for configs module
│   └── run.yaml                         <- Main config for training

More information regarding Topological Deep Learning

Topological Graph Signal Compression

Architectures of Topological Deep Learning: A Survey on Topological Neural Networks

TopoX: a suite of Python packages for machine learning on topological domains
