[Community Sprint] Documentation Tutorials 📚 #7892

rusty1s · 2023-08-16T16:07:42Z

We are kicking off another community sprint!

This community sprint revolves around improving our documentation to make PyG more accessible and to expose various PyG features more clearly. Each tutorial is categorized into one of three levels of expertise [EASY, MEDIUM, HARD], and should be picked depending on your expertise with PyG.

The sprint begins Thursday August 16th and will last 3 weeks. If you are interested in helping out, please also join our PyG slack channel #documentation-sprint for more information, guidance and help.

You can assign yourself to the tutorial you are planning to work on here (choose the "documentation" tab at the bottom if you get directed to a wrong tab).

Documentation Tutorials 📚

We want to improve and enhance the "Tutorials" section in our documentation. At a high level, we plan to add various tutorials regarding GNN design, applications and use-cases, dataset handling, sampling and multi-GPU training.

GNN Design

[MEDIUM] Best-Practices on GNN Design: This tutorial should outline common building blocks in GNN modules (e.g., GNN layers, normalization layers, skip-connections (e.g., via JumpingKnowledge)), and explain the various options of GNN layers we have in PyG (e.g., homogeneous GNN layers, bipartite GNN layers, GNN layers that expect edge features and edge weights, GNN layers that expect edge_type information, GNN layers designed for point clouds, etc.) by cross-referencing our GNN Cheatsheet.
[EASY] Customizing Aggregations within Message Passing #7901: Most of the tutorial can be directly copied from this blog post. It should introduce our Aggregation package and how you can leverage it to build more powerful aggregations.
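
As a rough illustration of what "customizing aggregations" can mean, the sketch below groups incoming messages by target node and applies several reducers side by side. It is plain Python and does not use PyG's Aggregation package; `aggregate`, `mean` and `multi_aggregate` are hypothetical names chosen for this example, and returning 0.0 for nodes without messages is an assumption of the sketch.

```python
from collections import defaultdict


def aggregate(messages, index, num_nodes, reducer):
    """Reduce scalar `messages` per target node, where `index[k]` is the target of message k."""
    buckets = defaultdict(list)
    for msg, target in zip(messages, index):
        buckets[target].append(msg)
    # Nodes without incoming messages get 0.0 (an assumption of this sketch).
    return [reducer(buckets[i]) if buckets[i] else 0.0 for i in range(num_nodes)]


def mean(values):
    return sum(values) / len(values)


def multi_aggregate(messages, index, num_nodes, reducers):
    """Apply several reducers and concatenate their outputs per node."""
    per_reducer = [aggregate(messages, index, num_nodes, r) for r in reducers]
    return [list(row) for row in zip(*per_reducer)]


messages = [1.0, 3.0, 5.0, 2.0]
index = [0, 0, 1, 1]  # messages 0-1 go to node 0, messages 2-3 go to node 1
out = multi_aggregate(messages, index, num_nodes=3, reducers=[mean, max, sum])
```

Combining several reducers gives each node a richer summary of its neighborhood than a single mean or sum, which is the kind of design choice the tutorial is meant to discuss.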

Applications

[MEDIUM] Application Overview: This tutorial should introduce the various tasks you can tackle with PyG, including but not limited to node prediction, link prediction and graph classification. It should present the general idea of training pipelines and loss functions for these different tasks (e.g., global pooling in graph classification, link-level decoders in link prediction tasks), and ideally should reference examples for this from our examples/ folder.
[MEDIUM] Explainability: This tutorial needs to be extended by information stemming from our blog post. In addition, it should go over benchmark datasets and explainability metrics, and reference the corresponding examples from our examples/explain folder.
[EASY] Node2Vec/MetaPath2Vec Tutorial: This tutorial should introduce the Node2Vec and MetaPath2Vec methods and their corresponding modules in PyG. It should outline the general training flow of these modules, and how to perform down-stream tasks given the embeddings generated by these modules.
[HARD] Graph Transformer Tutorial: This tutorial should cover the general idea of Graph Transformers (e.g., attention, positional encodings). It should explain the underlying framework of GPSConv module in PyG and how to use it to train Transformer modules on graph-structured data.
[EASY] Point Cloud Classification/Segmentation: This tutorial should explain how we can leverage GNNs to learn on point clouds, and introduce the various layers in PyG suitable for this task. As a reference, take a look at our Google Colab Notebook. It should also explain the training pipelines of classification and segmentation tasks and reference their corresponding examples in PyG.
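
To make the "global pooling in graph classification" idea from the Application Overview item concrete, here is a minimal sketch in plain Python. It assumes the common mini-batch layout in which node features from several graphs are stacked and a `batch` vector maps each node to its graph; `mean_pool` is a hypothetical helper written for this example, not a PyG function.

```python
def mean_pool(x, batch):
    """Average node feature rows per graph.

    x:     list of per-node feature lists, all of the same length
    batch: list mapping each node to the index of the graph it belongs to
    """
    num_graphs = max(batch) + 1
    dim = len(x[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feat, graph_id in zip(x, batch):
        counts[graph_id] += 1
        for d, value in enumerate(feat):
            sums[graph_id][d] += value
    # Graph indices that never occur keep an all-zero row.
    return [[s / c if c else 0.0 for s in row] for row, c in zip(sums, counts)]


x = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]  # three nodes, two features each
batch = [0, 0, 1]                          # nodes 0-1 in graph 0, node 2 in graph 1
graph_embeddings = mean_pool(x, batch)
```

The pooled rows are what a graph-level classifier head would consume, one row per graph in the mini-batch.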

Datasets

[EASY] Dataset Splitting: This tutorial should cover the basics of how you can split your dataset into training, validation and test sets across the three tasks of node prediction, link prediction and graph-level prediction. It should introduce both RandomNodeSplit and RandomLinkSplit transformations, but also cover how you can create custom splits outside of randomly generated ones.
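
The difference between a random and a custom node-level split can be sketched without any library. The code below is plain Python and only mimics the idea of boolean train/validation/test masks; `random_node_masks` and `mask_from_indices` are hypothetical helpers for this example and are not the RandomNodeSplit API.

```python
import random


def mask_from_indices(indices, num_nodes):
    """Turn a collection of node ids into a boolean mask of length `num_nodes`."""
    chosen = set(indices)
    return [i in chosen for i in range(num_nodes)]


def random_node_masks(num_nodes, num_val, num_test, seed=0):
    """Return disjoint boolean (train, val, test) masks from a seeded shuffle."""
    if num_val + num_test > num_nodes:
        raise ValueError("num_val + num_test exceeds num_nodes")
    perm = list(range(num_nodes))
    random.Random(seed).shuffle(perm)
    val_idx = perm[:num_val]
    test_idx = perm[num_val:num_val + num_test]
    train_idx = perm[num_val + num_test:]
    return (mask_from_indices(train_idx, num_nodes),
            mask_from_indices(val_idx, num_nodes),
            mask_from_indices(test_idx, num_nodes))


# Random split: 10 nodes, 2 for validation, 3 for testing.
train_mask, val_mask, test_mask = random_node_masks(10, num_val=2, num_test=3)

# Custom split: the caller decides which nodes go where (e.g., by timestamp).
custom_train = mask_from_indices([0, 1, 2, 3], num_nodes=6)
custom_test = mask_from_indices([4, 5], num_nodes=6)
```

A custom split is simply a set of masks built from indices you choose yourself, which is useful when a random split would leak information (for example, across time).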

Sampling

[MEDIUM] Available Sampling Techniques in PyG: This tutorial should explain the basic concepts of mini-batch sampling for learning on large-scale graphs. It should cover the different options in PyG to do this, e.g., NeighborLoader, ClusterLoader, GraphSAINT, ShaDowKHop, explain their strengths and weaknesses, and which sampler/loader to pick for which task (and link to their example if available).
[MEDIUM] Neighbor Sampling: This tutorial should go more in-depth into our NeighborLoader, explain its usage and reference corresponding examples. It should outline the general computation flow of GNNs with neighborhood sampling, and things to look out for (e.g., ensuring to only make use of the first batch_size many nodes for loss/metric computation). It should also cross-link to our "Hierarchical Neighborhood Sampling" tutorial as a simple extension to improve its efficiency.
[HARD] Link-level Neighbor Sampling: This tutorial should go more in-depth on how you can perform mini-batching for link prediction tasks on large-scale graphs. It should cover the basics of LinkNeighborLoader and how it works under the hood, explain the differences between edge_index and edge_label_index, and cover basic training pipelines. In addition, we can showcase how to leverage KNNIndex to perform fast-querying of nearest neighbors during inference, based on the embeddings obtained from the trained GNN.
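
The point from the Neighbor Sampling item about using only the first batch_size nodes for loss/metric computation can be illustrated with a toy sampler. This is a plain-Python sketch under the assumption, taken from the description above, that seed nodes come first in the sampled batch; `sample_one_hop` and `seed_accuracy` are hypothetical names and do not reflect how NeighborLoader is implemented.

```python
import random


def sample_one_hop(adj, seeds, num_neighbors, rng):
    """Toy one-hop sampler: returns node ids with the seed nodes first."""
    nodes = list(seeds)
    seen = set(seeds)
    for seed in seeds:
        neighbors = adj.get(seed, [])
        picked = rng.sample(neighbors, min(num_neighbors, len(neighbors)))
        for n in picked:
            if n not in seen:
                seen.add(n)
                nodes.append(n)
    return nodes


def seed_accuracy(pred, labels, nodes, batch_size):
    """Accuracy over the first `batch_size` entries only (the seed nodes).

    pred:   predictions in the local order of `nodes`
    labels: mapping from global node id to label
    """
    seed_nodes = nodes[:batch_size]
    correct = sum(pred[i] == labels[n] for i, n in enumerate(seed_nodes))
    return correct / batch_size


adj = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
labels = {0: 1, 1: 0, 2: 1, 3: 0}
seeds = [0, 2]

nodes = sample_one_hop(adj, seeds, num_neighbors=2, rng=random.Random(0))
pred = [1] * len(nodes)  # a dummy model that always predicts class 1
acc = seed_accuracy(pred, labels, nodes, batch_size=len(seeds))
```

The sampled neighbors only provide context for message passing. Including them in the metric would count nodes that were not selected as training targets for this mini-batch.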

Multi-GPU Training

[EASY] Multi-GPU training in Vanilla PyTorch Tutorial #7893: This tutorial should cover the basics of how we can leverage torch.nn.parallel.DistributedDataParallel for multi-GPU training in PyG. It should briefly go over the corresponding examples in PyG for distributed batching and distributed sampling.
[MEDIUM] PyTorch Lightning: This tutorial should explain how one can leverage PyTorch Lightning within PyG for multi-GPU training. It should go over our PyTorch Lightning Wrappers in PyG to easily convert PyG datasets into a LightningDataModule instance, and go over and reference our PyTorch Lightning examples.
[MEDIUM] cugraph and cugraph-ops: (@pyg-team/nvidia-team) This tutorial should introduce and explain the usage of CuGraphConv modules in PyG. It would be great if more information can be shared on what makes these layers more efficient than their PyG counterparts. This tutorial should also capture how one can use them for multi-GPU training within cugraph.
[HARD] torch_geometric.distributed: (@pyg-team/intel-team) This tutorial should explain the usage and internals of our torch_geometric.distributed package (still WIP). More information will be added once it is ready.
[HARD] GraphLearn for PyTorch (GLT): This tutorial should cover how one can leverage GraphLearn for PyTorch for multi-GPU training within PyG. It should shed some light on the internals and explain how to use it, similar to what is already present in the README.
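
One idea common to the multi-GPU items above is that each process trains on its own share of the training nodes. The sketch below shows one simple way to do that in plain Python; `shard_indices` is a hypothetical helper, and dropping the remainder so that all ranks get equally sized shards is an assumption of this sketch rather than something the issue prescribes.

```python
def shard_indices(indices, rank, world_size):
    """Return the contiguous, equally sized chunk of `indices` owned by `rank`.

    Leftover indices that do not fill a complete chunk are dropped, so every
    rank processes the same number of seed nodes per epoch.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    chunk = len(indices) // world_size
    return indices[rank * chunk:(rank + 1) * chunk]


train_idx = list(range(10))
shards = [shard_indices(train_idx, rank, world_size=4) for rank in range(4)]
```

Each rank would then build its own mini-batch loader over its shard, while gradient synchronization across ranks is left to the distributed training wrapper.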

Guide to Contributing

Ensure you have read our contributing guidelines and our tutorial for building the documentation.
Each tutorial sits inside its own *.rst file in the docs/source/tutorial/ folder. You can browse other files in this folder to get a sense for how tutorials are written and formatted.
Open a PR to the PyG repository and name it: "[Documentation] {tutorial_name}". Afterwards, add a corresponding entry in CHANGELOG.md to document your change/feature.
Our CI will build the documentation on every push to your PR. Once built, you can inspect it by clicking the docs/readthedocs.org:pytorch-geometric tab in the status field of your PR. As such, you do not necessarily need to build the documentation locally to see your respective changes.


MuhammadIrtiza17 · 2023-08-17T00:50:05Z

Hello Sir, the "contributing guidelines" link in the first point of "Guide to Contributing" is not working. It redirects to an .md file that does not exist, so I was unable to view the guidelines. Could you please fix it?
After going through the repository, I found that the file was moved to the .github/ directory, so kindly correct the link.

rusty1s · 2023-08-17T03:09:33Z

> Hello Sir, the "contributing guidelines" link in the first point of "Guide to Contributing" is not working. It redirects to an .md file that does not exist. Could you please fix it?

Fixed :)


rusty1s added the documentation label Aug 16, 2023

rusty1s self-assigned this Aug 16, 2023

rusty1s added the "0 - Priority P0", "roadmap", "feature" and "help wanted" labels Aug 16, 2023

akihironitta mentioned this issue Aug 24, 2023

[Documentation] NeighborLoader Tutorial #7931

Merged

vstenby mentioned this issue Aug 27, 2023

[Documentation] Node2Vec and MetaPath2Vec #7938

Merged

husimplicity mentioned this issue Sep 6, 2023

[Documentation] Add the GraphLearn-for-Pytorch Tutorial #7989

Open

xnuohz mentioned this issue Oct 6, 2023

[Documentation] Graph Transformer Tutorial #8144

Merged

SimonPop mentioned this issue Oct 10, 2023

[Documentation] Application Overview Tutorial #8174

Open

xnuohz mentioned this issue Nov 11, 2023

[Documentation] Data Splitting #8366

Merged

puririshi98 added a commit that referenced this issue Feb 10, 2025

[Documentation] Data Splitting (#8366)

0d142bb

Part of #7892 Fixed #10004 --------- Co-authored-by: Rishi Puri <puririshi98@berkeley.edu>
