My attempt at building the gears of a graph convolutional network (GCN) from the ground up.
Through the process, Jonathan Hui's blog post and Inneke Mayachita's explainer were the gold-standards from which I learned.
This is not meant to be a zero-to-one primer. It's more my notes on GCNs, condensed into an explainer for my own comprehension (Feynman Technique).
Just like their namesake, convolutional networks, GCNs are built from a few core classes and operations.
Graph networks require a graph structure as input. Adjacency matrices are a convenient way to represent this: a cell with a '1' means the two nodes are connected, a '0' means they aren't. In the image, Node 0 shares an edge with Nodes 1, 2, and 3.
Edge indices are another convenient representation of a graph: a list of node-node connections. For example, [[1,0], [2,0], [2,3]] describes a graph where Node 0 shares an edge with Nodes 1 and 2, and Nodes 2 and 3 also share an edge.
Edge indices are often more efficient than a full adjacency matrix, which grows as N^2 (where N is the number of nodes).
However, in this case I chose the matrix representation because the graph has only 34 nodes, and it simplified many of the calculations.
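As a quick sketch of how the two representations relate (using NumPy and a made-up 4-node graph, not the actual Karate Club data):

```python
import numpy as np

# A hypothetical 4-node graph: Node 0 shares an edge with Nodes 1, 2, and 3,
# and no other edges exist.
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])

# The equivalent edge index: one [source, target] pair per nonzero cell.
edge_index = np.argwhere(A == 1)
print(edge_index)  # [[0 1] [0 2] [0 3] [1 0] [2 0] [3 0]]
```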
Just as NLP involves tokens with word embeddings, each node in a graph network has its own feature vector. (So does each edge, but we'll ignore edges for now).
The feature matrix is simply a matrix stacking every node's feature vector, so it is an N x feature_dim matrix, where N is the number of nodes and feature_dim is the number of features in your embedding vector.
This carries information about each node. More on this in a bit.
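Sticking with the toy 4-node graph from above and some made-up 2-dimensional features:

```python
import numpy as np

# Hypothetical feature vectors for the toy graph, stacked into an
# N x feature_dim matrix (here 4 x 2).
X = np.array([
    [1.0, 0.0],  # Node 0
    [0.0, 1.0],  # Node 1
    [2.0, 2.0],  # Node 2
    [3.0, 1.0],  # Node 3
])
print(X.shape)  # (4, 2)
```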
Naively, you might multiply the adjacency matrix (A) and feature matrix (X) to somehow combine the information in the graph connections with the information present in each node.
What happens when you perform the matmul? Well...
AX turns out to sum all the neighboring feature vectors for each node! e.g. AX[0] = x1 + x2 + x3, the sum of the features of Nodes 1, 2, and 3. This is pretty good if our goal is to aggregate information from the graph. But there's a problem: if you go back to the adjacency matrix, you'll notice that all the diagonals are 0s.
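A minimal sketch of that aggregation, reusing the toy graph and made-up features from the snippets above:

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])

AX = A @ X
print(AX[0])    # [5. 4.] == X[1] + X[2] + X[3], the sum of Node 0's neighbors
print(A[0, 0])  # 0 -> Node 0's own features never enter the sum
```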
For each node, AX doesn't take into account the features of the node itself. We can resolve this with self-loops, essentially making each node connect to itself.
In the adjacency matrix, that just means adding 1s along the diagonal: A' = A + I. The resulting matrix and product are:
You can see that A'X now takes each node's own features into account when performing the sum.
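Continuing the same toy sketch:

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])

# Self-loops: add the identity so every node also aggregates its own features.
A_self = A + np.eye(4)
print((A_self @ X)[0])  # [6. 4.] == X[0] + X[1] + X[2] + X[3]
```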
There's another, more subtle problem with the current implementation. If you look at the product AX, the rows aren't normalized. What does this mean? Well, if we keep applying the same operation to the graph, the feature magnitudes keep growing or shrinking and we'll encounter vanishing/exploding gradients. As with regular neural networks, we need to normalize to prevent numerical instabilities.
We can normalize our GCN by finding the Degree Matrix (D), a diagonal matrix whose entry D[i][i] is the number of edges attached to node i, and multiplying the inverse of D with our adjacency matrix (A). This turns each neighbor sum into an average.
Comparing against the same mat mul as before, you can see that now the product is normalized.
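On the toy graph (with self-loops included in the degree count), that looks like:

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
A_self = A + np.eye(4)

# Degree matrix: D[i, i] counts the edges attached to node i (self-loop included).
D = np.diag(A_self.sum(axis=1))

# D^-1 A_self averages neighbor features instead of summing them.
print((np.linalg.inv(D) @ A_self @ X)[0])  # [1.5 1.] == mean of X[0], X[1], X[2], X[3]
```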
Without diving into too much of the math (you can do that here), Kipf and Welling showed that the ideal normalization actually takes a symmetric form: D^(-1/2) A D^(-1/2).
Now the output is:
That's all for normalization.
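For completeness, the symmetric version on the same toy graph:

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
A_self = A + np.eye(4)

# Symmetric normalization: D^(-1/2) A_self D^(-1/2) weights each edge by
# 1 / sqrt(deg(i) * deg(j)) instead of 1 / deg(i).
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_self.sum(axis=1)))
A_hat = D_inv_sqrt @ A_self @ D_inv_sqrt
print(A_hat @ X)
```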
Putting everything together, a single layer of convolution across the entire graph takes the form:
Where D is the degree matrix, A is the adjacency matrix with self-loops, H^l is the feature matrix at layer l (H^0 is just X, the initial feature matrix), and W^l is the weight matrix of the neural network layer that processes the result of D^(-1/2) A D^(-1/2) H^l.
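A minimal sketch of one such layer (the function name, the ReLU choice, and the layer stacking below are my assumptions, not necessarily how the original code is organized):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^(-1/2) (A + I) D^(-1/2) H W)."""
    A_self = A + np.eye(A.shape[0])                           # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_self.sum(axis=1)))   # D^(-1/2)
    A_hat = D_inv_sqrt @ A_self @ D_inv_sqrt                  # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0)                       # ReLU activation

# Stacking layers, e.g. for an N-node graph:
# H1 = gcn_layer(A, X, W0)    # (N, feature_dim) -> (N, hidden_dim)
# H2 = gcn_layer(A, H1, W1)   # (N, hidden_dim)  -> (N, num_classes)
```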
I trained a model on Zachary's Karate Club, a classic graph of 34 people involved in a karate club. The model was evaluated on a binary classification task: determining whether a club member would join Mr. Hi's club (red) or the Administrator's (blue). Node 0 represents Mr. Hi, Node 33 represents the Administrator, and a given club member's connections to other people are what determine their loyalty.
The loss was cross entropy between the ground-truth labels and the predicted probabilities for the binary classification.
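Purely as a sketch of that setup, here's what a two-layer GCN trained this way might look like in PyTorch (the placeholder data, hidden size, optimizer, and epoch count are my guesses, not the actual training code):

```python
import torch
import torch.nn.functional as F

N = 34
X = torch.eye(N)                            # hypothetical one-hot node features
A_hat = torch.eye(N)                        # placeholder: use the normalized Karate Club adjacency
labels = torch.zeros(N, dtype=torch.long)   # placeholder: real faction labels come from the dataset

W0 = torch.randn(N, 16, requires_grad=True)
W1 = torch.randn(16, 2, requires_grad=True)
optimizer = torch.optim.Adam([W0, W1], lr=0.01)

for epoch in range(200):
    H1 = torch.relu(A_hat @ X @ W0)         # first graph convolution
    logits = A_hat @ H1 @ W1                # second graph convolution -> 2 class logits
    loss = F.cross_entropy(logits, labels)  # cross entropy against the faction labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```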
Before training:
After training:
As the model trained, the nodes' learned embeddings formed clear clusters, which shows that the model effectively learned this binary classification task.
That's all for now. Thanks for reading if you made it this far! If you want to learn more, I highly recommend this Graph Neural Network Primer in addition to both posts I mentioned above (Jonathan, Inneke).