A discussion of forward propagation, backwards propagation, and gradient descent in the context of neural nets (with a complementary refresher of basic calculus).


Understanding Neural Nets Forwards and Backwards

Benjamin S. Knight, March 5th, 2017

My goal in writing this is to walk, step by step and calculation by calculation, through a single epoch of a simple neural network. Recall that an epoch is one iteration of the model that uses the entirety of the available training data. While the network itself is relatively simple (only three nodes), our discussion will be anything but. Our first objective is to lay out a single forward pass of the model in detail. The next task involves a bit of calculus - one pass of backwards propagation and the derivation of a gradient, from which we calculate a new array of weights for the network. Later on, we will delve into what exactly we mean by a 'gradient' and how it is calculated. Before anything else, however, we need an example model - so let's create one now.

One Forward Pass Across a 'Simple' Neural Network

Our hypothetical network has two input variables, 'X1' and 'X2', one output variable, 'Y', and a training data set consisting of a single observation, [8, -4]. The interior of the network is made up of two layers - a hidden layer of two nodes followed by a single-node output layer. We can unpack this network further by visualizing the weights as grey squares. Lastly, the activation function that precedes the output 'Y' is a sigmoid function.

Figure 1: An Example Neural Net

A GIF showing the construction of a two-input, one-output, three-node neural net with a sigmoid activation function in the output layer.

When initializing a neural net, the weights are typically set to small random values drawn from a zero-mean Gaussian distribution with a standard deviation of about 0.01 (Hinton, 2010, p. 9). We start the forward pass by multiplying our initial inputs ('X1' and 'X2') by their respective weights and summing the results at each hidden node. Next, we repeat the process for the nodes in the hidden layer - multiplying the output of H1 by its weight, multiplying the output of H2 by its weight, and summing the two products to form the value of the output node 'O1'. Our final step is the activation function - in this case, a sigmoid function.

Figure 2: Randomly Assigning Weights and Executing the Forward Pass

A GIF showing one forward pass of the neural network.
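To make the arithmetic concrete, below is a minimal Python sketch of the forward pass just described. The weight values are illustrative draws from a zero-mean Gaussian with a standard deviation of 0.01, not the particular numbers shown in the GIF, and the variable names (w_hidden, w_output, and so on) are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# The single training observation: X1 = 8, X2 = -4
x = np.array([8.0, -4.0])

# Illustrative weights drawn from a zero-mean Gaussian with sd = 0.01
# (Hinton, 2010): two inputs -> two hidden nodes -> one output node.
w_hidden = rng.normal(loc=0.0, scale=0.01, size=(2, 2))  # edges from X1, X2 to H1, H2
w_output = rng.normal(loc=0.0, scale=0.01, size=(2,))    # edges from H1, H2 to O1

def sigmoid(z):
    """The activation applied just before the output Y."""
    return 1.0 / (1.0 + np.exp(-z))

# Hidden layer: each hidden node is a weighted sum of the inputs
h = x @ w_hidden            # h[0] is H1, h[1] is H2

# Output node: a weighted sum of the hidden-node values
o1 = h @ w_output

# Activation: squash O1 through the sigmoid to get the prediction Y
y_hat = sigmoid(o1)
print(h, o1, y_hat)
```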


What is a Gradient?

The purpose of backwards propagation is to enable us to train the model - typically with some version of gradient descent. To understand how gradient descent is accomplished, it helps to know what a gradient is. Below is a visualization of what an error gradient looks like with two variables. In deriving the error gradient, our variables of interest are the network's 'edges' - the connections between nodes, i.e. the weights.

Figure 3: A Three Dimensional Visualization of Gradient Descent

A three-dimensional visualization of gradient descent, courtesy of Zoran Vrhovski.

Source: Zoran Vrhovski, May 29th, 2012
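Before turning to the network itself, here is a minimal Python sketch of the gradient descent idea on a stand-in, bowl-shaped error surface in two variables (the quadratic error function and the learning rate of 0.1 are assumptions chosen purely for illustration): at each step, every weight moves a small distance in the direction that most steeply reduces the error.

```python
import numpy as np

def error(w):
    # An assumed bowl-shaped error surface in two variables,
    # standing in for the surface pictured above.
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def gradient(w):
    # The gradient: partial derivatives of the error with respect to each weight.
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

w = np.array([0.0, 0.0])   # arbitrary starting weights
learning_rate = 0.1        # illustrative step size

for step in range(50):
    w = w - learning_rate * gradient(w)   # step downhill along the negative gradient

print(w, error(w))   # w approaches the minimum at (3, -1)
```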


A neural net can be thought of as one very large differentiable equation with many variables. To better understand any complex system with many inputs, it is often useful to specify the relationship between the outcome variable and a specific variable of interest. Mathematically, we do this by taking the partial derivative of the equation with respect to the variable of interest. When taking the partial derivative, any component of the expression that does not involve the variable of interest is treated as a constant, so its contribution to the derivative is zero. Note in the figure below how different elements drop out of relevance depending on which variable we are taking the partial derivative with respect to.
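As a small worked example of my own (not one of the repository's figures), consider a function of two variables and note how the terms that do not involve the variable of interest vanish upon differentiation:

```latex
f(x, y) = 3x^{2} + 2xy + 5y

\frac{\partial f}{\partial x} = 6x + 2y
\quad\text{(the $5y$ term contains no $x$, so it contributes nothing)}

\frac{\partial f}{\partial y} = 2x + 5
\quad\text{(the $3x^{2}$ term contains no $y$, so it drops out)}
```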

For those wanting additional detail, I highly recommend this series of video lectures from Khan Academy.

Calculating the Gradient

Figure 4: Identifying the Error Derivatives from the Neural Net's Edges

Identifying the error derivatives associated with each of the neural net's edges.


Expression 3: Gradient (Edge 1) - The Loss Function Error

Calculating the error of the loss function - here the sum of squares.
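In plain notation, and assuming the loss takes the common one-half sum-of-squares form (the constant in the repository's image may differ, but it does not change the logic), Expression 3 amounts to:

```latex
E = \tfrac{1}{2}\,(y - \hat{y})^{2}

\frac{\partial E}{\partial \hat{y}} = -(y - \hat{y})
```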

Expression 4: Gradient (Edge 2) - The Sigmoid (Activation) Function Error

Calculating the error of the activation function - here a sigmoid function.
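The convenient property being used here is that the sigmoid's derivative can be written in terms of its own output. Writing O1 for the output node's weighted sum and ŷ for the prediction:

```latex
\hat{y} = \sigma(O_{1}) = \frac{1}{1 + e^{-O_{1}}}

\frac{\partial \hat{y}}{\partial O_{1}} = \sigma(O_{1})\,\bigl(1 - \sigma(O_{1})\bigr) = \hat{y}\,(1 - \hat{y})
```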

Expressions 5 & 6: Gradient (Edges 3-4) - The Output Layer Error

Calculating the error of the two edges within the output layer.
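Chaining the two pieces above with the hidden-node values gives the gradients for the two output-layer edges. The notation below is my own (w_{H1,O1} denotes the weight on the edge from H1 to O1) and may not match the symbols used in the repository's images:

```latex
\frac{\partial E}{\partial w_{H1,O1}}
= \frac{\partial E}{\partial \hat{y}} \cdot
  \frac{\partial \hat{y}}{\partial O_{1}} \cdot
  \frac{\partial O_{1}}{\partial w_{H1,O1}}
= -(y - \hat{y})\,\hat{y}\,(1 - \hat{y})\,H_{1}

\frac{\partial E}{\partial w_{H2,O1}} = -(y - \hat{y})\,\hat{y}\,(1 - \hat{y})\,H_{2}
```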

Expressions 7-10: Gradient (Edges 5-8) - The Hidden Layer Error

Calculating the error of the four edges within the hidden layer.
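Because the hidden nodes in this network are plain weighted sums (the only nonlinearity is the final sigmoid), the chain rule simply runs one step further for Edges 5-8: each hidden-layer gradient is the output error multiplied by the relevant output-layer weight and by the corresponding input. The Python sketch below pulls the whole backward pass together under the same assumptions as before (illustrative weights, a one-half sum-of-squares loss, an assumed target of 1.0, and an assumed learning rate of 0.5); the repository's images remain the authoritative derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same setup as the forward-pass sketch above.
x = np.array([8.0, -4.0])                                # the single observation [8, -4]
w_hidden = rng.normal(loc=0.0, scale=0.01, size=(2, 2))  # edges X1, X2 -> H1, H2
w_output = rng.normal(loc=0.0, scale=0.01, size=(2,))    # edges H1, H2 -> O1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
h = x @ w_hidden          # hidden-node values H1, H2
o1 = h @ w_output         # output-node value O1
y_hat = sigmoid(o1)       # prediction Y

y_true = 1.0              # assumed target for the single observation
learning_rate = 0.5       # illustrative step size

# Expression 3: derivative of E = 0.5 * (y - y_hat)^2 with respect to y_hat
dE_dyhat = -(y_true - y_hat)

# Expression 4: derivative of the sigmoid with respect to its input O1
dyhat_do1 = y_hat * (1.0 - y_hat)

delta_output = dE_dyhat * dyhat_do1

# Expressions 5-6 (Edges 3-4): gradients for the two output-layer weights
grad_w_output = delta_output * h

# Expressions 7-10 (Edges 5-8): gradients for the four hidden-layer weights.
# The hidden nodes are linear, so the error flows back through w_output
# and is then scaled by the corresponding input.
grad_w_hidden = np.outer(x, delta_output * w_output)

# One step of gradient descent along the negative gradient for every edge
w_output = w_output - learning_rate * grad_w_output
w_hidden = w_hidden - learning_rate * grad_w_hidden

print(grad_w_output, grad_w_hidden)
```

With the weights updated, a second forward pass would begin the next epoch.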


References
