The basic unit of work in a neural network is the Artificial Neuron. An Artificial Neuron has an associated potential to emit a signal. For convenience the value of the potential is kept between $0$ and $1$. If the potential is above a chosen threshold the neuron is active; if it is below, the neuron is inactive. We can implement the Artificial Neuron as a function with an array of activation values, i.e. $a = [a_1, a_2, \dots, a_n]$, in its internal scope. The function parameters are an array of weight values, $w = [w_1, w_2, \dots, w_n]$. The output is then the signal. The values $a$ and $w$ are defined as tensors because the operations and functions used to manipulate Artificial Neurons come from a branch of mathematics called Tensor Analysis. Consider an implementation based on the following:
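- We define the tensors $a$ (activation values) and $w$ (weight values).
- Multiply the tensors $a$ and $w$ component by component, i.e. $a_i w_i$ for every index $i$.
- The resulting product will be $[a_1 w_1, a_2 w_2, \dots, a_n w_n]$.
- Reduce the product to a scalar value by adding its components.
- The sum of the components is $a_1 w_1 + a_2 w_2 + \dots + a_n w_n$. It is called a weighted sum and is represented by $\sum_{i=1}^{n} a_i w_i$, where $n$ is the number of elements in $a$. The weighted sum determines the strength of the signal emitted by the Artificial Neuron.
- Capping adds additional control over signal emission and is done by subtracting a bias $b$ from the sum.
- It is possible for $\sum_{i=1}^{n} a_i w_i - b$ to have a value outside the desired signal strength, i.e. outside the range from $0$ to $1$. For this reason an activation function is used to bring it into the desired range.
- One of the commonly used activation functions is the sigmoid, $\sigma(x) = \frac{1}{1 + e^{-x}}$.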
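In conclusion, the implementation of an Artificial Neuron is the function $f(w) = \sigma\left(\sum_{i=1}^{n} a_i w_i - b\right)$, where $a$ holds the activation values, $w$ the weight values, $b$ is the bias, and $\sigma$ is the sigmoid.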
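
The description above can be sketched in JavaScript as a closure. This is only an illustration of the idea, not a reference implementation: the names `createNeuron`, `activationValues`, `weights`, and `bias`, as well as the sample numbers, are assumptions made for this example. The activation values and the bias live in the function's internal scope, and the weight values arrive as the parameter.

```javascript
// Sigmoid activation: squashes any real number into the interval (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Hypothetical factory. The activation values and the bias are kept in the
// closure (the neuron's internal scope); the returned function receives the
// weight values and returns the emitted signal.
function createNeuron(activationValues, bias) {
  return (weights) => {
    if (weights.length !== activationValues.length) {
      throw new Error("weights and activation values must have the same length");
    }
    // Component-wise product reduced to a scalar: the weighted sum.
    const weightedSum = activationValues.reduce(
      (sum, a, i) => sum + a * weights[i],
      0
    );
    // Cap with the bias, then bring the result into the desired range.
    return sigmoid(weightedSum - bias);
  };
}

// Illustrative numbers only.
const neuron = createNeuron([0.5, -0.25], 0.1);
const signal = neuron([1, 2]);
console.log(signal);
```

Keeping the stored values in a closure mirrors the "internal scope" wording of the text; a plain object with a method would work just as well.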
A neural network is a computational graph of Artificial Neurons. Neural networks are composed of neural network layers. A neural network layer is a tensor of Artificial Neurons. The Artificial Neurons in a neural network layer are connected to each other because they are components of a tensor. We can define layer $n$ as a tensor $L_n$ whose components are its Artificial Neurons. Neural networks have three layer types: input, hidden, and output. A neural network may have multiple hidden layers but only one input layer and one output layer. Consider a neural network consisting of the following layers:
- An input layer with one Artificial Neuron.
- A hidden layer with two Artificial Neurons.
- An output layer with one Artificial Neuron.
Neural networks themselves are tensors. In this case the neural network is the tensor formed by these three layers.
Artificial Neurons in a neural network are associated with each other via function composition. Consider the input layer Artificial Neuron: it has an internal tensor of activation values, and the number of components in that tensor is one. The output of the input neuron is a potential. The key question one must ask at this point is: how is the number of activation values in the input layer related to the number of activation values in the hidden layer? Here is where the magic happens: the potential emitted by the input neuron becomes the input weight for each of the two Artificial Neurons in the hidden layer. This means that the potential becomes a weight value tensor and the input for both hidden neurons. It also means that the activation values tensor for each hidden neuron is a tensor with one component, because the input layer consists of only one component. It is important to notice that the number of activation values in a layer's Artificial Neurons is determined by the number of Artificial Neurons in the previous layer.
For completeness let's consider the output layer Artificial Neuron. Based on our current understanding it has an internal tensor of two activation values, $[a_1, a_2]$, because the hidden layer has two components. The output of the first hidden neuron is a potential $p_1$ and the output of the second hidden neuron is a potential $p_2$, therefore the weight value tensor of the output neuron is $[p_1, p_2]$. The weighted sum for the output neuron is $a_1 p_1 + a_2 p_2$, and its potential is $\sigma(a_1 p_1 + a_2 p_2 - b)$, where $b$ is its bias and $\sigma$ is the activation function. Notice how all Artificial Neurons in every layer of the neural network are relaying information, i.e. emitting a signal directly or indirectly to each other in a forward direction. The type of neural network where every Artificial Neuron in one layer is connected to every Artificial Neuron in the next layer is called a dense neural network.
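
The forward flow described above can be sketched for a small dense network with one input neuron, two hidden neurons, and one output neuron. The data layout (`activationValues`, `bias`), the helper names, and every number below are assumptions chosen for illustration. Following the text, each neuron stores its activation values and receives the potentials of the previous layer as its weight values.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Potential of one neuron: sigmoid(weighted sum - bias).
function potential(neuron, weights) {
  const sum = neuron.activationValues.reduce(
    (s, a, i) => s + a * weights[i],
    0
  );
  return sigmoid(sum - neuron.bias);
}

// A layer is an array of neurons; a network is an array of layers.
// The number of activation values in each neuron equals the number of
// neurons in the previous layer.
const network = [
  [{ activationValues: [1.0], bias: 0 }], // input layer
  [
    { activationValues: [0.5], bias: 0 }, // hidden layer, neuron 1
    { activationValues: [-0.5], bias: 0 }, // hidden layer, neuron 2
  ],
  [{ activationValues: [1.0, 1.0], bias: 0 }], // output layer
];

// Feed the input forward; the potentials of one layer become the weight
// values of the next. Returns the potentials of every layer.
function forward(layers, input) {
  const trace = [];
  let signals = input;
  for (const layer of layers) {
    const incoming = signals;
    signals = layer.map((neuron) => potential(neuron, incoming));
    trace.push(signals);
  }
  return trace;
}

const trace = forward(network, [0]);
console.log(trace); // one array of potentials per layer
```

The last entry of `trace` holds the potential of the output neuron; the earlier entries show how each layer's signals feed the next.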
We define a neural network algorithm as a function that produces an output in response to an input, passing it through $n$ hidden layers. A neural network is a system defined by the tensor of its input, hidden, and output layers.
In our daily experience we go through time and we have a state at each moment in time. Our reality is a series of moments in time. At each moment we can assess our state, map any number of metrics to that exact moment, and persist the resulting information representing our state. Our memories are our state and we derive knowledge from them. Compared to you or me, a neural network is a very simple system: a moment in time for the network is represented by evaluating the values held by its Artificial Neurons at that moment. We bring the network to life by feeding it input and evaluating the output of every Artificial Neuron in each neural network layer.
Back propagation is one of the most widely used algorithms for training neural networks. The algorithm's objective is to find, through a training process, the values inside each Artificial Neuron (including its bias) that will yield the expected outputs from the network. The algorithm's steps are:
- Initialize the Artificial Neurons in the network by assigning a random number, drawn from a fixed range, to every trainable value and every bias.
- Iterate over the training dataset.
- For each item in the dataset, forward propagate by invoking the activation function on every Artificial Neuron, from the first hidden layer to the output layer, using the input value of the item as the input. The signals of the Artificial Neurons in the previous layer become the input for the current layer.
- Backward propagate the error by iterating over the layers in reverse order and calculating the error between the current output $o$ and the expected output $t$, i.e. the output for the corresponding input in the dataset. One of the most commonly used error, cost, or loss functions to compare $o$ vs. $t$ is the Mean Squared Error function, $E = \frac{1}{n} \sum_{i=1}^{n} (t_i - o_i)^2$. The error indicates how close the signal $o$ is to $t$.
- Compute the rate of change of the cost function. The rate of change of a single-variable function with one scalar output is called a derivative, i.e. $\frac{df}{dx}$. The rate of change of a multivariable function with one scalar output is called the gradient, i.e. $\nabla f$. The gradient indicates the direction and magnitude of greatest increase for the error function. In this case the gradient $\nabla E$ needs to be computed, since we are dealing with multivariable tensors. The gradient needs to be negated because the objective is to advance towards lower error or cost, i.e. $-\nabla E$. Define the learning rate, a positive number $\eta$, used as a factor that determines the magnitude of the change $\Delta$ in conjunction with $-\nabla E$. Define the momentum, a number $\mu$ between $0$ and $1$, used as a factor that determines the magnitude of $\Delta$ in conjunction with the previous change $\Delta_{prev}$, so that $\Delta = -\eta \nabla E + \mu \Delta_{prev}$. The magnitude of $\eta$ will determine how big of a step we take in our search to minimize the error or cost. The magnitude of $\mu$ will determine how much of an influence the previous values of $\Delta$ have in our search to minimize the error or cost. Compute the scalar values $\Delta$ by which each trainable value needs to change in order to decrease the error, i.e. bring $o$ closer to $t$. Follow the same procedure to fine-tune the bias.
- After iterating over the complete training dataset, verify that the current error is less than or equal to the error threshold or that the maximum number of iterations was reached; if true, stop training, else continue. Each complete iteration over all items in a training set is called an epoch.
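
The steps above can be sketched as a small, self-contained trainer. Everything here is an assumption made for illustration: the function names, the `values` array that stands for the trainable numbers stored inside each neuron, the default hyperparameters, the seeded random generator, and the toy dataset. The sketch updates after every item and uses the per-item error $\frac{1}{2}(o - t)^2$, whose gradient differs from that of the Mean Squared Error only by a constant factor.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Deterministic generator so runs are repeatable (Math.random() also works).
function makeRandom(seed) {
  let state = seed;
  return () => (state = (state * 1664525 + 1013904223) % 4294967296) / 4294967296;
}

// layerSizes = [inputs, hidden..., outputs]. Each neuron holds one trainable
// value per incoming signal plus a bias, all drawn from [-0.5, 0.5).
function createNetwork(layerSizes, random) {
  return layerSizes.slice(1).map((size, l) =>
    Array.from({ length: size }, () => ({
      values: Array.from({ length: layerSizes[l] }, () => random() - 0.5),
      bias: random() - 0.5,
      valueChanges: new Array(layerSizes[l]).fill(0),
      biasChange: 0, output: 0, delta: 0,
    }))
  );
}

// Forward propagation: each layer's signals become the next layer's input.
function forward(layers, input) {
  let signals = input;
  for (const layer of layers) {
    const incoming = signals;
    signals = layer.map((n) => {
      const sum = n.values.reduce((s, v, i) => s + v * incoming[i], 0);
      return (n.output = sigmoid(sum - n.bias));
    });
  }
  return signals;
}

// Back propagation for one item, using the per-item error 0.5 * (o - t)^2.
function backward(layers, input, expected, learningRate, momentum) {
  const last = layers.length - 1;
  for (let l = last; l >= 0; l--) {
    layers[l].forEach((n, j) => {
      const error = l === last
        ? n.output - expected[j]
        : layers[l + 1].reduce((s, next) => s + next.delta * next.values[j], 0);
      n.delta = error * n.output * (1 - n.output); // sigmoid'(z) = o * (1 - o)
    });
  }
  layers.forEach((layer, l) => {
    const incoming = l === 0 ? input : layers[l - 1].map((n) => n.output);
    for (const n of layer) {
      n.values.forEach((_, i) => {
        // change = -learningRate * gradient + momentum * previous change
        n.valueChanges[i] = -learningRate * n.delta * incoming[i] + momentum * n.valueChanges[i];
        n.values[i] += n.valueChanges[i];
      });
      // The bias is subtracted from the sum, so its gradient is -delta.
      n.biasChange = learningRate * n.delta + momentum * n.biasChange;
      n.bias += n.biasChange;
    }
  });
}

function meanSquaredError(layers, dataset) {
  let total = 0, count = 0;
  for (const { input, output } of dataset) {
    forward(layers, input).forEach((o, i) => { total += (output[i] - o) ** 2; count++; });
  }
  return total / count;
}

// One pass over the dataset is an epoch; stop on threshold or iteration cap.
function train(layers, dataset, options = {}) {
  const { learningRate = 0.3, momentum = 0.1, errorThreshold = 0.005, maxIterations = 20000 } = options;
  let error = Infinity, epochs = 0;
  while (epochs < maxIterations && error > errorThreshold) {
    for (const { input, output } of dataset) {
      forward(layers, input);
      backward(layers, input, output, learningRate, momentum);
    }
    error = meanSquaredError(layers, dataset);
    epochs++;
  }
  return { error, epochs };
}

// Toy dataset: inputs and expected outputs already scaled into [0, 1].
const dataset = [
  { input: [0], output: [0.9] },
  { input: [0.5], output: [0.5] },
  { input: [1], output: [0.1] },
];
const network = createNetwork([1, 2, 1], makeRandom(42));
const errorBefore = meanSquaredError(network, dataset);
const result = train(network, dataset);
console.log({ errorBefore, ...result });
```

All deltas are computed before any value is updated, so the hidden layer's error is derived from the values that actually produced the current output.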
It is crucial to understand that the trainable values inside the Artificial Neurons and their associated biases change as a result of back propagation, while the signals change as a result of forward propagation. This means that properly labeled data is essential for training and for how well the network performs. When practicing machine learning you will be presented with the opportunity to adjust so-called hyperparameters; some of them are the learning rate, the momentum, the error threshold, and the maximum number of iterations.
The process of preparing training datasets is challenging. The key to the process is proper vectorization and labeling of training data. Neural networks can be applied to all kinds of problems involving regression, classification, or prediction. The way data is prepared for training requires careful consideration of the domain and the goals one intends to achieve.
Imagine we have a set of data representing the horsepower, $hp$, and the miles per gallon, $mpg$, of a car model $m$. An array holding $m$, $hp$, and $mpg$ represents an element in our raw dataset. Our objective is to determine if there is a relationship between $hp$ and $mpg$ and to design a neural network that will help us predict the $mpg$ given the $hp$. To prepare the data for consumption we need to understand what the inputs and outputs for our model are. Since our intent is to predict $mpg$ in relation to $hp$ regardless of the model, our training data becomes a list of $[hp, mpg]$ pairs. The last step in the process is data normalization, which is usually accomplished by min-max feature scaling. The function for min-max feature scaling is $x' = \frac{x - \min(X)}{\max(X) - \min(X)}$, where $x$ is any value, $\max(X)$ is the maximum, and $\min(X)$ is the minimum in the array $X$. Normalization assures that the value $x'$ is always within the range $[0, 1]$. In our case study the input is the normalized $hp$ and the expected output is the normalized $mpg$. Normalization is necessary because it brings any dataset to the necessary range $[0, 1]$.
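
As a minimal sketch of this preparation step, the snippet below drops the model name, applies min-max feature scaling to each column, and builds input/output pairs. The record shape, the field names, and the sample numbers are made up for illustration and do not come from a real dataset.

```javascript
// Min-max feature scaling: x' = (x - min) / (max - min).
function minMaxScale(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  // If every value is identical there is nothing to scale; map to 0
  // to avoid dividing by zero (a choice made for this sketch).
  return values.map((x) => (range === 0 ? 0 : (x - min) / range));
}

// Hypothetical raw records: model, horsepower, miles per gallon.
const raw = [
  { model: "A", horsepower: 100, mpg: 30 },
  { model: "B", horsepower: 150, mpg: 25 },
  { model: "C", horsepower: 200, mpg: 20 },
];

// The model name is not used for prediction, so only two columns remain.
const scaledHp = minMaxScale(raw.map((r) => r.horsepower));
const scaledMpg = minMaxScale(raw.map((r) => r.mpg));

// One training item per record: normalized horsepower in, normalized mpg out.
const trainingData = raw.map((_, i) => ({
  input: [scaledHp[i]],
  output: [scaledMpg[i]],
}));

console.log(trainingData);
```

To turn a prediction back into miles per gallon, the same minimum and maximum must be kept so the scaling can be inverted.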
Neural networks are computational graphs used to universally model functions. The majority of relationships represented by functions are not linear; for this reason functions like the sigmoid, $\sigma(x) = \frac{1}{1 + e^{-x}}$, or the hyperbolic tangent, $\tanh(x)$, are used to modulate signals. They introduce nonlinearity to the Artificial Neuron model, which increases the scope of problems we can solve. Normalization helps by keeping everything at the same scale and allows the system to be more sensitive when recognizing patterns. When a neural network is trained it becomes a function in tensor form specific to the training domain. After training, the acquired knowledge can be preserved by serializing the trained values of every Artificial Neuron, the associated biases, and all the hyperparameters used during training. The resulting kernel of knowledge is typically very small in comparison to the training data and can be used almost anywhere, including a web browser. When utilizing neural networks for regression, prediction, or classification the activation values come from the input provided and the weights are not changed. The activation values flow forward from the input layer to the output layer.
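
Preserving a trained network can be as simple as writing its numbers to JSON. The sketch below assumes a hypothetical model layout (`hyperParameters`, `layers`, `values`, `bias`) and hand-picked numbers standing in for a trained network; it only demonstrates that the restored copy produces the same output as the original.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Hypothetical trained model: the numbers are placeholders, not real results.
const trained = {
  hyperParameters: { learningRate: 0.3, momentum: 0.1, errorThreshold: 0.005 },
  layers: [
    [{ values: [0.8], bias: 0.1 }, { values: [-0.4], bias: -0.2 }], // hidden
    [{ values: [1.5, -1.1], bias: 0.3 }], // output
  ],
};

// Forward-only evaluation: nothing stored in the model is modified.
function run(model, input) {
  return model.layers.reduce(
    (signals, layer) =>
      layer.map((n) =>
        sigmoid(n.values.reduce((s, v, i) => s + v * signals[i], 0) - n.bias)
      ),
    input
  );
}

// Serialize to a string that can be saved to disk or sent to a browser,
// then restore it and use it for prediction.
const serialized = JSON.stringify(trained);
const restored = JSON.parse(serialized);

console.log(serialized.length, run(restored, [0.5]));
```

The serialized string holds only a handful of numbers per neuron, which is why a trained network is usually much smaller than the data it was trained on.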
In the examples directory you can find several examples using the Brain.JS framework to create and train neural networks.