This is an implementation of the Restricted Boltzmann Machine (RBM), an improvement over the Boltzmann Machine. RBMs are used in dimensionality reduction, collaborative filtering, etc.
An RBM has one visible layer (v) and one hidden layer (h). We can compute h from v, and likewise compute v from h.
Both layers contain only the values 0 or 1 (boolean values).
Notation | Description |
---|---|
v (NxD) | the visible layer |
W (DxM) | kernel |
h (NxM) | the hidden layer |
b (Dx1), c (Mx1) | biases |
F | energy function |
N | the number of observations |
D | the number of features |
M | the number of hidden units |
p(v \| h) | the probability of v given h (a vector of probabilities) |
p(h \| v) | the probability of h given v (a vector of probabilities) |
The activation function for both p(v|h) and p(h|v) is the sigmoid.
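As a sketch of the two conditionals, here is how p(h|v) and p(v|h) can be computed with the shapes from the notation table. The parameter values below (N=4, D=6, M=3, random W) are placeholders for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy parameters matching the notation table:
# v: (N, D), W: (D, M), b: (D,), c: (M,)
rng = np.random.default_rng(0)
N, D, M = 4, 6, 3
v = rng.integers(0, 2, size=(N, D)).astype(float)
W = rng.normal(scale=0.1, size=(D, M))
b = np.zeros(D)
c = np.zeros(M)

# p(h=1 | v) = sigmoid(v W + c), one row of M probabilities per observation
p_h_given_v = sigmoid(v @ W + c)            # shape (N, M)

# Sample binary h, then p(v=1 | h) = sigmoid(h W^T + b)
h = (rng.random((N, M)) < p_h_given_v).astype(float)
p_v_given_h = sigmoid(h @ W.T + b)          # shape (N, D)

print(p_h_given_v.shape, p_v_given_h.shape)  # (4, 3) (4, 6)
```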
Rather than using cross-entropy, the authors use a different loss function, denoted L. It is observed that minimizing L also minimizes the cross-entropy.
We try to minimize the following loss function:
L = F(v) - F(v')
v' is a sample drawn from (v, h). We generate v' by performing one step of Gibbs sampling. More than one step would also work, but it is not necessary since one step is good enough.
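One Gibbs step can be sketched as follows: sample h from p(h|v), then sample v' from p(v|h). The function name and the toy shapes are illustrative, not taken from the repository:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, rng):
    """One Gibbs step: v -> sample h ~ p(h|v) -> sample v' ~ p(v|h)."""
    p_h = sigmoid(v @ W + c)
    h = (rng.random(p_h.shape) < p_h).astype(float)   # Bernoulli sample
    p_v = sigmoid(h @ W.T + b)
    v_prime = (rng.random(p_v.shape) < p_v).astype(float)
    return v_prime

rng = np.random.default_rng(1)
N, D, M = 4, 6, 3
v = rng.integers(0, 2, size=(N, D)).astype(float)
W = rng.normal(scale=0.1, size=(D, M))
b, c = np.zeros(D), np.zeros(M)

v_prime = gibbs_step(v, W, b, c, rng)
print(v_prime.shape)  # (4, 6), entries in {0, 1}
```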
The formula of the energy function F is as follows:
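The equation itself is not shown here (it was likely an image in the original). For reference, the standard free energy of a binary RBM under the notation above is:

```math
F(v) = -b^{\top} v \;-\; \sum_{j=1}^{M} \log\left(1 + e^{\,c_j + v^{\top} W_{:,j}}\right)
```

where W_{:,j} denotes the j-th column of W.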
The experiment was performed with PyCharm 2018.3.4 and Python 3 on macOS.
The left images are the originals. The right images are the reconstructions produced by the RBM.
Example 1 | Example 2 | Example 3 |
---|---|---|
It is similar to a stacked autoencoder.
Stacked RBM = consecutive layers of RBMs
I tested a stack of three RBMs on the digit-recognizer dataset. The first hidden layer has 64 units, the second has 32 units, and the last has 16 units. The output layer has 10 classes.
Each hidden layer is trained in turn, from the first hidden layer to the last. In each training phase, epoch = 100 and learning_rate = 0.001.
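The greedy layer-wise procedure can be sketched as below: train one RBM, feed its hidden activations to the next, and repeat. This is a simplified CD-1 sketch (probabilities are used instead of samples in the negative phase), with random data as a stand-in for the digit images and a reduced epoch count so it runs quickly; the layer sizes 64/32/16 come from the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=100, lr=0.001, rng=None):
    """Simplified CD-1 training of one RBM; returns (W, b, c)."""
    rng = rng or np.random.default_rng(0)
    N, D = data.shape
    W = rng.normal(scale=0.01, size=(D, n_hidden))
    b = np.zeros(D)
    c = np.zeros(n_hidden)
    for _ in range(epochs):
        p_h = sigmoid(data @ W + c)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(h @ W.T + b)            # reconstruction v'
        p_h_prime = sigmoid(p_v @ W + c)
        # Contrastive-divergence gradient estimates
        W += lr * (data.T @ p_h - p_v.T @ p_h_prime) / N
        b += lr * (data - p_v).mean(axis=0)
        c += lr * (p_h - p_h_prime).mean(axis=0)
    return W, b, c

# Greedy layer-wise stacking: each RBM is trained on the hidden
# activations of the previous layer.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(32, 784)).astype(float)  # toy stand-in for digits
layer_input = X
for n_hidden in (64, 32, 16):
    W, b, c = train_rbm(layer_input, n_hidden, epochs=5, rng=rng)
    layer_input = sigmoid(layer_input @ W + c)

print(layer_input.shape)  # (32, 16): final 16-unit representation
```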
Kaggle score: 0.95757 on the digit-recognizer dataset.