This repository includes implementations of a deep learning model using two different frameworks: Intel Nervana Neon and Google TensorFlow.
These implementations are intended to illustrate the differences in the programming models presented by the two frameworks.
The model solves a classic problem from the machine learning community: assigning a 28 × 28 pixel grayscale image of a handwritten digit to the correct one of ten classes.
The model is trained and tested on 70,000 images from the MNIST database.
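Each 28 × 28 image is flattened into a 784-element vector, and labels are commonly encoded one-hot over the ten digit classes (the TensorFlow code below requests this with one_hot=True); the placeholder shapes that appear later follow from this layout. The short NumPy sketch below is an illustration only, not part of either implementation:

import numpy as np

# A single 28 x 28 grayscale image, flattened into a 784-element vector.
image = np.zeros((28, 28), dtype=np.float32)
flattened = image.reshape(784)          # shape (784,)

# The label "3" encoded as a one-hot vector over the ten digit classes.
label = np.zeros(10, dtype=np.float32)
label[3] = 1.0                          # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]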
The Neon implementation uses a NeonArgparser
instance to parse command-line arguments:
if __name__ == '__main__':
    main(NeonArgparser(__doc__).parse_args())
The TensorFlow implementation uses an ArgumentParser
instance. The
data_dir
argument specifies the location of cached training data (if any).
It then invokes our main()
function via tf.app.run(), forwarding the unparsed command-line arguments.
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, default='/tmp/mnist_data',
                        help='Directory for storing input data')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
The Neon implementation uses an MNIST
instance to acquire data sets. The
MNIST
instance handles downloading the MNIST database into a local cache.
dataset = MNIST(path=args.data_dir)
The TensorFlow implementation uses the input_data.read_data_sets() helper, which downloads the MNIST database into the data_dir cache if it is not already present and returns the training, validation, and test sets.
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
The Neon implementation defines the model as two Affine
layers with
Gaussian
initialization. The first layer has a rectified linear activation,
and the second a Logistic
activation.
The model is instantiated directly with these two layers.
mlp = Model(
    layers=[
        Affine(nout=100, init=Gaussian(loc=0.0, scale=0.01),
               activation=Rectlin()),
        Affine(nout=10, init=Gaussian(loc=0.0, scale=0.01),
               activation=Logistic(shortcut=True))])
The TensorFlow implementation defines the model as a collection of

- placeholders,
- Variables initialized using random_normal_initializer,
- matrix multiplication operations, and
- rectified linear activations.

These objects are actually references into a graph representation of the model. This representation expresses the dependencies between the outputs, various intermediate values, inputs, and the matrix operations on them.
x = tf.placeholder(tf.float32, [None, 784])
W1 = tf.Variable(tf.random_normal_initializer()([784, 100]))
b1 = tf.Variable(tf.random_normal_initializer()([100]))
W2 = tf.Variable(tf.random_normal_initializer()([100, 10]))
b2 = tf.Variable(tf.random_normal_initializer()([10]))
y = tf.matmul(tf.nn.relu(tf.matmul(x, W1) + b1), W2) + b2
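Note that at this point y does not contain any numbers; like x, W1, b1, W2, and b2, it is merely a handle to a node in the graph. A quick way to see this, continuing the snippet above purely as an illustration, is to print the tensor before any session has evaluated it:

# Printing a graph reference shows a symbolic Tensor, not computed values.
print(y)          # e.g. Tensor("add_1:0", shape=(?, 10), dtype=float32)
# Values are produced only when a Session evaluates the node, as in the
# training and evaluation steps below.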
The Neon implementation defines the optimizer as an instance of GradientDescentMomentum.
optimizer = GradientDescentMomentum(
    0.1, momentum_coef=0.9, stochastic_round=args.rounding)
The TensorFlow implementation defines the optimizer as a collection of

- placeholders for the actual and expected outputs, and
- operations including reduce_mean and softmax_cross_entropy_with_logits (the cost function).

Again, these objects are references into a graph representation of the optimizer.
y_ = tf.placeholder(tf.float32, [None, 10])
train_step = tf.train.MomentumOptimizer(0.1, 0.9).minimize(
    tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)))
The Neon implementation fits the model to the training set by passing the
optimizer to its fit()
method. The cost function is specified here using a
GeneralizedCost
layer and
CrossEntropyBinary
function. The number of training epochs is
derived from the command-line arguments.
mlp.fit(
    dataset.train_iter,
    optimizer=optimizer,
    num_epochs=args.epochs,
    cost=GeneralizedCost(costfunc=CrossEntropyBinary()),
    callbacks=callbacks)
The TensorFlow implementation fits the model to the training set by

1. registering a default session in whose context the graph will be executed,
2. initializing global variables,
3. acquiring a batch of training data,
4. running the optimizer with the batch mapped to placeholders in the model, and
5. repeating steps 3 and 4.
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
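# Train on 128-image mini-batches; 4690 iterations x 128 images is about
# 600,000 examples in total.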
for _ in range(4690):
    batch_xs, batch_ys = mnist.train.next_batch(128)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
The Neon implementation evaluates the accuracy of the model on the validation set by calling its eval()
method and passing the Misclassification
metric.
error_rate = mlp.eval(dataset.valid_iter, metric=Misclassification())
neon_logger.display('Classification accuracy = %.4f' % (1 - error_rate))
The TensorFlow implementation defines an accuracy measurement as a collection of

- placeholders for the actual and expected outputs, and
- operations including equal and reduce_mean.

These objects are actually references into a graph representation of the accuracy formula. It evaluates the accuracy by running this graph with the test set mapped to placeholders.
accuracy = tf.reduce_mean(
    tf.cast(
        tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)),
        tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))
To run the Neon implementation, follow the instructions for installing Neon. Then simply enter
(.venv2) :neon-tf-mnist $ python neon_mnist_mlp.py
To run the TensorFlow implementation, follow the instructions for installing TensorFlow. Then simply enter
(tensorflow) :neon-tf-mnist $ python tf_mnist_mlp.py