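In this project, you will classify images from the CIFAR-10 dataset. The dataset contains airplanes, dogs, cats, and other objects. After loading the data, you will preprocess the images and then train a convolutional neural network on all the samples. The images need to be normalized and the labels need to be one-hot encoded.
Run the following cell to download the CIFAR-10 dataset (Python version).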
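from os.path import isfile, isdir
from urllib.request import urlretrieve
import tarfile
from tqdm import tqdm
cifar10_dataset_folder_path = 'cifar-10-batches-py'
# Use Floyd's cifar-10 dataset if present
floyd_cifar10_location = '/input/cifar-10/python.tar.gz'
if isfile(floyd_cifar10_location):
    tar_gz_path = floyd_cifar10_location
else:
    tar_gz_path = 'cifar-10-python.tar.gz'
class DLProgress(tqdm):
    last_block = 0
    def hook(self, block_num=1, block_size=1, total_size=None):
        self.total = total_size
        # urlretrieve reports a cumulative block count, so advance the bar by the delta.
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num
# Download the archive only if it is not already on disk.
if not isfile(tar_gz_path):
    with DLProgress(unit='B', unit_scale=True, miniters=1, desc='CIFAR-10 Dataset') as pbar:
        urlretrieve('https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', tar_gz_path, pbar.hook)
# Extract the archive only if the dataset folder does not exist yet.
if not isdir(cifar10_dataset_folder_path):
    with tarfile.open(tar_gz_path) as tar:
        tar.extractall()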
All files found!
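The dataset is split into batches so that your machine does not run out of memory. The CIFAR-10 dataset consists of 5 batches, named data_batch_1 through data_batch_5. Each batch contains labels and images belonging to one of the following classes:
- airplane
- automobile
- bird
- cat
- deer
- dog
- frog
- horse
- ship
- truck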
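To make the batch layout concrete, here is a minimal sketch that assumes the layout of the Python version of CIFAR-10: each image is a flat row of 3072 values (1024 red, then 1024 green, then 1024 blue, each plane in row-major order). The names `LABEL_NAMES` and `rows_to_images` are illustrative and are not taken from helper.py; the data is synthetic.

```python
import numpy as np

# Class names in label-id order (id 1 is "automobile", as in the stats output).
LABEL_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def rows_to_images(rows):
    """Reshape flat rows of shape (N, 3072) into images of shape (N, 32, 32, 3)."""
    rows = np.asarray(rows)
    # First split each row into 3 colour planes of 32x32, then move channels last.
    return rows.reshape((len(rows), 3, 32, 32)).transpose(0, 2, 3, 1)

# Synthetic stand-in for two rows of a batch file (not real CIFAR-10 data).
fake_rows = np.arange(2 * 3072).reshape(2, 3072)
images = rows_to_images(fake_rows)

print(images.shape)      # (2, 32, 32, 3)
print(LABEL_NAMES[1])    # automobile
```

Moving the channel axis last gives the (32, 32, 3) image shape that the rest of the notebook expects.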
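Understanding a dataset is part of making predictions on the data. Explore the data by changing batch_id and sample_id in the code cell below. batch_id is the ID of a batch of the dataset (1 to 5). sample_id is the ID of an image and label pair within that batch.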
import helper
import numpy as np
# Explore the dataset
batch_id = 1
sample_id = 5
helper.display_stats(cifar10_dataset_folder_path, batch_id, sample_id)
Stats of batch 1:
Samples: 10000
Label Counts: {0: 1005, 1: 974, 2: 1032, 3: 1016, 4: 999, 5: 937, 6: 1030, 7: 1001, 8: 1025, 9: 981}
First 20 Labels: [6, 9, 9, 4, 1, 1, 2, 7, 8, 3, 4, 7, 7, 2, 9, 9, 9, 3, 2, 6]
Example of Image 5:
Image - Min Value: 0 Max Value: 252
Image - Shape: (32, 32, 3)
Label - Label Id: 1 Name: automobile
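In the cell below, implement the normalize function to take in image data, x, and return it as a normalized NumPy array. The values should be in the range 0 to 1, inclusive. The returned object should have the same shape as x.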
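def normalize(x):
    """
    Normalize a list of sample image data in the range of 0 to 1
    : x: List of image data.  The image shape is (32, 32, 3)
    : return: Numpy array of normalized data
    """
    # Convert to a float array so that plain lists are accepted as well.
    x = np.asarray(x, dtype=np.float32)
    # Min-max scaling over the whole input; assumes x is not constant (x_max > x_min).
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)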
Tests Passed
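The min-max scaling used above depends on the values actually present in its input. As a point of comparison, the sketch below contrasts it with dividing by the fixed 8-bit range of 255, which maps a given pixel value to the same number regardless of the batch. Both function names are illustrative, and the image is synthetic, chosen to span 0 to 252 like the sample in the stats output.

```python
import numpy as np

def normalize_minmax(x):
    """Scale by the observed minimum and maximum of x."""
    x = np.asarray(x, dtype=np.float32)
    return (x - x.min()) / (x.max() - x.min())

def normalize_fixed(x):
    """Scale by the fixed 8-bit pixel range [0, 255]."""
    return np.asarray(x, dtype=np.float32) / 255.0

# Synthetic image whose values span 0..252 (not real CIFAR-10 data).
image = np.linspace(0, 252, num=32 * 32 * 3).reshape(32, 32, 3)

a = normalize_minmax(image)
b = normalize_fixed(image)
print(a.shape, float(a.min()), float(a.max()))
print(b.shape, float(b.min()), float(b.max()))
```

Both variants keep the input shape and stay within 0 to 1; only min-max scaling is guaranteed to reach both endpoints.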
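Implement the one_hot_encode function. The input, x, is a list of labels. Implement the function to return the list of labels as a one-hot encoded NumPy array. The possible values for labels are 0 to 9. The one-hot encoding function should return the same encoding for each value on every call to one_hot_encode. Make sure to save the map of encodings outside the function.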
def one_hot_encode(x):
    """
    One hot encode a list of sample labels. Return a one-hot encoded vector for each label.
    : x: List of sample Labels
    : return: Numpy array of one-hot encoded labels
    """
    # Row i of the 10x10 identity matrix is the one-hot vector for label i,
    # so the mapping is fixed and identical on every call.
    return np.eye(10)[x]
Tests Passed
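As a quick illustration of the identity-matrix lookup, the sketch below encodes the first five labels from the stats output and recovers them with argmax. The names `ENCODING` and `one_hot` are illustrative; keeping the table at module level is one way to store the encoding map outside the function.

```python
import numpy as np

ENCODING = np.eye(10)  # fixed lookup table, created once outside the function

def one_hot(labels):
    """Look up the one-hot row for each label id (0 to 9)."""
    return ENCODING[np.asarray(labels)]

labels = [6, 9, 9, 4, 1]            # first five labels of batch 1
encoded = one_hot(labels)
decoded = encoded.argmax(axis=1)    # argmax inverts the encoding

print(encoded.shape)      # (5, 10)
print(decoded.tolist())   # [6, 9, 9, 4, 1]
```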
Running the code cell below preprocesses all the CIFAR-10 data and saves it to file. The code below also sets aside 10% of the training data for validation.
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(cifar10_dataset_folder_path, normalize, one_hot_encode)
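The body of helper.preprocess_and_save_data is not shown here, so the following is only a sketch of one simple way to hold out 10% of a batch for validation. The function name `split_validation` and the choice to take the held-out samples from the end of the batch are assumptions made for illustration.

```python
import numpy as np

def split_validation(features, labels, validation_fraction=0.1):
    """Split off the last validation_fraction of the samples for validation."""
    split = len(features) - int(len(features) * validation_fraction)
    train = (features[:split], labels[:split])
    valid = (features[split:], labels[split:])
    return train, valid

# Synthetic stand-in for one batch of 10000 samples (not real CIFAR-10 data).
features = np.arange(10000 * 2).reshape(10000, 2)
labels = np.arange(10000) % 10

(train_x, train_y), (valid_x, valid_y) = split_validation(features, labels)
print(len(train_x), len(valid_x))   # 9000 1000
```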
import pickle
import problem_unittests as tests
import helper
# Load the Preprocessed Validation data
valid_features, valid_labels = pickle.load(open('preprocess_validation.p', mode='rb'))
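The neural network needs to read the image data, the one-hot encoded labels, and the dropout keep probability. Implement the following functions:
- Implement neural_net_image_input
  - Return a TF Placeholder
  - Set the shape using image_shape, with the batch size set to None
  - Name the TensorFlow placeholder "x" using the TensorFlow name parameter in the TF Placeholder
- Implement neural_net_label_input
  - Return a TF Placeholder
  - Set the shape using n_classes, with the batch size set to None
  - Name the TensorFlow placeholder "y" using the TensorFlow name parameter in the TF Placeholder
- Implement neural_net_keep_prob_input
  - Return a TF Placeholder for the dropout keep probability
  - Name the TensorFlow placeholder "keep_prob" using the TensorFlow name parameter in the TF Placeholder
Note: a None dimension in a TensorFlow shape allows for a dynamic size.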
import tensorflow as tf
def neural_net_image_input(image_shape):
    """
    Return a Tensor for a batch of image input
    : image_shape: Shape of the images
    : return: Tensor for image input.
    """
    # None leaves the batch dimension dynamic.
    return tf.placeholder(tf.float32, [None, image_shape[0], image_shape[1], image_shape[2]], name='x')
def neural_net_label_input(n_classes):
    """
    Return a Tensor for a batch of label input
    : n_classes: Number of classes
    : return: Tensor for label input.
    """
    return tf.placeholder(tf.float32, [None, n_classes], name='y')
def neural_net_keep_prob_input():
    """
    Return a Tensor for keep probability
    : return: Tensor for keep probability.
    """
    # Scalar placeholder, so no shape is given.
    return tf.placeholder(tf.float32, name='keep_prob')
Image Input Tests Passed.
Label Input Tests Passed.
Keep Prob Tests Passed.
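To illustrate what a None batch dimension means in practice, here is a small plain-Python sketch (not TensorFlow code) of the shape rule: a None entry accepts any size, while every other entry must match exactly. The helper `matches_placeholder_shape` is hypothetical and exists only for this example.

```python
def matches_placeholder_shape(array_shape, placeholder_shape):
    """Return True if array_shape fits placeholder_shape, where None matches any size."""
    if len(array_shape) != len(placeholder_shape):
        return False
    return all(spec is None or spec == dim
               for dim, spec in zip(array_shape, placeholder_shape))

X_SHAPE = (None, 32, 32, 3)   # shape given to the "x" placeholder
Y_SHAPE = (None, 10)          # shape given to the "y" placeholder

print(matches_placeholder_shape((128, 32, 32, 3), X_SHAPE))  # True
print(matches_placeholder_shape((1, 32, 32, 3), X_SHAPE))    # True
print(matches_placeholder_shape((128, 28, 28, 3), X_SHAPE))  # False
```

This is why the same graph can be fed a training mini-batch, the full validation set, or a handful of test images.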
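Convolution layers work well on images. For this code cell, implement the function conv2d_maxpool to apply convolution followed by max pooling:
- Create the weight and bias using conv_ksize, conv_num_outputs, and the shape of x_tensor.
- Apply a convolution to x_tensor using the weight and conv_strides.
  - We recommend same padding, but you are welcome to use any padding.
- Add the bias.
- Add a nonlinear activation to the convolution.
- Apply max pooling using pool_ksize and pool_strides.
  - We recommend same padding, but you are welcome to use any padding.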
def conv2d_maxpool(x_tensor, conv_num_outputs, conv_ksize, conv_strides, pool_ksize, pool_strides):
    """
    Apply convolution then max pooling to x_tensor
    :param x_tensor: TensorFlow Tensor
    :param conv_num_outputs: Number of outputs for the convolutional layer
    :param conv_ksize: kernel size 2-D Tuple for the convolutional layer
    :param conv_strides: Stride 2-D Tuple for convolution
    :param pool_ksize: kernel size 2-D Tuple for pool
    :param pool_strides: Stride 2-D Tuple for pool
    : return: A tensor that represents convolution and max pooling of x_tensor
    """
    # Filter shape: (height, width, input depth, output depth).
    input_depth = x_tensor.get_shape().as_list()[-1]
    weight = tf.Variable(tf.random_normal([conv_ksize[0], conv_ksize[1], input_depth, conv_num_outputs], stddev=5e-2))
    bias = tf.Variable(tf.zeros(conv_num_outputs))
    # Strides are listed per (batch, height, width, channel) dimension.
    conv_layer = tf.nn.conv2d(x_tensor, weight, strides=[1, conv_strides[0], conv_strides[1], 1], padding='SAME')
    conv_layer = tf.nn.bias_add(conv_layer, bias)
    conv_layer = tf.nn.relu(conv_layer)
    conv_layer = tf.nn.max_pool(conv_layer, ksize=[1, pool_ksize[0], pool_ksize[1], 1], strides=[1, pool_strides[0], pool_strides[1], 1], padding='SAME')
    return conv_layer
Tests Passed
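With 'SAME' padding, the spatial output size of a convolution or pooling op depends only on the input size and the stride: it is the input size divided by the stride, rounded up. The sketch below applies that rule to a 32x32 input with the stride values that conv_net uses later (1 for the convolution, 2 for the pooling). The function name is illustrative.

```python
import math

def same_padding_output_size(input_size, stride):
    """Spatial output size of a 'SAME'-padded convolution or pooling op."""
    return math.ceil(input_size / stride)

size = 32                                              # CIFAR-10 images are 32x32
after_conv = same_padding_output_size(size, 1)         # stride-1 convolution
after_pool = same_padding_output_size(after_conv, 2)   # stride-2 max pooling

print(after_conv, after_pool)   # 32 16
```

The kernel size does not appear in the formula; with 'SAME' padding it only changes how much zero padding is added.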
Implement the flatten function to change the dimension of x_tensor from a 4-D tensor to a 2-D tensor. The output should have the shape (Batch Size, Flattened Image Size).
def flatten(x_tensor):
    """
    Flatten x_tensor to (Batch Size, Flattened Image Size)
    : x_tensor: A tensor of size (Batch Size, ...), where ... are the image dimensions.
    : return: A tensor of size (Batch Size, Flattened Image Size).
    """
    # Static shape is (batch, height, width, depth); the batch dimension stays dynamic (-1).
    _, height, width, depth = x_tensor.get_shape().as_list()
    return tf.reshape(x_tensor, [-1, height * width * depth])
Tests Passed
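The same reshape can be tried in NumPy to see what flattening does to the shape. This is a sketch on a synthetic batch, assuming the 16x16x64 feature maps that a single stride-2 pooling layer with 64 outputs would produce; `flatten_np` is an illustrative name.

```python
import numpy as np

def flatten_np(batch):
    """Collapse every dimension after the first into one."""
    return batch.reshape(batch.shape[0], -1)

# Synthetic batch of 8 feature maps (not real activations).
batch = np.arange(8 * 16 * 16 * 64, dtype=np.float32).reshape(8, 16, 16, 64)
flat = flatten_np(batch)

print(flat.shape)   # (8, 16384)
```

Each row of the result holds exactly the values of one sample, in the same order as before the reshape.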
Implement the fully_conn function to apply a fully connected layer to x_tensor. The output has the shape (Batch Size, num_outputs).
def fully_conn(x_tensor, num_outputs):
    """
    Apply a fully connected layer to x_tensor using weight and bias
    : x_tensor: A 2-D tensor where the first dimension is batch size.
    : num_outputs: The number of output that the new tensor should be.
    : return: A 2-D tensor where the second dimension is num_outputs.
    """
    # fully_connected applies a ReLU activation by default.
    return tf.contrib.layers.fully_connected(x_tensor, num_outputs)
Tests Passed
Implement the output function to apply a fully connected layer to x_tensor. The output has the shape (Batch Size, num_outputs).
Note: no activation, softmax, or cross entropy should be applied in this layer.
def output(x_tensor, num_outputs):
    """
    Apply a output layer to x_tensor using weight and bias
    : x_tensor: A 2-D tensor where the first dimension is batch size.
    : num_outputs: The number of output that the new tensor should be.
    : return: A 2-D tensor where the second dimension is num_outputs.
    """
    # legacy_fully_connected defaults to no activation, so the layer returns raw logits.
    return tf.contrib.layers.legacy_fully_connected(x_tensor, num_outputs)
Tests Passed
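The difference between the hidden fully connected layer and the output layer comes down to the activation. The NumPy sketch below shows both as the same affine map, with ReLU applied only to the hidden layer. The function names, layer sizes, and random weights are illustrative assumptions, not values from the notebook.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, weights, bias, activation=None):
    """Affine map x @ weights + bias, followed by an optional activation."""
    z = x @ weights + bias
    return z if activation is None else activation(z)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # batch of 4 flattened inputs
w_hidden, b_hidden = rng.normal(size=(8, 6)), np.zeros(6)
w_out, b_out = rng.normal(size=(6, 10)), np.zeros(10)

hidden = dense(x, w_hidden, b_hidden, activation=relu)   # like fully_conn
logits = dense(hidden, w_out, b_out)                     # like output: no activation

print(hidden.shape, logits.shape)   # (4, 6) (4, 10)
```

Leaving the output layer linear matters because the loss used later applies softmax to the logits itself.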
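Implement the function conv_net to create a convolutional neural network model. The function takes in a batch of images, x, and outputs logits. Use the layers created above to build the model:
- Apply 1, 2, or 3 Convolution and Max Pool layers
- Apply a Flatten Layer
- Apply 1, 2, or 3 Fully Connected Layers
- Apply an Output Layer
- Return the output
- Apply TensorFlow's Dropout to one or more layers in the model using keep_prob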
def conv_net(x, keep_prob):
    """
    Create a convolutional neural network model
    : x: Placeholder tensor that holds image data.
    : keep_prob: Placeholder tensor that hold dropout keep probability.
    : return: Tensor that represents logits
    """
    # Apply 1, 2, or 3 Convolution and Max Pool layers
    #    Play around with different number of outputs, kernel size and stride
    # Function Definition from Above:
    #    conv2d_maxpool(x_tensor, conv_num_outputs, conv_ksize, conv_strides, pool_ksize, pool_strides)
    conv_num_outputs = 64
    conv_ksize = (3, 3)
    conv_strides = (1, 1)
    pool_ksize = (2, 2)
    pool_strides = (2, 2)
    conv1 = conv2d_maxpool(x, conv_num_outputs, conv_ksize, conv_strides, pool_ksize, pool_strides)
    # Apply a Flatten Layer
    # Function Definition from Above:
    #   flatten(x_tensor)
    fc1 = flatten(conv1)
    # Apply 1, 2, or 3 Fully Connected Layers
    #    Play around with different number of outputs
    # Function Definition from Above:
    #   fully_conn(x_tensor, num_outputs)
    fc2 = fully_conn(fc1, 192)
    # Dropout on the fully connected layer, controlled by keep_prob
    fc3 = tf.nn.dropout(fc2, keep_prob)
    # Apply an Output Layer
    #    Set this to the number of classes
    # Function Definition from Above:
    #   output(x_tensor, num_outputs)
    out = output(fc3, 10)
    # Return the logits
    return out
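For a sense of the model's size, the sketch below counts the trainable parameters of the architecture above: one 3x3 convolution with 64 outputs, a 192-unit fully connected layer, and a 10-unit output layer. It assumes 'SAME' padding (so the pooled feature map is 16x16) and one bias per output unit in every layer; the helper names are illustrative.

```python
def conv_params(ksize, in_depth, out_depth):
    """Weights plus biases of a 2-D convolution layer."""
    return ksize[0] * ksize[1] * in_depth * out_depth + out_depth

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

conv = conv_params((3, 3), 3, 64)    # RGB input, 64 filters
flat_size = 16 * 16 * 64             # 32x32 halved by stride-2 pooling, 64 channels
fc = dense_params(flat_size, 192)
out = dense_params(192, 10)
total = conv + fc + out

print(conv, fc, out, total)   # 1792 3145920 1930 3149642
```

Almost all parameters sit in the first fully connected layer, which is one reason dropout is applied there.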
## Build the Neural Network ##
# Remove previous weights, bias, inputs, etc..
tf.reset_default_graph()
# Inputs
x = neural_net_image_input((32, 32, 3))
y = neural_net_label_input(10)
keep_prob = neural_net_keep_prob_input()
# Model
logits = conv_net(x, keep_prob)
# Name logits Tensor, so that it can be loaded from disk after training
logits = tf.identity(logits, name='logits')
# Loss and Optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer().minimize(cost)
# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
Neural Network Built!
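The cost and accuracy tensors can be mirrored in NumPy to see what they compute: the mean softmax cross entropy between logits and one-hot labels, and the fraction of rows where the largest logit matches the label. This is a sketch on made-up logits; the function names are illustrative and the log-softmax is written out by hand.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log of the softmax, computed row by row."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def mean_cross_entropy(logits, one_hot_labels):
    """Mean over the batch of -sum(labels * log_softmax(logits))."""
    return float(-(one_hot_labels * log_softmax(logits)).sum(axis=1).mean())

def accuracy_np(logits, one_hot_labels):
    """Fraction of rows whose largest logit is at the labelled class."""
    return float((logits.argmax(axis=1) == one_hot_labels.argmax(axis=1)).mean())

# Made-up logits for 3 samples and 3 classes (not output of the network).
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 3.0],
                   [1.0, 1.0, 1.0]])
labels = np.eye(3)[[0, 2, 1]]

print(mean_cross_entropy(logits, labels))
print(accuracy_np(logits, labels))
```

The third row has equal logits, so its loss is ln(3) and argmax picks class 0, which counts as a miss against label 1.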
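Implement the function train_neural_network to perform a single optimization step. The optimization should use optimizer to optimize in session, with a feed_dict that supplies x (image input), y (labels), and keep_prob (dropout keep probability). This function is called for each batch, so tf.global_variables_initializer() has already been called.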
def train_neural_network(session, optimizer, keep_probability, feature_batch, label_batch):
    """
    Optimize the session on a batch of images and labels
    : session: Current TensorFlow session
    : optimizer: TensorFlow optimizer function
    : keep_probability: keep probability
    : feature_batch: Batch of Numpy image data
    : label_batch: Batch of Numpy label data
    """
    # One optimizer step on this batch; nothing is returned.
    session.run(optimizer, feed_dict={
        x: feature_batch,
        y: label_batch,
        keep_prob: keep_probability})
Tests Passed
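Implement the function print_stats to print the loss and validation accuracy. Use the global variables valid_features and valid_labels to calculate the validation accuracy. Use a keep probability of 1.0 to calculate the loss and validation accuracy.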
def print_stats(session, feature_batch, label_batch, cost, accuracy):
    """
    Print information about loss and validation accuracy
    : session: Current TensorFlow session
    : feature_batch: Batch of Numpy image data
    : label_batch: Batch of Numpy label data
    : cost: TensorFlow cost function
    : accuracy: TensorFlow accuracy function
    """
    # Loss on the given training batch, with dropout disabled.
    loss = session.run(cost, feed_dict={
        x: feature_batch,
        y: label_batch,
        keep_prob: 1.0})
    # Accuracy on the held-out validation set, with dropout disabled.
    valid_acc = session.run(accuracy, feed_dict={
        x: valid_features,
        y: valid_labels,
        keep_prob: 1.0})
    print('loss: {:>10.4f}, Validation Accuracy: {:.6f}'.format(loss, valid_acc))
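Tune the following parameters:
- epochs: the number of iterations until the network stops learning or starts overfitting
- batch_size: the largest batch size that your machine has memory for. Most people set it to one of a few common sizes.
- keep_probability: the probability of keeping a node when using dropout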
# TODO: Tune Parameters
epochs = 50
batch_size = 128
keep_probability = 0.5
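To show what a keep probability of 0.5 does to a layer, here is a NumPy sketch of inverted dropout, the scheme tf.nn.dropout follows: each element is kept with probability keep_prob and kept elements are scaled by 1 / keep_prob, so the expected activation is unchanged. The function name, seed, and array are illustrative.

```python
import numpy as np

def dropout_np(x, keep_prob, rng):
    """Keep each element with probability keep_prob and rescale the survivors."""
    if keep_prob >= 1.0:
        return x                      # evaluation mode: nothing is dropped
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 192))    # stand-in for the 192-unit layer output

dropped = dropout_np(activations, 0.5, rng)
print(sorted(set(np.unique(dropped).tolist())))   # [0.0, 2.0]
print(round(float(dropped.mean()), 2))
```

With keep_prob set to 1.0, as print_stats and the test code do, the layer passes its input through untouched.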
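Instead of training the neural network on all the CIFAR-10 batches, start with a single batch. This saves time while you iterate on the model to improve its accuracy. Once the final validation accuracy reaches 50% or more, run the model on all the data in the next section.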
print('Checking the Training on a Single Batch...')
with tf.Session() as sess:
    # Initializing the variables
    sess.run(tf.global_variables_initializer())
    # Training cycle
    for epoch in range(epochs):
        batch_i = 1
        for batch_features, batch_labels in helper.load_preprocess_training_batch(batch_i, batch_size):
            train_neural_network(sess, optimizer, keep_probability, batch_features, batch_labels)
        # Report once per epoch, using the last mini-batch for the loss.
        print('Epoch {:>2}, CIFAR-10 Batch {}:  '.format(epoch + 1, batch_i), end='')
        print_stats(sess, batch_features, batch_labels, cost, accuracy)
Checking the Training on a Single Batch...
Epoch 1, CIFAR-10 Batch 1: loss: 1.9073, Validation Accuracy: 0.395600
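Each epoch above walks through one CIFAR-10 batch in mini-batches of batch_size. The sketch below shows one common way to produce such mini-batches by slicing. It is not the code of helper.load_preprocess_training_batch, and the figure of 9000 training samples per batch is an assumption based on the 10% validation hold-out.

```python
def iterate_minibatches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices of at most batch_size items."""
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]

# Stand-in data: 9000 sample ids with labels 0..9 (not real CIFAR-10 data).
features = list(range(9000))
labels = [i % 10 for i in features]

batches = list(iterate_minibatches(features, labels, 128))
print(len(batches), len(batches[0][0]), len(batches[-1][0]))   # 71 128 40
```

Under these assumptions one epoch over a single batch means 71 optimizer steps, the last one on a smaller mini-batch of 40 samples.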
save_model_path = './image_classification'
with tf.Session() as sess:
    # Initializing the variables
    sess.run(tf.global_variables_initializer())
    # Training cycle
    for epoch in range(epochs):
        # Loop over all batches
        n_batches = 5
        for batch_i in range(1, n_batches + 1):
            for batch_features, batch_labels in helper.load_preprocess_training_batch(batch_i, batch_size):
                train_neural_network(sess, optimizer, keep_probability, batch_features, batch_labels)
            print('Epoch {:>2}, CIFAR-10 Batch {}:  '.format(epoch + 1, batch_i), end='')
            print_stats(sess, batch_features, batch_labels, cost, accuracy)
    # Save Model
    saver = tf.train.Saver()
    save_path = saver.save(sess, save_model_path)
Epoch 1, CIFAR-10 Batch 1: loss: 2.0538, Validation Accuracy: 0.348800
import pickle
import random
import tensorflow as tf
import helper
# Set batch size if not already set
try:
    if batch_size:
        pass
except NameError:
    batch_size = 64
save_model_path = './image_classification'
n_samples = 4
top_n_predictions = 3
def test_model():
    """
    Test the saved model against the test dataset
    """
    test_features, test_labels = pickle.load(open('preprocess_test.p', mode='rb'))
    loaded_graph = tf.Graph()
    with tf.Session(graph=loaded_graph) as sess:
        # Load model
        loader = tf.train.import_meta_graph(save_model_path + '.meta')
        loader.restore(sess, save_model_path)
        # Get Tensors from loaded model
        loaded_x = loaded_graph.get_tensor_by_name('x:0')
        loaded_y = loaded_graph.get_tensor_by_name('y:0')
        loaded_keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')
        loaded_logits = loaded_graph.get_tensor_by_name('logits:0')
        loaded_acc = loaded_graph.get_tensor_by_name('accuracy:0')
        # Get accuracy in batches for memory limitations
        test_batch_acc_total = 0
        test_batch_count = 0
        for test_feature_batch, test_label_batch in helper.batch_features_labels(test_features, test_labels, batch_size):
            test_batch_acc_total += sess.run(
                loaded_acc,
                feed_dict={loaded_x: test_feature_batch, loaded_y: test_label_batch, loaded_keep_prob: 1.0})
            test_batch_count += 1
        print('Testing Accuracy: {}\n'.format(test_batch_acc_total/test_batch_count))
        # Print Random Samples
        random_test_features, random_test_labels = tuple(zip(*random.sample(list(zip(test_features, test_labels)), n_samples)))
        random_test_predictions = sess.run(
            tf.nn.top_k(tf.nn.softmax(loaded_logits), top_n_predictions),
            feed_dict={loaded_x: random_test_features, loaded_y: random_test_labels, loaded_keep_prob: 1.0})
        helper.display_image_predictions(random_test_features, random_test_labels, random_test_predictions)
test_model()
INFO:tensorflow:Restoring parameters from ./image_classification
Testing Accuracy: 0.6437895569620253
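The reported figure is the mean of the per-batch accuracies. When the last batch is smaller than the others, that mean can differ slightly from the accuracy over all test images, because the small batch gets the same weight as a full one. The sketch below shows the difference on hypothetical (correct, size) counts; these are not results from this notebook, and the function names are illustrative.

```python
def mean_of_batch_accuracies(batch_results):
    """Average the per-batch accuracies, weighting every batch equally."""
    return sum(correct / size for correct, size in batch_results) / len(batch_results)

def overall_accuracy(batch_results):
    """Total correct predictions divided by total samples."""
    total_correct = sum(correct for correct, _ in batch_results)
    total_size = sum(size for _, size in batch_results)
    return total_correct / total_size

# Hypothetical counts: two full batches of 128 and a final batch of 16.
results = [(64, 128), (64, 128), (16, 16)]

print(round(mean_of_batch_accuracies(results), 4))   # 0.6667
print(round(overall_accuracy(results), 4))           # 0.5294
```

With many full batches and one short one the gap is small, but summing correct predictions and dividing once avoids it entirely.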