Friday, September 14, 2018

Image Classification with Convolutional Neural Networks, CIFAR-10 dataset

Dataset can be found here: https://www.cs.toronto.edu/~kriz/cifar.html
The dataset is split into batches so that your machine does not run out of memory while loading it. The CIFAR-10 training set consists of 5 batches, named data_batch_1, data_batch_2, and so on (a separate test_batch file is also included). Each batch contains images together with their labels, and each label is one of the following classes:
  • 0 - airplane
  • 1 - automobile
  • 2 - bird
  • 3 - cat
  • 4 - deer
  • 5 - dog
  • 6 - frog
  • 7 - horse
  • 8 - ship
  • 9 - truck
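
To make the integer labels readable when inspecting predictions, the list above can be turned into a small lookup. This is only a sketch: the names CIFAR10_LABEL_NAMES and label_name are made up for illustration and are not part of the dataset files.

```python
# Class names in label-id order, copied from the list above.
# CIFAR10_LABEL_NAMES and label_name are hypothetical helper names.
CIFAR10_LABEL_NAMES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_name(label_id):
    """Return the class name for an integer label in the range 0-9."""
    return CIFAR10_LABEL_NAMES[label_id]

print(label_name(3))  # cat
```
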
import tarfile

# The with block closes the archive automatically, so no explicit close() is needed
with tarfile.open("D:\\NAVEED\\cifar-10\\cifar-10-python.tar.gz") as tar:
    tar.extractall()

Load and pre-process files

  • Access the images and the labels from a single batch specified by its id (1-5)
  • Reshape the images: they are fed to the convolutional layer as a 4-D tensor. Note that the reshape puts the channels at axis index 1
  • Transpose the axes of the reshaped array into the form [batch_size, height, width, channels], so that channels is the last axis
import os
import pickle

# Folder created by extracting the archive; adjust it if you extracted elsewhere
CIFAR10_DATASET_FOLDER = "cifar-10-batches-py"

def load_cifar10_batch(batch_id):
    # Build the path from batch_id so that every batch (1-5) can be loaded
    batch_path = os.path.join(CIFAR10_DATASET_FOLDER, 'data_batch_' + str(batch_id))
    with open(batch_path, mode='rb') as file:
        batch = pickle.load(file, encoding='latin1')

    # Each row is a flat image: reshape to [N, channels, height, width], then move channels last
    features = batch['data'].reshape((len(batch['data']), 3, 32, 32)).transpose(0, 2, 3, 1)
    labels = batch['labels']

    return features, labels

features, labels = load_cifar10_batch(1)
features.shape
(10000, 32, 32, 3)
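
The reshape and transpose steps can be checked on a tiny fake batch. The sketch below uses only NumPy and made-up data (no dataset files are needed). It relies on the layout the CIFAR-10 Python batches use: each row holds 1024 red values, then 1024 green, then 1024 blue.

```python
import numpy as np

# Fake batch: 2 images, each stored as a flat row of 3 * 32 * 32 = 3072 values
# (red plane first, then green, then blue).
flat = np.arange(2 * 3072).reshape(2, 3072)

channels_first = flat.reshape(len(flat), 3, 32, 32)   # [batch, channels, height, width]
channels_last = channels_first.transpose(0, 2, 3, 1)  # [batch, height, width, channels]

print(channels_last.shape)  # (2, 32, 32, 3)

# The top-left pixel of image 0 gathers one value from each colour plane.
print(channels_last[0, 0, 0].tolist())  # [0, 1024, 2048]
```

Reshaping directly to (N, 32, 32, 3) without the transpose would interleave values from the same colour plane and scramble the image, which is why the two-step form is used.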

Access the training and test data and the corresponding labels

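Each batch in the CIFAR-10 dataset contains randomly selected images, so the data comes pre-shuffled and a plain slice is enough to split it. Here the first 80% of the batch is used for training and the remaining 20% for testing.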
train_size = int(len(features)*0.8)
training_images = features[:train_size,:,:]
training_labels = labels[:train_size]
print("Training Images:",len(training_images))
print("Training Labels:",len(training_labels))
Training Images: 8000
Training Labels: 8000
test_images = features[train_size:,:,:]
test_labels = labels[train_size:]

print("Test images: ", len(test_images))
print("Test labels: ", len(test_labels))
Test images:  2000
Test labels:  2000
height = 32
width = 32
channels = 3
n_inputs = height * width * channels

Placeholders for training data and labels

  • The training dataset placeholder can hold any number of instances, and each instance is a 32x32 image with 3 colour channels (we already reshaped the data earlier)
  • The images are fed to the convolutional layer as a 4D tensor [batch_size, height, width, channels]
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers)

X = tf.placeholder(tf.float32, shape=[None, height, width, channels], name="X")

Add a dropout layer to avoid overfitting the training data

  • The training flag is set to False during prediction and is True while training (dropout is applied only in the training phase)
  • The dropout_rate is the probability that a unit is dropped (set to zero) during training. Here dropout is applied directly to the input X
dropout_rate = 0.3
training = tf.placeholder_with_default(False, shape=(), name='training')
X_drop = tf.layers.dropout(X, dropout_rate, training=training)
y = tf.placeholder(tf.int32,shape=[None],name="y")
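
To see what the training flag and dropout_rate do, here is a minimal NumPy sketch of inverted dropout. It is not the TensorFlow implementation; the function name dropout and its rng argument are made up for illustration. Like tf.layers.dropout, it scales the surviving values by 1 / (1 - rate) so that the expected sum stays the same.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Illustrative inverted dropout: zero each value with probability `rate`."""
    if not training:
        return x  # prediction: the input passes through unchanged
    keep_mask = rng.random(x.shape) >= rate
    # Scale the kept values so the expected value of each unit is unchanged
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((1000, 32))
dropped = dropout(x, rate=0.3, training=True, rng=rng)

print((dropped == 0).mean())  # roughly 0.3
print(dropped.mean())         # roughly 1.0
```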

Neural network design

  • 2 convolutional layers
  • 1 max pooling layer
  • 1 convolutional layer
  • 1 max pooling layer
  • 2 fully connected layers
  • Output logits layer
  • Specify the number of feature maps (filters) in each layer. A feature map highlights the areas of the image that most closely match the filter applied
  • The kernel size indicates the dimensions of the filter which is applied to the image. The filter variables are created for you and initialized randomly
  • The stride is the step size by which the filter moves over the input, that is, the distance between two consecutive receptive fields
  • "SAME" padding means that the convolutional layer zero-pads the input as needed so that every input position is covered. The output size is then the input size divided by the stride, rounded up
conv1 = tf.layers.conv2d(X_drop, filters=32,
                         kernel_size=3, strides=1, padding="SAME", activation=tf.nn.relu, name="conv1")
conv2 = tf.layers.conv2d(conv1, filters=64,
                         kernel_size=3, strides=2, padding="SAME", activation=tf.nn.relu, name="conv2")
conv1.shape
TensorShape([Dimension(None), Dimension(32), Dimension(32), Dimension(32)])
conv2.shape
TensorShape([Dimension(None), Dimension(16), Dimension(16), Dimension(64)])
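
The height and width printed above follow from the stride and padding rules. As a rough sketch (output_size is a hypothetical helper, not a TensorFlow function), the size along one spatial dimension can be computed like this:

```python
import math

def output_size(input_size, kernel_size, stride, padding):
    """Size of one spatial dimension after a convolution or pooling layer."""
    if padding == "SAME":
        # Zero padding is added as needed, so only the stride matters
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # No padding: the window must fit entirely inside the input
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

conv1_size = output_size(32, kernel_size=3, stride=1, padding="SAME")
conv2_size = output_size(conv1_size, kernel_size=3, stride=2, padding="SAME")
print(conv1_size, conv2_size)  # 32 16
```

The number of channels in each output is simply the filters argument of the layer (32 and 64 here). The "VALID" branch covers layers that use no padding, such as a 2x2 pooling window with stride 2.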

Connect a max pooling layer

  • The filter is a 2x2 filter
  • The stride is 2 both horizontally and vertically
  • This halves both the height and the width, so the output has 1/4 of the area of the input

pool3 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
pool3.shape
TensorShape([Dimension(None), Dimension(8), Dimension(8), Dimension(64)])
conv4 = tf.layers.conv2d(pool3, filters=128, kernel_size=4, strides=3, padding="SAME", activation=tf.nn.relu, name="conv4")
conv4.shape
TensorShape([Dimension(None), Dimension(3), Dimension(3), Dimension(128)])
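
For intuition, 2x2 max pooling with stride 2 can be written in a few lines of NumPy. This is only an illustrative sketch (max_pool_2x2 is a made-up name, and it assumes the height and width are even), not how TensorFlow implements it.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a [batch, height, width, channels] array."""
    n, h, w, c = x.shape
    # Split height and width into non-overlapping 2x2 blocks
    blocks = x.reshape(n, h // 2, 2, w // 2, 2, c)
    # Keep the largest value inside each block
    return blocks.max(axis=(2, 4))

x = np.random.default_rng(0).random((1, 16, 16, 64))
pooled = max_pool_2x2(x)
print(pooled.shape)  # (1, 8, 8, 64)
```

Each output value summarises four input values, so the pooled array holds a quarter as many values as its input.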

Reshape the pooled layer to be a 1-D vector (flatten it)

pool5 = tf.nn.max_pool(conv4, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding="VALID")
pool5.shape
TensorShape([Dimension(None), Dimension(2), Dimension(2), Dimension(128)])
pool5_flat = tf.reshape(pool5, shape=[-1, 128 * 2 * 2])
fullyconn1 = tf.layers.dense(pool5_flat, 128, activation=tf.nn.relu, name="fc1")
fullyconn2 = tf.layers.dense(fullyconn1, 64, activation=tf.nn.relu, name="fc2")

The final output layer (logits)

Do not apply a softmax activation to this layer. tf.nn.sparse_softmax_cross_entropy_with_logits applies the softmax itself and then computes the cross-entropy, which serves as our cost function.
logits = tf.layers.dense(fullyconn2, 10, name="output")
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y) 
loss = tf.reduce_mean(xentropy) 
optimizer = tf.train.AdamOptimizer() 
training_op = optimizer.minimize(loss)
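
To make the loss concrete, here is a small NumPy sketch of what a sparse softmax cross-entropy computes for integer labels. It is an illustration under my own naming (sparse_softmax_cross_entropy), not the TensorFlow kernel.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and integer labels."""
    # Subtract the row maximum first for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for each example
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 1.0, 0.1],   # fairly confident in class 0
                   [0.0, 0.0, 0.0]])  # no preference between classes
labels = np.array([0, 2])

losses = sparse_softmax_cross_entropy(logits, labels)
print(losses)         # one loss value per example
print(losses.mean())  # the scalar that would be minimised
```

The second example has uniform logits, so its loss is log(3); the first example is lower because the correct class already has the largest logit.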

Check correctness and accuracy of the prediction

  • Check whether the class with the highest logit is equal to the label y
  • Check the accuracy across all predictions (How many predictions did we get right?)
correct = tf.nn.in_top_k(logits, y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
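
With k = 1 this check amounts to comparing the arg-max of the logits with the label. A NumPy sketch of the same idea follows (top1_accuracy is a hypothetical name, and tie handling may differ from tf.nn.in_top_k):

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    predictions = logits.argmax(axis=1)
    return float((predictions == labels).mean())

logits = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.3, 0.7]])
labels = np.array([1, 0, 0])

acc = top1_accuracy(logits, labels)
print(acc)  # 2 of the 3 predictions are correct
```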

init = tf.global_variables_initializer()
saver = tf.train.Saver()

Set up a helper method to access training data in batches

def get_next_batch(features, labels, train_size, batch_index, batch_size):
    # Everything after the first train_size examples is held out for testing
    test_images = features[train_size:,:,:]
    test_labels = labels[train_size:]

    # Slice out the requested training mini-batch
    start_index = batch_index * batch_size
    end_index = start_index + batch_size

    return features[start_index:end_index,:,:], labels[start_index:end_index], test_images, test_labels
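
An alternative way to walk through the training portion is a generator that yields one mini-batch at a time. This is a sketch with made-up names (iterate_minibatches) and stand-in data. Like a loop over train_size // batch_size iterations, it drops the final partial batch.

```python
import numpy as np

def iterate_minibatches(features, labels, batch_size):
    """Yield consecutive (features, labels) mini-batches of exactly batch_size."""
    for start in range(0, len(features) - batch_size + 1, batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]

# Stand-in data: 8000 training examples, as in the 80% split of one batch
train_features = np.arange(8000)
train_labels = np.arange(8000) % 10

batches = list(iterate_minibatches(train_features, train_labels, batch_size=128))
print(len(batches))  # 62, the same as 8000 // 128
```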

Train and evaluate the model

  • With a small training set the model performs poorly. It improves as you increase the amount of training data (use all batches)
  • Ensure that dropout is enabled during training to avoid overfitting
n_epochs = 10
batch_size = 128

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        # Add this in when we want to run the training on all batches in CIFAR-10
        for batch_id in range(1, 6):
            batch_index = 0
            features, labels = load_cifar10_batch(batch_id)
            train_size = int(len(features) * 0.8)
            for iteration in range(train_size // batch_size):
                X_batch, y_batch, test_images, test_labels = get_next_batch(features, labels, train_size, batch_index, batch_size)
                batch_index += 1
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch, training: True})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: test_images, y: test_labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
    save_path = saver.save(sess, "./my_mnist_model")
9 Train accuracy: 0.73125 Test accuracy: 0.7135
Reference links: https://github.com/tflearn/tflearn/issues/57
