A first glimpse of a Neural network: Building a Digit Classifier

Keras is an easy-to-use abstraction built on top of TensorFlow. Let's use Keras to build our first "hello world" of Machine Learning: the Digit Classifier.

MNIST Dataset

In this article, we will get a first look at what a Neural Network is by building a digit classifier. Training a digit classifier on MNIST is often called the “Hello World!” of Deep Learning.

We will use the Keras library, which makes it easy to get started with Deep Learning. In just a few lines of code we can build our digit classifier.

Digit Classifier - The Problem

The problem we are trying to solve is getting a machine to recognize and classify handwritten digits. Each image is greyscale, 28 x 28 pixels, and shows a digit from 0 through 9. For this problem, we will use the MNIST Dataset, a collection of 60,000 training images and 10,000 test images.

Loading the MNIST Dataset into Keras

Loading the MNIST Dataset into Keras is extremely easy. It comes with the library so you don’t have to import any external libraries/files.

from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Loading MNIST Dataset

The data is in the form of four NumPy arrays. (For those of you who are not familiar with NumPy, I have linked the documentation. It is highly recommended to learn NumPy, Pandas and Matplotlib, as they are the bread-and-butter tools for getting started with Machine Learning.)

The train_images and train_labels are called the training dataset. We will use this data to train our neural network. Correspondingly, the test_images and test_labels are our testing dataset, which we will use to see how accurately our network performs on images it has not seen.

To see what the training data looks like, you can use numpy.shape or just print the labels out. Here is how you can do that:

A look at the training data
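A minimal sketch of that inspection. The helper name summarize is hypothetical, and the small arrays are stand-ins laid out like MNIST so the snippet runs without downloading anything. With the real data you would pass train_images and train_labels instead.

```python
import numpy as np

def summarize(images, labels):
    """Collect a few basic facts about an image array and its labels."""
    return {
        "shape": images.shape,                  # (num_images, height, width)
        "dtype": str(images.dtype),             # how each pixel is stored
        "num_labels": len(labels),              # one label per image
        "classes": np.unique(labels).tolist(),  # distinct digits present
    }

# Stand-in data (assumption): five blank 28 x 28 images and five labels.
demo_images = np.zeros((5, 28, 28), dtype=np.uint8)
demo_labels = np.array([5, 0, 4, 1, 9], dtype=np.uint8)

info = summarize(demo_images, demo_labels)
print(info)

# With the arrays loaded earlier: summarize(train_images, train_labels)
```

For the real MNIST training set, the shape should come out as (60000, 28, 28) with dtype uint8, and the labels should cover the digits 0 through 9.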

Here are the steps on how our digit classifier will work:

  • We will feed the neural network the training data.
  • The network will learn from the images and labels.
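  • The network will predict the digits for the test images, and we will match these predictions against the corresponding test_labels.
  • During training, a loss function measures how far the predictions are from the labels, and an optimizer uses that signal to adjust the weights and improve our network.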

The Network Architecture

Let’s start by writing the code to represent the core building blocks of a neural network: the layers. You can imagine a layer like this: some data goes in, and it comes out in a more useful form. Specifically, layers extract meaningful representations out of the data fed into them.

A look at the network architecture

Here, our network consists of a sequence of two Dense layers (the layers are densely connected). The second layer is a 10-way softmax layer, which means that the output will be an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
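The architecture described above could be sketched roughly as follows. Only the 10-way softmax output layer comes from the text. The 512-unit hidden layer with a ReLU activation and the 784-value input (a flattened 28 x 28 image) are assumptions chosen for illustration.

```python
import numpy as np
from keras import layers, models

# Assumed sizes: 784 inputs (a flattened 28 x 28 image) and a 512-unit
# hidden layer. Only the 10-way softmax output is taken from the article.
network = models.Sequential([
    layers.Input(shape=(28 * 28,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Even an untrained network outputs a valid probability distribution:
# one row of 10 non-negative scores that sum to (approximately) 1.
sample = np.random.rand(1, 28 * 28).astype("float32")
scores = network.predict(sample, verbose=0)
print(scores.shape)
print(scores.sum())
```

Before training, the ten scores carry no meaning yet. They only become useful predictions once the weights have been fitted to the data.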

Part of the compilation step is picking a loss function, an optimizer, and some metrics to monitor during training and testing.

  • A loss function - This function measures the performance of the network on the training data, so that the network can steer itself in the right direction.
  • An optimizer - The function through which the network will update itself.

We will talk about the loss function and the optimizer more thoroughly over the next few topics, but for now just bear with me.

Compilation step

The compilation step
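A hedged sketch of what the compilation step might look like. The optimizer ("rmsprop") and the loss ("categorical_crossentropy") are assumptions, since the text does not name them. Categorical cross-entropy is a common pairing with a softmax output and one-hot labels. The layer sizes are the same illustrative assumptions as before.

```python
from keras import layers, models

# Same illustrative architecture as before (layer sizes are assumptions).
network = models.Sequential([
    layers.Input(shape=(28 * 28,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

network.compile(
    optimizer="rmsprop",              # assumed: how the weights get updated
    loss="categorical_crossentropy",  # assumed: suits one-hot encoded labels
    metrics=["accuracy"],             # fraction of correctly classified images
)

print(type(network.optimizer).__name__)
```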

Before the training starts, we will preprocess the data by reshaping it into the shape that the network expects and scaling it so all values are between 0 and 1.

Preparing the Image Data

To shape the data the way the network expects it, we will use the NumPy array methods reshape() and astype().

Preparing the data
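A small sketch of that preprocessing, using a random stand-in array so it runs without the dataset. The variable names are illustrative. With the real data, the same two steps would be applied to train_images and test_images.

```python
import numpy as np

# Stand-in for train_images (assumption): four random greyscale images
# stored the way MNIST stores them, as uint8 values in [0, 255].
images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each 28 x 28 image into a vector of 784 values...
flat = images.reshape((images.shape[0], 28 * 28))

# ...then convert to float32 and scale from [0, 255] down to [0, 1].
prepared = flat.astype("float32") / 255

print(prepared.shape, prepared.dtype)
print(prepared.min(), prepared.max())
```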

Preparing the labels

We use keras.utils.to_categorical to convert a class vector (integers) to a binary class matrix, also known as a one-hot encoding.

Preparing the labels
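To make the conversion concrete, here is a short sketch on a handful of stand-in labels (the label values are made up for illustration). With the real data, train_labels and test_labels would be passed instead.

```python
import numpy as np
from keras.utils import to_categorical

# Stand-in class vector (assumption): five integer digit labels.
labels = np.array([5, 0, 4, 1, 9])

# Each label becomes a row of 10 values: all zeros except a single 1
# at the index of the digit.
one_hot = to_categorical(labels, num_classes=10)

print(one_hot.shape)
print(one_hot[0])  # the row for the label 5
```

Taking the argmax of each row recovers the original integer label, which is also how a predicted class is read off the softmax output.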

We are now ready to train our network, which we can do with a call to the network’s fit method, i.e. we fit the model to its training data.

Training our network
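A hedged end-to-end sketch of the training call, with a test-set evaluation added so the two accuracies can be compared. The hidden-layer size, optimizer, loss, number of epochs and batch size are all assumptions not stated in the text, and the helper name prepare is hypothetical. Exact accuracies will vary from run to run.

```python
from keras import layers, models
from keras.datasets import mnist
from keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

def prepare(images):
    """Flatten 28 x 28 images to 784-vectors and scale pixels to [0, 1]."""
    return images.reshape((images.shape[0], 28 * 28)).astype("float32") / 255

x_train, x_test = prepare(train_images), prepare(test_images)
y_train = to_categorical(train_labels, num_classes=10)
y_test = to_categorical(test_labels, num_classes=10)

# Architecture and compile settings are illustrative assumptions.
network = models.Sequential([
    layers.Input(shape=(28 * 28,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
network.compile(optimizer="rmsprop",
                loss="categorical_crossentropy",
                metrics=["accuracy"])

# Fit the model to the training data (epochs and batch size are assumed).
history = network.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)
train_acc = history.history["accuracy"][-1]

# Measure performance on images the network has never seen.
test_loss, test_acc = network.evaluate(x_test, y_test, verbose=0)

print(f"training accuracy: {train_acc:.4f}")
print(f"test accuracy:     {test_acc:.4f}")
```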

As you can see, we reach a training accuracy of over 98%! Our machine can now classify handwritten digits quite reliably. However, the test-set accuracy turns out to be lower than the training-set accuracy. This gap between training accuracy and test accuracy is an example of what is called overfitting: machine learning models tend to perform worse on new data than on their training data. (We will talk more about overfitting and underfitting in the coming chapters.)


We’ve reached the conclusion of our first example. Congratulations on making it this far. If you didn’t quite understand what is going on, don’t worry. We will cover all the keywords and the flow of what is happening in much more detail in the coming chapters. This example was more about getting started and getting your hands dirty with code.

As always, thanks for making it this far! In the next chapter we will talk about data representations with neural networks. See you there!
