CNNs 101: How do they work? (Part 2 of 3)

First off, it is about damn time I get this post out. This was supposed to be out a long damn time ago, but c'est la vie.

This post will be a little bit different than I had originally planned. It was going to be full of equations and mathemagics, but when I was doing the research, I found that other people had already put in the effort. So why reinvent the wheel? This post will contain a link to a sweet paper on convolutional mathemagics and some YouTube links for those of you who don't feel like reading.

Without further ado, onto the actual post...


Okay, not quite yet. Gotta bring up the road map so you can see where you're at (*you are here): 

  1. Overview of CNNs

    • What they are good at

    • What a basic CNN looks like

    • The basic mechanisms involved

  2. Opening the black box*

    • What exactly is happening inside CNNs

    • What do we have control over?

      • Introduction to parameters that we can directly change (Hyperparameters)

  3. Practical applications

I want to note that I will be splitting this second section into two parts. This first part will provide some resources for learning what is happening inside of a CNN. The second part will cover the parameters of CNNs that we can select, along with some tools that can be used to help optimize those hyperparameters.

Now we can begin.


Inside a CNN

If you recall from the first post, the meat of any convolutional neural network (CNN) is the convolutional stack. This stack is made up of a convolutional layer followed by a pooling layer. While I went over the overall function of these layers, I didn't get into many of the specifics. This is where I go through some of those specifics. And by me, I mean Vincent Dumoulin and Francesco Visin in their paper A guide to convolution arithmetic for deep learning.

As the title might suggest, their paper provides a very clear base for understanding the computations done in a basic convolutional layer. They cover the mathemagic used in understanding how the output matrices are calculated in convolutional layers, pooling layers, and even transposed convolutional layers (which I won't be focusing on).
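To give you a small taste before you dive in, here is the piece of that arithmetic you will use constantly: the formula for how big the output of a convolutional or pooling layer will be along one dimension. A quick sketch in plain Python, with variable names following the paper's notation (i for input size, k for kernel size, p for zero padding, s for stride):

```python
from math import floor

def conv_output_size(i, k, p=0, s=1):
    """One-dimensional output size of a convolution.
    i: input size, k: kernel size, p: zero padding, s: stride.
    o = floor((i + 2p - k) / s) + 1
    """
    return floor((i + 2 * p - k) / s) + 1

def pool_output_size(i, k, s):
    """Pooling uses the same arithmetic, just without padding."""
    return floor((i - k) / s) + 1

# A 32x32 input hit with a 5x5 kernel, no padding, stride 1 -> 28x28 feature map
print(conv_output_size(32, 5, p=0, s=1))   # 28
# Pad the same input with 2 pixels of zeros and the size stays at 32
print(conv_output_size(32, 5, p=2, s=1))   # 32
# A 2x2 max pool with stride 2 then cuts it down to 16x16
print(pool_output_size(32, 2, 2))          # 16
```

The paper derives this relationship and illustrates all of the padding and stride variations of it, so I will leave those details to Dumoulin and Visin.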

However, the paper does not cover the specific mathemagic that occurs between layers (and within layers, in the case of fully connected layers) during the updating of the features (the filter weights) when backpropagation occurs.

That's right, the feature maps (or activation maps) that come out of each convolutional layer are not static. The filters that produce them get adjusted during a process called backpropagation, which kicks in whenever the network gets a prediction wrong (think of it as a learning function), so the maps change as the network learns (I have sketched that update just below). But rather than just reading about how all this happens, the best way to understand these networks is to build one, from scratch.
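For the curious, here is a minimal numpy sketch of that update for a single 3x3 filter: do a forward convolution, compute the gradient of a simple squared-error loss with respect to the filter, and nudge the filter downhill. The target, the loss, and the learning rate are all made up purely for illustration; a real network chains this same math backwards through every layer.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive 'valid' convolution (stride 1, no padding) of input x with filter w."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))    # toy input "image"
w = rng.standard_normal((3, 3))    # the filter (this is what backprop adjusts)
target = np.zeros((4, 4))          # made-up target feature map, purely illustrative

for step in range(3):
    y = conv2d_valid(x, w)                   # forward pass -> 4x4 feature map
    loss = 0.5 * np.sum((y - target) ** 2)   # squared-error loss
    dy = y - target                          # dL/dy
    # dL/dw[u, v] = sum over output positions (i, j) of dy[i, j] * x[i + u, j + v]
    dw = np.zeros_like(w)
    for u in range(w.shape[0]):
        for v in range(w.shape[1]):
            dw[u, v] = np.sum(dy * x[u:u + dy.shape[0], v:v + dy.shape[1]])
    w -= 0.005 * dw                          # gradient-descent step on the filter
    print(f"step {step}: loss = {loss:.3f}")
```

Run it and the loss creeps down, the filter drifts, and the feature map that pops out of the convolution changes right along with it. That, in miniature, is the network learning its features.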

And by scratch, I mean not using any packages (other than numpy). That means no TensorFlow, no PyTorch, no Theano, no Caffe, only numpy. Thankfully, Siraj has an excellent YouTube video that walks you step-by-step through the process. He even throws some swag on it by wrapping the code in a webapp which allows for some neat interactivity.
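If you want a feel for what that from-scratch code looks like before you hit play, here is a stripped-down, single-channel sketch of one conv → ReLU → max-pool pass in pure numpy. Every name in it is mine rather than Siraj's, and it leaves out all of the training machinery:

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' convolution (stride 1, no padding) of image x with filter w."""
    kh, kw = w.shape
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * w)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

def relu(x):
    """Element-wise rectified linear unit: negatives become zero."""
    return np.maximum(x, 0.0)

def max_pool(x, k=2, s=2):
    """Max pooling with a k-by-k window and stride s."""
    rows = (x.shape[0] - k) // s + 1
    cols = (x.shape[1] - k) // s + 1
    return np.array([[np.max(x[i * s:i * s + k, j * s:j * s + k])
                      for j in range(cols)]
                     for i in range(rows)])

rng = np.random.default_rng(1)
image = rng.standard_normal((28, 28))    # stand-in for a 28x28 grayscale image
kernel = rng.standard_normal((5, 5))     # a single 5x5 filter (a real net learns many)

feature_map = relu(conv2d(image, kernel))  # 24x24, per the arithmetic above
pooled = max_pool(feature_map)             # 12x12 after 2x2 pooling
flat = pooled.reshape(-1)                  # 144 values, ready for a fully connected layer
print(feature_map.shape, pooled.shape, flat.shape)
```

A full implementation stacks many filters, bolts fully connected layers on top, and trains the whole thing with the backprop we just talked about, but this is the basic shape of it.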

The link for the git repo used in the video can be found here. The Jupyter notebook covers everything you could possibly want to know about basic CNNs. At the bottom of the notebook is where the numpy_CNN implementation is found.

I will leave you all for now with a link to the web page of the Stanford University class on using CNNs for visual recognition. Under the class notes section you will find a bunch of useful materials, including tutorials on basic Python and Google Cloud computing, in addition to notes on neural nets and convolutional neural nets.

See you all soon for a discussion about hyperparameters, their selection, and optimization.

I'm out chyea.