We need to go deeper, Googlenet : Project Summary

This blogpost discusses the project summary and my contributions over the summer, as GSoC 2016, approaches its conclusion. First we'll discuss how you can find most of the work that I did, this will be a list of commits from mlpack or from various branches in my fork. Then we'll discuss what were the goals of the project, and how much we accomplished them, and finally what I learnt over the summer and why working on this project with my mentors was great! :)

Project Summary

The goal of this project was to develop googlenet such that it integrates in mlpack's existing ANN API and the modules developed are reusable to other related applications.

We selected the edge_boxes algorithm for object localization. We performed the feature extraction 703 for a sample BSDS500 Dataset. Marcus helped with reusing the Tree implementation to train the structured random forest which detects edges in the image. Next, we started implementing the Neural Net part. We added functionality to the pooling layer, convolutional layer and implemented the inception layer (which is replicated throughout googlenet), concatenation layer, subnetwork layer and connect layer as additional layers and wrote the [tests][tests] for them. This will give mlpack users a significant flexibility in training more complicated and deep neural nets. We made the GoogleNet using these layers. Tests of our implementation on standard datasets still need to be finished. Here is the list of commits and pull requests (from recent to old) in different branches, with their description, you can see to track the work done over summer:

To see the week by week progress of the work done you can look at the blog.

Feature Extraction

For feature extraction the objective was that given images, segmentations and boundaries extract over 7000 features for different color spaces, gradient channels to capture the local edge structure in the images. Fast edge detection using structured forests and Sketch tokens are the papers that describe this task.

We began this process by first writing useful Image Processing algorithms to convert between color spaces (RGB to LUV), interpolating & padding images, performing convolutions, computing HOG features and calculating distance transforms. The distance transform algorithm is implemented in this paper. Then we calculated Regular and Self Similarity features on a 16x16 image patches, for 1000 edge locations and 1000 non-edge locations. For this we also shrunk channels to reduce the dimensionality of our data, and then discretized our features into classes by performing PCA, so the data could be represented by a normal decision tree or random forest. The implementations of these algorithms can be seen in 703. Then I wrote the tests for the functions implemented in the StructuredForests class and compared values against the reference implementations to verify their correctness. Marcus helped by providing implementation of the structured tree using the feature extraction code.

Inception Layer

Next, we proceeded to implement the Inception Layer. Before doing this, I needed to read some papers alexnet, visualizing CNNs to understand some CNN architectures and some ideas like Network in Network, that Googlenet paper uses by replicating the inception network inside it 9 times. It took time to understand the mlpack CNN class implementation as it uses interesting techniques of generating code using compile time recursion on templates which I was previously oblivious of. Then we made an inception layer as a collection of layers as mentioned in googlenet paper and wrote the tests for it to verify correctness. The implementation of inception layer can be seen in 757.

Adding functionality to Pooling Layer and Convolution Layer

While writing tests for inception layer, it was noticed that some functionalities of the existing classes need to be modified. For the pooling layer I added the functionality to pool with a given stride value. Then we improved the convolution layer to support Forward, Backward and Gradient updates when padding is specified. Padding is very important for deep networks, as we are able to preserve the width of our data by specifying padding, otherwise the data will become smaller as we continue to perform pooling and convolution operations on it, and we will not be able to get a neural net "deep enough". Then I wrote the tests for the pooling layer and convolution layer, and now the test for inception layer passed correctly too!

Concatenation Layer and Subnetwork Layer

On Examining the structure of the googlenet network, we felt that we need a concatenation layer. This layer will give us the functionality to concatenate the outputs of two or more layers in the forward pass, and then distribute the errors among the constituent layers for the backward pass. So I wrote a concat_layer that does exactly this and the corresponding tests.

The goal of this project was to create the components of googlenet so they are also reusable to other applications. So to make duplicating any collection of layers in a deep network easier, we decided to implement a subnet layer. The tests for the subnet layer is still under construction which will implement the inception_layer using the subnet_layer and check for correctness.

Connect Layer

With the googlenet network we faced one more interesting problem - auxillary classifiers. From one layer, there could be 2 layers diverging, and both of these layers would end up at separate output layers. Auxillary classifiers are added to googlenet to combat vanishing gradient problem while providing regularization. In mlpack implementation, the layers are stacked sequentially in the form of a tuple. To support this architectural variant, where 2 layers emerge from one layer, we added a connect layer, which contains the 2 separate nets that emerge from it, and has responsibility for passing input to and collect errors from these nets. Tests still need to be written to for the connect layer.

Googlenet

After all the basic components have completed, creating googlenet is as simple as stacking up all of the layers, put the desired values from the paper and calling the Train() and Predict() functions of CNN class to evaluate outputs. When we are able to complete all refinements we need to make to, all the components developed in this project, training deep neural nets with mlpack will become effortless. There is also one variant of googlenet which uses batch normalization, that I plan to contribute to mlpack with the guidance of Marcus after GSoC 2016.

ToDo

The following things still need to be completed in order to achieve all the goals mentioned in our proposal: 1. Complete the edge boxes implementation. 2. Write rigorous tests for googlenet. 3. Minor improvements suggested by my mentors in the current Pull requests.

Acknowledgements

I want to thank the mlpack community for giving me this awesome opportunity to work with them on this amazing project over the summer. I was welcomed right from the first day I joined the irc channel in the beginning of the student application period, when I wasn't even sure what project I wanted to apply to for GSoC 2016. Special Thanks to my mentors Marcus Edel and Tham Ngap Wei, for clearing all my doubts (sometimes even unrelated to the project :) ) with so much patience and simple explainations, and helping me with design and debugging of the project. I feel I have learnt a lot from them, and I really enjoy being part of the mlpack community. This was a great experience, Thank you very much!