We need to go deeper, Googlenet : Project Summary

This blog post summarizes the project and my contributions over the summer as GSoC 2016 approaches its conclusion. First, we'll discuss where you can find most of the work I did: a list of commits from mlpack or from various branches in my fork. Then we'll discuss the goals of the project and how far we got with them, and finally what I learnt over the summer and why working on this project with my mentors was great! :)

Project Summary

The goal of this project was to develop googlenet so that it integrates with mlpack's existing ANN API and the modules developed are reusable in other related applications.

We selected the edge_boxes algorithm for object localization. We performed the feature extraction (#703) for a sample BSDS500 dataset. Marcus helped with reusing the Tree implementation to train the structured random forest, which detects edges in the image. Next, we started implementing the neural net part. We added functionality to the pooling layer and convolutional layer, implemented the inception layer (which is replicated throughout googlenet), concatenation layer, subnetwork layer and connect layer as additional layers, and wrote the tests for them. This will give mlpack users significant flexibility in training more complicated and deeper neural nets. We built googlenet using these layers. Tests of our implementation on standard datasets still need to be finished. Here is the list of commits and pull requests (most recent first) in the different branches, with their descriptions, which you can use to track the work done over the summer:

To see the week-by-week progress of the work, you can look at the blog.

Feature Extraction

For feature extraction, the objective was, given images, segmentations and boundaries, to extract over 7000 features across different color spaces and gradient channels that capture the local edge structure in the images. Fast edge detection using structured forests and Sketch tokens are the papers that describe this task.

We began this process by writing useful image processing algorithms to convert between color spaces (RGB to LUV), interpolate and pad images, perform convolutions, compute HOG features and calculate distance transforms. The distance transform algorithm is described in this paper. Then we calculated Regular and Self Similarity features on 16x16 image patches, for 1000 edge locations and 1000 non-edge locations. For this we also shrunk the channels to reduce the dimensionality of our data, and then discretized our features into classes by performing PCA, so that the data could be represented by a normal decision tree or random forest. The implementations of these algorithms can be seen in #703. Then I wrote the tests for the functions implemented in the StructuredForests class and compared values against the reference implementations to verify their correctness. Marcus helped by providing an implementation of the structured tree that uses the feature extraction code.
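To make the distance transform step concrete, here is a minimal brute-force sketch in Python. It is only an illustration of what a distance transform computes, not the faster algorithm from the paper and not the code in #703; the function name and the list-of-lists image format are assumptions made for this example.

```python
import math


def distance_transform(mask):
    """Euclidean distance from every pixel to the nearest boundary pixel.

    `mask` is a list of rows; a truthy entry marks a boundary (edge) pixel.
    This brute-force version compares every pixel with every boundary pixel,
    so it is only suitable for small patches.
    """
    sources = [(r, c)
               for r, row in enumerate(mask)
               for c, value in enumerate(row) if value]
    if not sources:
        raise ValueError("mask contains no boundary pixels")
    return [[min(math.hypot(r - sr, c - sc) for sr, sc in sources)
             for c in range(len(row))]
            for r, row in enumerate(mask)]
```

For a 3x3 mask with a single boundary pixel in the centre, the centre gets distance 0, its four direct neighbours get 1, and the corners get the square root of 2.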

Inception Layer

Next, we proceeded to implement the Inception Layer. Before doing this, I needed to read some papers (AlexNet, visualizing CNNs) to understand some CNN architectures, and some ideas like Network in Network, which the googlenet paper builds on by replicating the inception module 9 times inside the network. It took time to understand the mlpack CNN class implementation, as it uses interesting techniques for generating code through compile-time recursion on templates, which I was previously unaware of. Then we built an inception layer as a collection of layers, as described in the googlenet paper, and wrote the tests for it to verify correctness. The implementation of the inception layer can be seen in #757.
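As a small illustration of what "a collection of layers" means here, the Python sketch below computes the output depth of an inception module: four parallel branches see the same input and their outputs are concatenated along the channel dimension. The class and field names are made up for this example and are not mlpack identifiers; the filter counts are the ones listed for modules 3a and 3b in Table 1 of the googlenet paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InceptionSpec:
    n1x1: int         # filters in the plain 1x1 branch
    n3x3_reduce: int  # 1x1 filters applied before the 3x3 convolution
    n3x3: int         # filters in the 3x3 convolution
    n5x5_reduce: int  # 1x1 filters applied before the 5x5 convolution
    n5x5: int         # filters in the 5x5 convolution
    pool_proj: int    # 1x1 filters applied after the 3x3 max pooling

    def output_channels(self) -> int:
        # Only the last convolution of each branch contributes to the output
        # depth; the "reduce" convolutions stay internal to their branch.
        return self.n1x1 + self.n3x3 + self.n5x5 + self.pool_proj


# Filter counts taken from Table 1 of the googlenet paper.
inception_3a = InceptionSpec(64, 96, 128, 16, 32, 32)
inception_3b = InceptionSpec(128, 128, 192, 32, 96, 64)
```

With these numbers, module 3a produces 256 output channels and module 3b produces 480, which matches the output sizes given in the paper's table.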

Adding functionality to Pooling Layer and Convolution Layer

While writing tests for the inception layer, we noticed that some functionality of the existing classes needed to be modified. For the pooling layer, I added the ability to pool with a given stride value. Then we improved the convolution layer to support the Forward, Backward and Gradient updates when padding is specified. Padding is very important for deep networks: by specifying padding we can preserve the width and height of our data; otherwise the data becomes smaller with every pooling and convolution operation, and we cannot make the neural net "deep enough". Then I wrote the tests for the pooling layer and convolution layer, and now the test for the inception layer passes correctly too!
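The effect of stride and padding on the data size can be sketched with the usual output size formula, floor((input + 2 * padding - kernel) / stride) + 1. The Python helpers below are a hypothetical illustration and not part of mlpack; they assume the rounding-down convention, while some frameworks round up for pooling.

```python
def output_size(input_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Width (or height) after one convolution or pooling operation.

    Assumes symmetric zero padding and rounding down.
    """
    padded = input_size + 2 * padding
    if kernel > padded:
        raise ValueError("kernel is larger than the padded input")
    return (padded - kernel) // stride + 1


def sizes_through_stack(input_size: int, kernel: int, layers: int, padding: int = 0) -> list:
    """Sizes seen while applying the same stride-1 operation `layers` times."""
    sizes = [input_size]
    for _ in range(layers):
        sizes.append(output_size(sizes[-1], kernel, padding=padding))
    return sizes
```

For example, three unpadded 5x5 convolutions shrink a 28-pixel-wide input to 24, 20 and then 16 pixels, while the same stack with a padding of 2 keeps the width at 28 throughout.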

Concatenation Layer and Subnetwork Layer

On examining the structure of the googlenet network, we felt that we needed a concatenation layer. This layer gives us the functionality to concatenate the outputs of two or more layers in the forward pass, and then distribute the errors among the constituent layers in the backward pass. So I wrote a concat_layer that does exactly this, along with the corresponding tests.
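The idea behind the concatenation layer can be sketched in a few lines of Python. This is a simplified stand-in, not the concat_layer from the pull request: each layer's output is modelled as a plain list of channels, and the class name and methods are assumptions made for this example.

```python
class ConcatSketch:
    """Joins the outputs of several layers and splits errors back again."""

    def __init__(self):
        self._sizes = []

    def forward(self, outputs):
        # Remember how many channels each layer contributed so the backward
        # pass can hand every layer exactly its own slice of the error.
        self._sizes = [len(output) for output in outputs]
        joined = []
        for output in outputs:
            joined.extend(output)
        return joined

    def backward(self, error):
        if len(error) != sum(self._sizes):
            raise ValueError("error does not match the concatenated output")
        parts, start = [], 0
        for size in self._sizes:
            parts.append(error[start:start + size])
            start += size
        return parts
```

The backward pass does no arithmetic at all: because concatenation only rearranges values, each constituent layer simply receives the slice of the error that corresponds to its own outputs.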

The goal of this project was to create the components of googlenet so that they are also reusable in other applications. So, to make duplicating any collection of layers in a deep network easier, we decided to implement a subnet layer. The tests for the subnet layer are still under construction; they will implement the inception_layer using the subnet_layer and check it for correctness.

Connect Layer

With the googlenet network we faced one more interesting problem - auxiliary classifiers. From one layer, there can be 2 layers diverging, and both of these layers end up at separate output layers. Auxiliary classifiers are added to googlenet to combat the vanishing gradient problem while providing regularization. In the mlpack implementation, the layers are stacked sequentially in the form of a tuple. To support this architectural variant, where 2 layers emerge from one layer, we added a connect layer, which contains the 2 separate nets that emerge from it and is responsible for passing input to these nets and collecting errors from them. Tests still need to be written for the connect layer.
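A rough Python sketch of this fan-out behaviour is given below. It is not the connect layer from the pull request: the class names, the toy Scale sub-network and the aux_weight parameter are all assumptions made for illustration. Summing the errors coming back from both nets is the standard backpropagation rule for a value that feeds two consumers; how the mlpack layer combines them may differ.

```python
class Scale:
    """Toy stand-in for a sub-network: multiplies every element by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, values):
        return [self.factor * v for v in values]

    def backward(self, grad):
        # d(factor * v) / dv = factor, so the error is scaled the same way.
        return [self.factor * g for g in grad]


class ConnectSketch:
    """Feeds one input to two nets and merges the errors they send back."""

    def __init__(self, main_net, aux_net, aux_weight=1.0):
        self.main_net = main_net
        self.aux_net = aux_net
        self.aux_weight = aux_weight  # hypothetical knob for the auxiliary error

    def forward(self, values):
        # Both nets receive the same input and produce separate outputs.
        return self.main_net.forward(values), self.aux_net.forward(values)

    def backward(self, main_grad, aux_grad):
        from_main = self.main_net.backward(main_grad)
        from_aux = self.aux_net.backward(aux_grad)
        # The shared input gets the (weighted) sum of both error signals.
        return [m + self.aux_weight * a for m, a in zip(from_main, from_aux)]
```

The googlenet paper reports weighting the auxiliary classifier losses by 0.3 during training, which is the kind of value aux_weight is meant to stand in for here.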


Once all the basic components are complete, creating googlenet is as simple as stacking up all of the layers, plugging in the desired values from the paper and calling the Train() and Predict() functions of the CNN class to evaluate outputs. Once we complete all the refinements we still need to make to the components developed in this project, training deep neural nets with mlpack will become effortless. There is also a variant of googlenet that uses batch normalization, which I plan to contribute to mlpack with the guidance of Marcus after GSoC 2016.


The following things still need to be completed in order to achieve all the goals mentioned in our proposal:
1. Complete the edge boxes implementation.
2. Write rigorous tests for googlenet.
3. Make the minor improvements suggested by my mentors in the current pull requests.


I want to thank the mlpack community for giving me this awesome opportunity to work with them on this amazing project over the summer. I was welcomed right from the first day I joined the IRC channel at the beginning of the student application period, when I wasn't even sure what project I wanted to apply to for GSoC 2016. Special thanks to my mentors Marcus Edel and Tham Ngap Wei for clearing all my doubts (sometimes even unrelated to the project :) ) with so much patience and such simple explanations, and for helping me with the design and debugging of the project. I feel I have learnt a lot from them, and I really enjoy being part of the mlpack community. This was a great experience. Thank you very much!


We need to go deeper, Googlenet : Week-10 & 11 Highlights

In weeks 10 & 11, I incorporated the fixes suggested in #703 and #696 for the feature extraction part. I also applied the fixes suggested in #737 for the convLayer. I completed the subnet_layer (right now we duplicate some code from cnn.hpp for the subnet layer; this may be changed later) and the basic structure of the googlenet network. What still needs to be discussed is how the error from the auxiliary classifiers is propagated into the main network, which I will do right away. Regularization in the network and writing tests for its correct working are the other tasks that still need to be done. This is what I will do in the next few days. Besides, I am also looking at the fixes suggested in #757, and these changes will be made as soon as some things are clear. Once these changes are done, we will create the inception layer using the subnet_layer and concat_layer, which will fulfill one of the objectives of the project: that users can duplicate any complex collection of layers in a deep network without having to explicitly write a class for that collection of layers. I will also be writing a full blog post next week that covers, point by point, everything done in the project from start to finish. Thanks for reading.


We need to go deeper, Googlenet : Week-9 Highlights

I started this week by testing the inception layer. While writing tests I was not getting the expected outputs, so I checked the code of the ConvLayer and the Pooling Layer, which are called in the Inception Layer. I had added pooling with stride only last week but was still not getting correct results, so I corrected the logic in the pooling layer and tested it, and we can now pool with stride correctly. We have merged this feature.

Then I corrected small bugs in the logic of the ConvLayer. The forward pass and backward pass logic have now been corrected and give the expected results. We still need to check the Gradient() function, which is my immediate task. I have written tests for the forward and backward passes of the ConvLayer and checked that they work with padding and that they give the desired output using standard kernels.

I also wrote code for the ConcatLayer. I have completed the Forward and Backward functions and checked with tests that they work. This layer gives us the functionality to concatenate the outputs of two or more layers and then distribute the errors among the constituent layers in the backward pass. The Gradient() function still needs to be written, and I need to discuss what happens when we combine two or more base layers in our ConcatLayer. Also, I first need to write the test for the Gradient() function of the ConvLayer; then I can complete the Gradient tests for both the Inception Layer and the Concat Layer.

I think we made good progress this week, and the simple implementation of the Inception Layer we have developed can be generalized into the subnet_layer. Along with this, I will discuss with my mentors what other tasks need to be completed this week and will update you about them in the next blog post. Stay tuned!


We need to go deeper, Googlenet : Week-8 Highlights

This week I finished the implementation of the Inception Layer and the test for it. The version we have finished right now is very simple and will serve as a guiding example for building the subnet_layer, which could take any collection of layers as input and allow the user to duplicate it over the network. I also started on the implementation of the concat_layer, which will concatenate the outputs of two or more layers in the forward pass and distribute the error among those layers in the backward pass. This coming week our plan is to merge the code for the inception layer and complete the concat_layer and its test.


We need to go deeper, Googlenet : Week-6 Highlights

This week, I spent time reading about C++ template constructs and the syntactic details that were necessary to understand the ann code. I was stuck mostly on this part during the first days of the week. I have gained some insight into how template template parameters, rvalues, some template types and other features work, all of which are used heavily in the code. I am still in the process of completely understanding the code, which I hope to do in the coming days with help from others.

Also, I completed a first implementation of the Inception Layer just today. I still need to write a small test to check that it works, which I will do next. Once that is done, we will discuss the plan for the coming week in the next few days.


We need to go deeper, Googlenet : Week-5 Highlights

I started this week by discussing the "Going Deeper with Convolutions" paper with my mentor, to get an idea of how to implement the inception layer. I also clarified some of the concepts regarding backprop, convolutions, and standard regularization techniques like dropout, which are used in deep networks. I read the Network in Network paper to get an idea about 1 x 1 convolutions, which were introduced there, and about how smaller neural nets are used to build larger networks.

Then I fixed minor issues pointed out in the PR for the feature extraction code in the edge_boxes method. I compared the timings for performing convolution using armadillo submatrices against using loops and pointers by invoking the NaiveConvolution class. Armadillo submatrices give a faster method, but we have to check whether it is also fast for large kernels. If that is the case, performance may improve for the Convolution method. Then I improved the gradient function so that it calculates edges by applying a convolution with the Sobel filter.
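As an illustration of edge detection with the Sobel filter, here is a small pure-Python sketch. It is not the mlpack gradient function: the helper names and the list-of-lists image format are assumptions made for this example, and the filter is applied as a cross-correlation over the valid region only (a true convolution would flip the kernel, which changes the sign of the response but not its magnitude).

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def gradient_magnitude(image):
    """Edge strength at every valid position: sqrt(gx^2 + gy^2)."""
    gx = correlate_valid(image, SOBEL_X)
    gy = correlate_valid(image, SOBEL_Y)
    return [[math.hypot(x, y) for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]
```

On an image with a vertical step edge (every row equal to 0, 0, 1, 1) the horizontal response is 4 at each valid position and the vertical response is 0, while a constant image gives a magnitude of 0 everywhere.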

Then I looked at the ann implementation in mlpack. I looked at the convolution and pooling layers that will be used in implementing the inception layer, and had to read up on some of the functions implemented in these classes. It took me a bit of time to get accustomed to the style in which the ann method is implemented, because of the heavy use of templates in the code. I guess I still have many things to learn. I also glanced at some implementations of googlenet in other libraries, without understanding many of the details of course, but getting a rough idea.

I have started the implementation of the inception layer and plan to finish it in the next few days. After examining the convolution_network_test, it looks very easy to stack layers for the inception layer in a similar fashion. To improve the discretize function, we will use the fast PCA method based on randomized SVD, as suggested and explained by my mentor. Further, the interface for googlenet will be discussed once the inception layer is complete.