We need to go deeper, Googlenet : Project Summary

This blog post summarizes the project and my contributions over the summer, as GSoC 2016 approaches its conclusion. First we'll look at where to find most of the work I did, as a list of commits and pull requests from mlpack or from various branches in my fork. Then we'll discuss the goals of the project and how far we got in accomplishing them, and finally what I learnt over the summer and why working on this project with my mentors was great! :)

Project Summary

The goal of this project was to develop googlenet such that it integrates with mlpack's existing ANN API, with the modules developed being reusable in other related applications.

We selected the edge_boxes algorithm for object localization. We performed the feature extraction (#703) on a sample of the BSDS500 dataset. Marcus helped with reusing the tree implementation to train the structured random forest that detects edges in the image. Next, we started implementing the neural net part. We added functionality to the pooling and convolutional layers, and implemented the inception layer (which is replicated throughout googlenet), the concatenation layer, the subnetwork layer and the connect layer as additional layers, along with the [tests][tests] for them. This gives mlpack users significant flexibility in training more complicated and deeper neural nets. We assembled googlenet from these layers. Tests of our implementation on standard datasets still need to be finished. Here is the list of commits and pull requests (from recent to old) in different branches, with their descriptions, which you can use to track the work done over the summer:

To see the week-by-week progress of the work done, you can look at the blog.

Feature Extraction

For feature extraction, the objective was: given images, segmentations and boundaries, extract over 7000 features across different color spaces and gradient channels to capture the local edge structure in the images. Fast edge detection using structured forests and Sketch tokens are the papers that describe this task.

We began this process by first writing useful image processing algorithms to convert between color spaces (RGB to LUV), interpolate & pad images, perform convolutions, compute HOG features and calculate distance transforms. The distance transform algorithm is the one described in this paper. Then we calculated regular and self-similarity features on 16x16 image patches, for 1000 edge locations and 1000 non-edge locations. For this we also shrunk channels to reduce the dimensionality of our data, and then discretized our features into classes by performing PCA, so the data could be represented by a normal decision tree or random forest. The implementations of these algorithms can be seen in #703. Then I wrote the tests for the functions implemented in the StructuredForests class and compared values against the reference implementations to verify their correctness. Marcus helped by providing an implementation of the structured tree using the feature extraction code.
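To give a concrete flavor of one of these utilities, here is a minimal sketch (in Python/NumPy rather than mlpack's C++, with hypothetical names) of the Felzenszwalb-Huttenlocher 1-D squared-Euclidean distance transform from that paper; the 2-D transform follows by applying it along rows and then columns:

```python
import numpy as np

def dt_1d(f):
    """Felzenszwalb-Huttenlocher 1-D squared-Euclidean distance transform:
    returns d[p] = min_q (p - q)^2 + f[q], in O(n) time."""
    n = len(f)
    v = np.zeros(n, dtype=int)   # positions of parabolas in the lower envelope
    z = np.zeros(n + 1)          # boundaries between adjacent parabolas
    k = 0
    z[0], z[1] = -np.inf, np.inf
    for q in range(1, n):
        # intersection of the parabola from q with the rightmost one kept so far
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:         # q's parabola hides the rightmost one; drop it
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = np.inf
    d = np.empty(n)
    k = 0
    for p in range(n):           # read off the lower envelope
        while z[k + 1] < p:
            k += 1
        d[p] = (p - v[k]) ** 2 + f[v[k]]
    return d

# A single "seed" at index 2 (large cost elsewhere) yields squared distances to it.
f = np.full(5, 1e9)
f[2] = 0.0
d = dt_1d(f)    # [4, 1, 0, 1, 4]
```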

Inception Layer

Next, we proceeded to implement the inception layer. Before doing this, I needed to read some papers - alexnet, visualizing CNNs - to understand some CNN architectures, and some ideas like Network in Network, which the googlenet paper uses by replicating the inception module inside the network 9 times. It took time to understand the mlpack CNN class implementation, as it uses interesting techniques of generating code using compile-time recursion on templates, of which I was previously unaware. Then we made an inception layer as a collection of layers as described in the googlenet paper, and wrote the tests for it to verify correctness. The implementation of the inception layer can be seen in #757.
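The core mechanic of the inception module is simple to sketch: several branches (1x1, 3x3, 5x5 convolutions and pooling, each usually preceded by a 1x1 reduction) are applied to the same input, and their outputs are concatenated along the channel axis. Here is a toy Python/NumPy illustration of that idea (illustrative names only, not the mlpack layer API), using 1x1 convolutions as stand-in branches:

```python
import numpy as np

def conv1x1(w):
    """A 1x1 convolution is a per-pixel mix of input channels.
    w has shape (out_channels, in_channels); x has shape (in_channels, H, W)."""
    return lambda x: np.tensordot(w, x, axes=([1], [0]))

def inception_forward(x, branches):
    """Apply each branch to the same input and concatenate the results
    along the channel axis, as the inception module does."""
    return np.concatenate([branch(x) for branch in branches], axis=0)

# Toy example: a (3, 8, 8) input through two branches with 2 and 5 output channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
branches = [conv1x1(rng.standard_normal((2, 3))),
            conv1x1(rng.standard_normal((5, 3)))]
out = inception_forward(x, branches)   # shape (2 + 5, 8, 8)
```

In the real module the spatial size of every branch's output must match (hence the padding work described below), so only the channel counts add up, e.g. 64 + 128 + 32 + 32 = 256 channels for inception (3a) in the paper.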

Adding functionality to Pooling Layer and Convolution Layer

While writing tests for the inception layer, we noticed that some functionality of the existing classes needed to be modified. For the pooling layer I added the ability to pool with a given stride value. Then we improved the convolution layer to support Forward, Backward and Gradient updates when padding is specified. Padding is very important for deep networks: by specifying padding we can preserve the width of our data, whereas otherwise the data becomes smaller as we keep applying pooling and convolution operations, and we would not be able to make a neural net "deep enough". Then I wrote the tests for the pooling layer and convolution layer, and now the test for the inception layer passes correctly too!
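To make the stride and padding arithmetic concrete, here is a small Python/NumPy sketch (illustrative, not the mlpack code itself) of max pooling with a stride, together with the standard output-size formula that shows why a pad of (k - 1) / 2 preserves the width for an odd kernel of size k:

```python
import numpy as np

def max_pool(x, k, stride):
    """Max-pool a 2-D array with a k x k window and the given stride (no padding)."""
    h = (x.shape[0] - k) // stride + 1
    w = (x.shape[1] - k) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i*stride:i*stride + k, j*stride:j*stride + k].max()
    return out

def conv_output_size(n, k, pad, stride=1):
    """Spatial size after a convolution; with pad = (k - 1) // 2 and stride 1,
    the output size equals the input size for odd k."""
    return (n + 2 * pad - k) // stride + 1

x = np.arange(16, dtype=float).reshape(4, 4)
p = max_pool(x, 2, 2)            # [[5, 7], [13, 15]]
same = conv_output_size(28, 5, 2)  # 28: width preserved with padding
valid = conv_output_size(28, 5, 0) # 24: width shrinks without padding
```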

Concatenation Layer and Subnetwork Layer

On examining the structure of the googlenet network, we felt that we needed a concatenation layer. This layer gives us the functionality to concatenate the outputs of two or more layers in the forward pass, and then distribute the errors among the constituent layers in the backward pass. So I wrote a concat_layer that does exactly this, along with the corresponding tests.
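The idea can be sketched in a few lines of Python/NumPy (illustrative names, not mlpack's actual concat_layer API): the forward pass concatenates and remembers the sizes, and the backward pass splits the incoming error at the same boundaries:

```python
import numpy as np

def concat_forward(outputs):
    """Forward pass: flatten and concatenate the constituent layers' outputs,
    remembering their sizes for the backward pass."""
    sizes = [o.size for o in outputs]
    return np.concatenate([o.ravel() for o in outputs]), sizes

def concat_backward(error, sizes):
    """Backward pass: split the incoming error back among the constituents."""
    splits = np.cumsum(sizes)[:-1]
    return np.split(error, splits)

a, b = np.arange(3.0), np.array([7.0, 9.0])
out, sizes = concat_forward([a, b])                     # [0, 1, 2, 7, 9]
errs = concat_backward(np.array([1., 2., 3., 4., 5.]), sizes)
# errs[0] -> [1, 2, 3] goes to the first layer, errs[1] -> [4, 5] to the second
```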

The goal of this project was to create the components of googlenet so that they are also reusable in other applications. So, to make duplicating any collection of layers in a deep network easier, we decided to implement a subnet layer. The test for the subnet layer is still under construction; it will implement the inception_layer using the subnet_layer and check for correctness.

Connect Layer

With the googlenet network we faced one more interesting problem - auxiliary classifiers. From one layer, two layers can diverge, and both of these end up at separate output layers. Auxiliary classifiers are added to googlenet to combat the vanishing gradient problem while providing regularization. In the mlpack implementation, the layers are stacked sequentially in the form of a tuple. To support this architectural variant, where two layers emerge from one layer, we added a connect layer, which contains the two separate nets that emerge from it and is responsible for passing input to, and collecting errors from, these nets. Tests still need to be written for the connect layer.
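As a sketch of the training-time bookkeeping (Python pseudocode, not the mlpack implementation): the googlenet paper adds the auxiliary classifiers' losses to the main classifier's loss with a discount weight of 0.3, and the auxiliary branches are discarded at inference time:

```python
def total_loss(main_loss, aux_losses, aux_weight=0.3):
    """Combine the main classifier's loss with the auxiliary classifiers'
    losses, scaled by aux_weight (0.3 in the GoogLeNet paper). In the
    backward pass, each auxiliary branch therefore injects its gradient,
    scaled by aux_weight, back into the main network at its attach point,
    which is what counters the vanishing gradient in the lower layers."""
    return main_loss + aux_weight * sum(aux_losses)

# e.g. main loss 1.0 plus two auxiliary losses of 0.5 each -> 1.3
loss = total_loss(1.0, [0.5, 0.5])
```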


After all the basic components have been completed, creating googlenet is as simple as stacking up all of the layers, plugging in the desired values from the paper, and calling the Train() and Predict() functions of the CNN class to evaluate outputs. Once we complete all the refinements needed to the components developed in this project, training deep neural nets with mlpack will become effortless. There is also a variant of googlenet which uses batch normalization, which I plan to contribute to mlpack with the guidance of Marcus after GSoC 2016.


The following things still need to be completed in order to achieve all the goals mentioned in our proposal: 1. Complete the edge boxes implementation. 2. Write rigorous tests for googlenet. 3. Make the minor improvements suggested by my mentors in the current pull requests.


I want to thank the mlpack community for giving me this awesome opportunity to work with them on this amazing project over the summer. I was welcomed right from the first day I joined the IRC channel at the beginning of the student application period, when I wasn't even sure which project I wanted to apply to for GSoC 2016. Special thanks to my mentors Marcus Edel and Tham Ngap Wei, for clearing all my doubts (sometimes even unrelated to the project :) ) with so much patience and simple explanations, and for helping me with the design and debugging of the project. I feel I have learnt a lot from them, and I really enjoyed being part of the mlpack community. This was a great experience, thank you very much!


We need to go deeper, Googlenet : Week-10 & 11 Highlights

In weeks 10 & 11, I incorporated the fixes suggested in #703 and #696 for the feature extraction part. I also applied the fixes suggested in #737 for the ConvLayer, completed the subnet_layer (right now we duplicate some code from cnn.hpp for the subnet layer; this may be changed later), and completed the basic structure of the googlenet network. What still needs to be discussed is how the error from the auxiliary classifiers is propagated into the main network, which I will do forthwith. Regularization in the network and writing tests for its correct working are the other tasks that still need to be done; this is what I will do in the next days. Besides this, I am also looking at the fixes suggested in #757, and these changes will be made as soon as some things are clear. Once these changes are done, we will create the inception layer using the subnet_layer and concat_layer, which will fulfill one of the objectives of the project: that users can duplicate any complex collection of layers in a deep network without having to explicitly write a class for that collection of layers. I will also be writing a full blog post next week covering point by point everything done in the project from start to finish. Thanks for reading.


We need to go deeper, Googlenet : Week-9 Highlights

I started this week by testing the inception layer. While writing tests I was not getting the expected outputs, so I checked the code of the ConvLayer and pooling layer, which are called in the inception layer. I then corrected the code in the pooling layer so that we can now pool with a stride correctly. I had added this feature last week but was still not getting correct results, so I corrected the logic, tested it, and it works now. This feature has been merged.

Then I corrected small bugs in the logic of the ConvLayer. The forward and backward pass logic has been corrected and now gives the expected results; we still need to check the Gradient() function, which is my immediate task. I have written tests for the forward and backward passes of the ConvLayer and checked that they work with padding, and that they give the desired output with standard kernels.

I also wrote code for the ConcatLayer. I have completed the Forward and Backward functions and checked with tests that they work. This layer gives us the functionality to concatenate the outputs of two or more layers, and then distribute the errors among the constituent layers in the backward pass. The Gradient() function still needs to be written, and I need to discuss what happens when we combine two or more base layers in our ConcatLayer. Also, I first need to write the test for the Gradient() function of the ConvLayer; then I can complete the Gradient tests for both the inception layer and the ConcatLayer.

I think we made good progress this week, and the straightforward implementation of the inception layer we have developed can be generalized into the subnet_layer. Along with this, I will discuss with my mentors what other tasks need to be completed this week, and will update you about them in the next blog post. Stay tuned!


We need to go deeper, Googlenet : Week-8 Highlights

This week I finished the implementation of the inception layer and the test for it. The version we have right now is very simple and will serve as a guiding example for making the subnet_layer, which could take any collection of layers as input and allow the user to duplicate it over the network. I also started on the implementation of the concat_layer, which will concatenate the outputs of one or more layers in the forward pass and distribute the error among those layers in the backward pass. This coming week our plan is to merge the code for the inception layer and complete the concat_layer and its test.


We need to go deeper, Googlenet : Week-6 Highlights

This week, I spent time reading about C++ template constructs and the syntactic details necessary to understand the ann code. I was mostly stuck on this during the first days of the week. I have gained some insight into how template template parameters, rvalues and some other template types and features work, which are used heavily in the code. I am still in the process of completely understanding the code, which I hope to finish in the coming days with the help of others.

I have also completed a first implementation of the inception layer just today. I still need to write a small test to check that it works, which I will do next. Once this is complete, we will discuss the plan for the coming week.


We need to go deeper, Googlenet : Week-5 Highlights

I started this week by discussing the "Going Deeper with Convolutions" paper with my mentor, to get an idea of how to implement the inception layer. I also clarified some concepts regarding backprop, convolutions, and standard regularization techniques like dropout which are used in deep networks. I read the network in network paper to get an idea about the 1 x 1 convolutions introduced there, and how smaller neural nets are used to build larger networks.

Then I fixed minor issues pointed out in the PR for the feature extraction code of the edge_boxes method. I compared the timings of performing convolution using armadillo submatrices against using loops and pointers by invoking the NaiveConvolution class. Armadillo submatrices give a faster method, but we have to check whether this holds for large kernels too; if it does, performance of the Convolution method may improve. Then I improved the gradient function by calculating edges via convolution with a Sobel filter.
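A minimal Python/NumPy sketch of that gradient computation (illustrative only, not the mlpack code): correlate the image with the horizontal and vertical Sobel kernels over the valid region, then take the magnitude:

```python
import numpy as np

def sobel_gradient(img):
    """Gradient magnitude via the Sobel kernels, computed over the
    'valid' region (no padding); for the magnitude, the sign convention
    of convolution vs. correlation does not matter."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()   # horizontal change
            gy[i, j] = (patch * ky).sum()   # vertical change
    return np.hypot(gx, gy)

# A vertical step edge: the gradient magnitude is uniform along the edge.
step = np.tile(np.array([0., 0., 1., 1.]), (4, 1))
g = sobel_gradient(step)
```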

Then I looked at the ann implementation in mlpack. I examined the convolution and pooling layers that will be used in implementing the inception layer, and had to read up on some of the functions implemented in these classes. It took me a bit of time to get accustomed to the style in which the ann method is implemented, because of the heavy templatization in the code. I guess I still have many things to learn. I also glanced at implementations of googlenet in other libraries, without understanding many details of course, but getting a rough idea.

I have started the implementation of the inception layer and plan to finish it in the next days. After examining the convolution_network_test, it looks very easy to stack the layers for the inception layer in a similar fashion. For improving the discretize function we will use the fast PCA method based on randomized SVD, as suggested and explained by my mentor. Further, the interface for googlenet will be discussed once the inception layer is complete.


We need to go deeper, Googlenet : Week-4 Highlights

So this week I spent time on more editing and cleaning, doing some easy optimizations and removing some redundant calculations from the code. Also, after discussion it was concluded that the design of the class needed to be changed to be similar to other parts of the library code, so we did that.

I added a discretize function to convert the structured labels of pixels into discrete class labels, which can then be input to the decision trees for training. Currently this function takes more time than it should, so we need to work out how to optimize it.
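The approach can be sketched as follows in Python/NumPy (a simplified illustration with hypothetical names, not the actual discretize function): project the centered structured labels onto their top principal components and threshold the signs to obtain discrete class ids, so a standard decision tree can split on them:

```python
import numpy as np

def discretize(labels, n_bits=2):
    """Map structured (vector-valued) labels to discrete class ids:
    project the centered labels onto their top n_bits principal
    components and use the sign of each projection as one bit of
    the class code."""
    z = labels - labels.mean(axis=0)
    # principal directions from the SVD of the centered label matrix
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    proj = z @ vt[:n_bits].T                 # (n_samples, n_bits)
    bits = (proj > 0).astype(int)
    return bits @ (1 << np.arange(n_bits))   # binary code -> class id

# Two well-separated label clusters map to two distinct class ids.
labels = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
classes = discretize(labels, n_bits=1)
```

The sign of a principal component is arbitrary, so the particular ids are not deterministic, but labels in the same cluster always receive the same id.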

I also changed some other parts of the code, making changes to all the functions, passing objects by reference as opposed to returning them, and refactoring according to the style guidelines. I added comments to some of the functions in the code; a more complete description will be prepared this week.

For reading this week, I first reviewed CNNs from the deeplearningbook to get familiar with the terminology, and then I reviewed some of the papers relevant to the project: I re-read the paper on GoogleNet and a paper discussing the architecture of GoogleNet.

Finally, for the coming week my task is to implement the inception layer using the layers present in the ann methods of the library. I still have to look at the ann code, and then I can proceed with the implementation.


We need to go deeper, Googlenet : Week-3 Highlights

This week I cleaned up the code on feature extraction, wrote tests related to it and read mlpack code that will be used for implementation of the next part of edge boxes algorithm. The details follow.

In the feature extraction code, I incorporated the changes suggested by my mentors and made some code snippets adhere to the design guidelines - used size_t instead of int to remove warnings, added const to parameters of functions that need to be unchanged, and optimized some pieces of code that seemed redundant, to name a few.

Then I proceeded to write the tests for the image processing functions that were implemented manually, using standard libraries as a reference. Specifically, tests were written for the Distance Transform, Border, RGB2LUV and Convolution Triangle functions. Many bugs were found in these functions, and writing tests for them before evaluating the code now seems to have been a fruitful exercise.

After that I read up on the PCA, decision stump and Hoeffding tree implementations in mlpack, as these will be used to implement the structured random forest class.

The coming week I plan to use these classes and complete the Random Forest class.


We need to go deeper - Googlenet - Week-2 Highlights

This week was more about applying the things I learnt in the first week. It was also a week where I spent most of my hours debugging. I have completed the code for feature extraction and opened a pull request. Hopefully not many edits will be required, as I spent time doing things in effective ways. I also manually tested the code to some extent, to try to ensure that it is correct. Because the number of features is large, I had to think a lot about how to avoid complex reshape operations on the regular and self-similarity features. I think these issues can still be reviewed and will surely come up when the pull request is evaluated.

After all the work done in the first two weeks, I can say confidently that I now know why we're doing what we're doing for the feature extraction part. This process started out with me being slightly confused as to what exactly needed to be done; it became clear function by function. I have also started reading up on decision stumps, as we will reuse these in training the random forest, which is my next assignment. The tasks for this week include reading up on decision stumps and Hoeffding trees, and training a structured random forest to detect edges in the image. This will be the cornerstone of the edge boxes algorithm.


We need to go deeper - Googlenet - Week-1 Highlights

The goal of my project is to develop googlenet such that it integrates with mlpack's existing ANN API, and the methods developed in this project can be used in any other related applications. After a discussion with my mentors we chose the edge boxes method for object localization, as it is a very fast method and gives competitive performance with the state of the art.

After reading through the research papers and experimenting with the mlpack code during the community bonding period, I discussed the interface with my mentors. My first task was to perform feature extraction, given images, segmentations and boundaries, for the BSDS500 dataset. It took me some time to get acquainted with the armadillo library, as the code I had to write uses its functionality extensively. So the status of my first task is as follows:

The utility functions have all been implemented. I had to look up library implementations and reimplement them using armadillo to calculate convolutions, distance transforms, and some of the functionality that armadillo does not provide, to name a few.

For the feature extraction part, only the method for writing the self-similarity features remains, with all of the basic underlying methods it uses already implemented. I expected to complete the feature extraction part well within this week, but I am sure all the unimplemented features will be finished by Monday. I guess it took time to read up on the vision and image processing concepts that were new to me; for each new function I had to think about the apt way of implementing it while looking things up in the armadillo library at the same time.

The things I learnt this week were some new image processing concepts, and in-depth knowledge of the numpy and armadillo libraries. I also developed a habit of reading documentation, where previously I used to just google things and find links on stackoverflow. I also learnt how to search code in open source libraries; I had to look things up in numpy and opencv. Though I accept that learning these things is a very basic skill, it took me out of my comfort zone and will definitely make me better. I also brushed up on template programming and watched introductory slides on template metaprogramming in C++; as I come from a Java background, this part was easier for me.

My plans for the week ahead are as follows: test the code and discuss with my mentors the methods written and any improvements, using a forked repo; after completing the proposed changes, open a pull request; then start on the structured random forest implementation to complete the process of edge detection. Hopefully this will take less time, as most of the functionality is already implemented in mlpack and can be reused.