[mlpack] GSOC 2014: Introduction

Marcus Edel marcus.edel at fu-berlin.de
Sun Mar 16 07:03:07 EDT 2014


Hello,

> 1. Few weak learners: 

> A question - how many weak learners would be enough ?

This depends highly on the classifier selection. A good project will select a handful (2-5) of classifiers and implement them (with tests and documentation). Remember, writing a good test is often more difficult than the actual implementation. Take a look at how algorithms are tested in the src/mlpack/tests/ directory to get an idea of how to do that.

> 2. For the template design, I felt what Ryan said made sense, and consequently, after looking up variadic templates, I feel they could be the right fit for this problem. 


The overall plan sounds reasonable to me. Your suggested interface doesn't contain much information, so it is hard to give you a helpful answer, but it looks like you've got the idea.

> Whether or not we want to add a default weak classifier can also be discussed. 


Reasonable default values are always a good idea, so everything works right out of the box. (A variadic template with default values is a little bit tricky, but it can be made to work.)

> I think a little more elaboration on points 1 and 2 will be enough to get me started on my proposal but I still want to think about how we can add multi-class adaboost. That, and not only implementing adaboost, would be the main goal, right ?

Correct, that would be the optimal goal. You don't have to include all three multi-class ideas; in the end there are 'only' three months to make everything as good as possible.

Feel free to ask any further questions.

Thanks,

Marcus

On 14 Mar 2014, at 21:26, Udit Saxena <saxena.udit at gmail.com> wrote:

> Hey !
> 
> Okay, so I've been going over a few points that I felt were important and needed to be discussed further before I start writing my proposal. 
> Identify a few weak learners, giving explanations as to why I chose them.
> Design a template format for multi class adaboost.
> Choose from a few multiclass adaboost algos. The above template should also provide a method to extend adaboost to these variants.
> This is what I thought about:
> 
> 1. Few weak learners: 
> OneR (rule based classifier) / decision stump.
> Simple perceptron ( one layer NN)
> Radial basis function  based ANN ( or alternatively a two layer NN ) // RBF networks is a linear comb of RBF funcs
> Artificial decision trees
> Either one of CART/ C4.5 (/C5)
> These extend the algorithms supported by mlpack and have efficient space complexity and are relatively cost effective during construction.
> 
> Command line parameters could be created using the already prevalent CLI.hpp and so on...
> 
> A question - how many weak learners would be enough ?
> 
> 2. For the template design, I felt what Ryan said made sense, and consequently, after looking up variadic templates, I feel they could be the right fit for this problem. 
> Given that each of the above learners have been constructed, an implementation of the form-
> 
> .
> .
> template <typename... WC>
> class adaboost // say a basic adaboost template, to be extended further for multi-class
> .
> .
> 
> -would generate an adaboost class using any of those 5 classifiers, essentially giving way to 31 combinations of adaboost implementations.
> Whether or not we want to add a default weak classifier can also be discussed. 
> 
> Next step would be to run the algorithm and generate and assign weights as and when required for each classifier, involving multiple tests and generating errors and weights on the data, eventually giving way to the final classifier.
> 
> 3. I am going through papers  -adaboost.samme and adaboost.samme.R, MP boost and adaboost-MH. So I can't really answer this now. Out of the three ( samme, mp-boost and mh ) I think only mp-boost and mh are ways of mapping multi classes to two classes, whereas the samme algos are more recent (NYU/2006 or 2009) and don't take that approach. 
> 
> I think a little more elaboration on points 1 and 2 will be enough to get me started on my proposal but I still want to think about how we can add multi-class adaboost. That, and not only implementing adaboost, would be the main goal, right ?
> 
> 
> 
> On Wed, Mar 12, 2014 at 9:38 AM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:
> On Tue, Mar 11, 2014 at 05:36:09PM +0100, Marcus Edel wrote:
> > The tricky part is the interface you should think about that. Ryan
> > suggested some kind of an api in the following mail:
> >
> > https://mailman.cc.gatech.edu/pipermail/mlpack/2014-February/000264.html
> 
> The interface is going to be tricky because I'm picky about APIs.  I
> want to avoid inheritance for the reasons stated in those emails,
> especially as mlpack begins to implement more and more complex template
> metaprogramming functionality (see the KernelTraits, TreeTraits, and
> IsVector classes).  For now, the template metaprogramming is fairly
> straightforward and not yet terrifying.  I'd like to keep it that way,
> but the speed/readability tradeoff is a hard one to maintain well.
> 
> Variadic templates are a new C++11 feature which I think we could use to
> allow AdaBoost to accept the types of weak classifiers it will use as
> template arguments.
> 
> I would much prefer that to an inheritance-based solution; in my eyes
> the use of inheritance and dynamic polymorphism (plus other related
> concepts) is a slippery slope that leads to slow code.  For instance,
> you really can't use inheritance for kernels or distance metrics,
> because the vtable lookup time for virtual functions actually has a
> non-negligible effect when the kernels or distance metrics are evaluated
> many millions of times.
> 
> So, while this lookup time is not particularly relevant if the AdaBoost
> class does not make many calls to the functions implemented by its weak
> learners, suppose that someone else comes along later with an AdaBoost
> improvement that does make very many calls to the weak learners (this is
> possible -- maybe there is some modified boosting algorithm out there
> that needs lots of information over and over again from the weak
> learners, or, maybe something like this will be invented someday).  They
> will be tempted to use inheritance there because it is the existing
> convention whereas the faster solution is via templates.
> 
> Keep in mind that I'm open for discussion.  Discussion is always good
> for finding the best abstraction and the best solution.
> 
> > > Your suggestions ? What should be my next step ? Start writing a
> > > test interface based on a few examples ?
> >
> > Writing a test interface is probably a good idea. Maybe you can
> > include that in your potential GSoC application.
> 
> Take a look at how algorithms are tested in the src/mlpack/tests/
> directory to get an idea of how to do that.  Testing algorithms is hard!
> Sometimes the best you can do is compare with the performance given in
> the paper, and make sure the performance of the implementation is
> comparable (or better) than the paper's results.
> 
> --
> Ryan Curtin    | "I am."
> ryan at ratml.org |   - Joe
> 
> 
> 
> -- 
> ---------------------------------------
> Udit Saxena
> Student, BITS Pilani,
> 
