[mlpack] GSOC 2014: Introduction

Ryan Curtin gth671b at mail.gatech.edu
Wed Mar 12 00:08:39 EDT 2014


On Tue, Mar 11, 2014 at 05:36:09PM +0100, Marcus Edel wrote:
> The tricky part is the interface; you should think about that. Ryan
> suggested some kind of an api in the following mail:
> 
> https://mailman.cc.gatech.edu/pipermail/mlpack/2014-February/000264.html

The interface is going to be tricky because I'm picky about APIs.  I
want to avoid inheritance for the reasons stated in those emails,
especially as mlpack begins to implement more and more complex template
metaprogramming functionality (see the KernelTraits, TreeTraits, and
IsVector classes).  For now, the template metaprogramming is fairly
straightforward and not yet terrifying.  I'd like to keep it that way,
but the speed/readability tradeoff is a hard one to maintain well.

Variadic templates are a new C++11 feature which I think we could use to
allow AdaBoost to accept the types of weak classifiers it will use as
template arguments.
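
As a rough sketch of what that could look like (this is not existing
mlpack code; AdaBoostSketch, ThresholdStump, SignLearner, and the
Classify() signature are all hypothetical names made up for
illustration), a variadic class can hold one instance of each weak
learner type in a std::tuple and combine their votes at compile time.
Training and weight updates are left out, so the vote weights are simply
uniform here:

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical weak learners.  Each one only needs a Classify() method;
// no common base class is required.
struct ThresholdStump
{
  double threshold;
  int Classify(const double x) const { return (x > threshold) ? 1 : -1; }
};

struct SignLearner
{
  int Classify(const double x) const { return (x >= 0.0) ? 1 : -1; }
};

// Hypothetical booster: the weak learner types are template arguments.
template<typename... WeakLearnerTypes>
class AdaBoostSketch
{
 public:
  // Assumption: uniform vote weights, since training is omitted.
  explicit AdaBoostSketch(WeakLearnerTypes... ls) :
      learners(ls...),
      weights(sizeof...(WeakLearnerTypes), 1.0)
  { }

  // Weighted vote over all weak learners; ties go to the positive class.
  int Classify(const double x) const
  {
    return (Vote<0>(x, 0.0) >= 0.0) ? 1 : -1;
  }

 private:
  // End of recursion: every learner has voted.
  template<std::size_t I>
  typename std::enable_if<I == sizeof...(WeakLearnerTypes), double>::type
  Vote(const double /* x */, const double sum) const { return sum; }

  // Add learner I's weighted vote, then move on to learner I + 1.
  template<std::size_t I>
  typename std::enable_if<(I < sizeof...(WeakLearnerTypes)), double>::type
  Vote(const double x, const double sum) const
  {
    return Vote<I + 1>(x,
        sum + weights[I] * std::get<I>(learners).Classify(x));
  }

  std::tuple<WeakLearnerTypes...> learners;
  std::vector<double> weights;
};
```

Something like AdaBoostSketch<ThresholdStump, SignLearner> would then be
instantiated with one object of each type, and every Classify() call is
resolved statically, with no virtual dispatch.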

I would much prefer that to an inheritance-based solution; in my eyes
the use of inheritance and dynamic polymorphism (plus other related
concepts) is a slippery slope that leads to slow code.  For instance,
you really can't use inheritance for kernels or distance metrics,
because the vtable lookup time for virtual functions actually has a
non-negligible effect when the kernels or distance metrics are evaluated
many millions of times.
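
To make the difference concrete, here is a minimal sketch of the two
styles side by side.  The names (MetricBase, VirtualEuclidean,
EuclideanMetric, SumDistancesVirtual, SumDistancesTemplate) are
hypothetical and chosen for illustration, not taken from mlpack.  Both
functions compute the same result; the first pays for a virtual call on
every evaluation, while the second lets the compiler resolve (and
possibly inline) the call:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Point;

// Template-style metric: a plain struct with a static Evaluate().
struct EuclideanMetric
{
  static double Evaluate(const Point& a, const Point& b)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
  }
};

// Inheritance-style metric: Evaluate() is found through the vtable.
class MetricBase
{
 public:
  virtual ~MetricBase() { }
  virtual double Evaluate(const Point& a, const Point& b) const = 0;
};

class VirtualEuclidean : public MetricBase
{
 public:
  double Evaluate(const Point& a, const Point& b) const
  {
    return EuclideanMetric::Evaluate(a, b);
  }
};

// One virtual dispatch per point.
double SumDistancesVirtual(const MetricBase& metric,
                           const std::vector<Point>& points,
                           const Point& query)
{
  double total = 0.0;
  for (std::size_t i = 0; i < points.size(); ++i)
    total += metric.Evaluate(points[i], query);
  return total;
}

// The metric is a template parameter, so the call is bound at compile
// time.
template<typename MetricType>
double SumDistancesTemplate(const std::vector<Point>& points,
                            const Point& query)
{
  double total = 0.0;
  for (std::size_t i = 0; i < points.size(); ++i)
    total += MetricType::Evaluate(points[i], query);
  return total;
}
```

How large the gap is depends on the compiler and the workload, so it is
worth timing both versions on a realistic problem size before drawing
conclusions.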

So, while this lookup time is not particularly relevant if the AdaBoost
class does not make many calls to the functions implemented by its weak
learners, suppose that someone else comes along later with an AdaBoost
improvement that does make very many calls to the weak learners (this is
possible -- maybe there is some modified boosting algorithm out there
that needs lots of information over and over again from the weak
learners, or maybe something like this will be invented someday).  They
will be tempted to use inheritance there because it is the existing
convention, even though templates are the faster solution.

Keep in mind that I'm open to discussion.  Discussion is always good
for finding the best abstraction and the best solution.

> > Your suggestions? What should be my next step? Start writing a
> > test interface based on a few examples?
> 
> Writing a test interface is probably a good idea. Maybe you can
> include that in your potential GSoC application.

Take a look at how algorithms are tested in the src/mlpack/tests/
directory to get an idea of how to do that.  Testing algorithms is hard!
Sometimes the best you can do is compare with the performance given in
the paper, and make sure the performance of the implementation is
comparable to (or better than) the paper's results.
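
The core of such a check might look roughly like the sketch below.  The
helper names (Accuracy, ComparableToReported) are hypothetical, and the
reported accuracy and tolerance are values the test author would have to
take from the paper and choose by judgment; nothing here is tied to a
particular test framework:

```cpp
#include <cstddef>
#include <vector>

// Fraction of predictions that match the true labels.  Returns 0 if the
// inputs are empty or have mismatched sizes.
double Accuracy(const std::vector<int>& predictions,
                const std::vector<int>& labels)
{
  if (labels.empty() || predictions.size() != labels.size())
    return 0.0;

  std::size_t correct = 0;
  for (std::size_t i = 0; i < labels.size(); ++i)
    if (predictions[i] == labels[i])
      ++correct;

  return static_cast<double>(correct) / labels.size();
}

// True if the measured accuracy is no more than 'tolerance' below the
// accuracy reported in the paper (better than reported also passes).
bool ComparableToReported(const double measured,
                          const double reported,
                          const double tolerance)
{
  return measured >= reported - tolerance;
}
```

A test case would then train on the paper's dataset, compute Accuracy()
on the test set, and assert that ComparableToReported() holds.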

-- 
Ryan Curtin    | "I am."
ryan at ratml.org |   - Joe


