[mlpack] Design Considerations for Unit Tests for Stochastic Optimization

Saswat Mishra clicksaswat at gmail.com
Mon Apr 24 13:42:26 EDT 2017


Hi All,
I was looking into one of the open issues in mlpack, "Tests for Stochastic
Optimization" <https://github.com/mlpack/mlpack/issues/894>. The idea here
is to implement a set of unit tests for evaluating variants of the Stochastic
Gradient Descent (SGD) algorithm on a diverse set of loss functions; passing
these tests is necessary for an SGD variant to demonstrate its generality.
The work is inspired by Schaul et al.'s paper "Unit Tests for Stochastic
Optimization" <https://arxiv.org/abs/1312.6055>. For those who want to skip
the tedious task of going through the paper, here are some bullet points
that briefly cover the most important aspects:

* Each prototype function to be evaluated as a unit test is composed of
very simple one-dimensional mathematical functions such as linear,
quadratic, Gaussian, Laplacian, absolute-value, ReLU, and sigmoid functions.
Each function is defined on a particular interval.
* These simple 1D prototype functions can be concatenated to form more
complex 1D prototypes, e.g. a line followed by a quadratic bowl followed by
a cliff.
* 1D function prototypes can be further expanded into multi-dimensional
functions by using suitable norms.
* Different noise prototypes, such as additive Gaussian noise, can be added
to better mimic the behavior of real-world loss functions.
* There are also mechanisms to introduce varying amounts of curl into a
multi-dimensional vector field, creating loss functions similar to those
produced by temporal-difference learning in reinforcement learning.
* Finally, there is functionality to create non-stationary objective
functions, which are typical of real-world scenarios.

Obviously, the paper itself contains much more information than can be
covered in a few bullet points, so please consider giving it a look if you
want an in-depth understanding of the problem at hand.

One of the most important points I want to mention is that the authors of
the paper have already open-sourced a reference implementation of their
work on GitHub <https://github.com/IoannisAntonoglou/optimBench>, written
in Lua. Already having a reference implementation in hand makes our life a
lot easier, because all of the function logic, including all the
mathematical intricacies, can simply be ported to C++. This leaves us a lot
of time to put more effort into designing a framework that better suits the
style of C++ and mlpack.

Some of the design decisions taken by the authors, although well suited to
a scripting language like Lua, simply don't match the idioms of a
general-purpose language like C++, especially in a framework like mlpack,
which touts template metaprogramming as one of its most important features.

Take, for example, the way 1D concatenation and multi-dimensional scaling
are handled by the reference framework: the function prototype, noise
prototype, and corresponding dimensions are passed in a specially formatted
string, which is then parsed to generate the corresponding function. In a
language like C++, this would be better achieved either by using method
chaining or by overloading the '+' operator.

Ex:-

FunctionPrototype f = LinearUnit(starting_point, ending_point, dimension1)
                          .add(ReLUUnit(starting_point, ending_point, dimension1))
                          .add(QuadraticUnit(starting_point, ending_point, dimension2))
                          .add(NoisePrototype(starting_point, ending_point, dimension1))
                          .curl(rotation_matrix);

or

FunctionPrototype f = (LinearUnit(starting_point, ending_point, dimension1)
                          + ReLUUnit(starting_point, ending_point, dimension1)
                          + QuadraticUnit(starting_point, ending_point, dimension2)
                          + NoisePrototype(starting_point, ending_point, dimension1))
                          .curl(rotation_matrix);

There's also the consideration of designing the FunctionPrototypes so that
they match the FunctionType parameter taken by the existing implementations
of SGD variants in mlpack (although this shouldn't be much of a problem).

My point here is that, since we already have a reference implementation and
hence all of the function logic, it would be wiser to spend some extra time
designing the solution carefully rather than jumping straight into the
implementation. I'll follow up with another mail demonstrating a basic
class hierarchy that I think would suit this situation. Please feel free to
add any comments or suggestions in the meantime.

Regards
Saswat

