[mlpack] Automatic Benchmarking

Ryan Curtin gth671b at mail.gatech.edu
Fri Feb 28 15:55:38 EST 2014


On Fri, Feb 28, 2014 at 10:09:12AM +0005, Praveen wrote:
> Hello,
> I am Praveen Venkateswaran, an undergraduate studying Computer Science
> and Mathematics in India.
> I have worked with various machine learning algorithms as well as on
> information retrieval, and I would love to contribute to mlpack,
> starting with GSoC 2014.
> 
> I am interested in working on improvements to the automatic
> benchmarking system that was built last summer. I would like to start
> by comparing the accuracy of the implementations in various libraries.
> I've been browsing resources to find a starting point for this; [0]
> describes WiseRF's benchmarking of random forest classification.
> 
> The point that strikes me most is that, wherever possible, they tried
> to find which parameters yielded the best results for each individual
> library, and then compared the libraries using those tuned parameters
> rather than just the defaults. I agree with this approach, since it
> yields less biased results -- what do you think about it?
> I had already spoken to Ryan, asking him to clarify details of the
> project, and the crux would be the comparison of parameters. If we
> take the approach above, we would have to find the best parameters for
> each library for the dataset size range in question, and then run each
> method with those parameters.
> We could then score the results on that basis (for classification
> algorithms, the fraction of correctly classified samples; for
> regression algorithms, the mean squared error; and for k-means, the
> inertia criterion), or something along those lines (I'm not too sure
> about this, as I don't have experience with all of the libraries being
> tested).
> Please let me know what you think about this; any further suggestions
> would be most welcome.
> 
> [0] http://about.wise.io/blog/2013/07/15/benchmarking-random-forest-part-1/

Hi Praveen,

Thank you for the link to the WiseRF post.  Benchmarking in and of
itself is a very difficult task, especially with respect to getting
"unbiased" results.  In reality, no result is unbiased, and it's never
possible to say "this algorithm is better!" because almost certainly
there is some dataset for which it isn't.

What the benchmarking system currently provides is a way to get a quick
idea of which implementation of an algorithm is fastest, but it is aimed
at answering only one question:

  "How fast is an mlpack implementation of an algorithm compared to
   other implementations?"

But there are many possible questions we could aim to answer.  Here are
a few (among many other possibilities):

  "How quickly does this approximate algorithm converge to a result
   compared to other implementations of the same approximate algorithm?"

  "How quickly does this approximate algorithm converge to a result
   compared to other implementations of different approximate
   algorithms?"

  "How accurate is this classifier compared to other classifiers,
   regardless of runtime?"

  "If I specify how much time I have, which algorithm gives the most
   accurate result in that time frame?"

None of these questions is particularly difficult to answer for a given
set of algorithms; the hard part is often finding a good way to
visualize the results (the last question especially so).
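
For the accuracy question, the per-task scores Praveen mentions above
(classification accuracy, mean squared error, k-means inertia) are a
reasonable place to start.  Just as a rough sketch of what a scoring
layer could look like -- this is Python using scikit-learn, and is not
how the benchmark scripts are actually structured -- something like:

  from sklearn.cluster import KMeans
  from sklearn.metrics import accuracy_score, mean_squared_error

  def score_classifier(y_true, y_pred):
      # Fraction of correctly classified samples.
      return accuracy_score(y_true, y_pred)

  def score_regressor(y_true, y_pred):
      # Mean squared error; lower is better.
      return mean_squared_error(y_true, y_pred)

  def score_kmeans(X, k):
      # Inertia: sum of squared distances of the samples to their
      # nearest cluster center; lower is better.
      return KMeans(n_clusters=k).fit(X).inertia_

The interesting part is less the metric itself and more getting each
library's output into a common format so the same scoring code can be
applied to all of them.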

I think a good aim for this project is to expand the scope of the
benchmarking system to be able to answer one of those other questions.
Choosing parameters is another degree of freedom in the question to be
answered -- it could be "compare these algorithms with default
parameters" or "cross-validate the parameters and compare the _best_
results of the algorithms".
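
To make that second option concrete, "compare the _best_ results" could
be as simple as a small per-library parameter sweep, keeping only the
best cross-validated score for each library.  A minimal sketch, again in
Python with scikit-learn standing in for whichever library is being
benchmarked (the parameter grid here is made up):

  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score

  def best_cv_accuracy(X, y, tree_counts=(10, 50, 100)):
      # Sweep a (made-up) parameter grid and keep only the best mean
      # cross-validated accuracy; the cross-library comparison would
      # then be between these per-library "best" scores.
      best = 0.0
      for n_trees in tree_counts:
          clf = RandomForestClassifier(n_estimators=n_trees)
          best = max(best, cross_val_score(clf, X, y, cv=5).mean())
      return best

One open question with that approach is whether the time spent on the
sweep itself should count against the library's runtime.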

I think I have proposed lots of vague ideas instead of giving a definite
answer, so I am sorry about that.  :)

The project is quite open-ended, so it is up to the student to come up
with interesting ideas they would like to see implemented.  Marcus can
correct me if he wants to see the project go in a particular direction;
after all, the benchmarking system as a whole is his work.  :)

I also think it's important to have an open-ended discussion about this
project in a public place, so if anyone else out there has ideas or
opinions about how this should work, please chime in!

-- 
Ryan Curtin    | "So I got that going for me, which is nice."
ryan at ratml.org |   - Carl Spackler


