[mlpack] GSoC 2013 Report - June 17, 2013

Marcus Edel marcus.edel at fu-berlin.de
Mon Jun 24 04:46:30 EDT 2013


Hello to everyone.

As a part of Google Summer of Code 2013, I have been working on automatic benchmarking of mlpack methods. It has been one week since I started, and I would like to share with the community the work I've been doing.

Since I'm still learning and getting used to the mlpack methods, I have started out with the simpler ones, and I hope to speed things up soon. I would appreciate feedback and comments on how things can be done better.

Most of what I've done so far concerns the foundation of the benchmark script. I've created an initial configuration file, which the benchmark script uses to identify the available methods to run and the associated parser. I picked YAML as the configuration format because it is a data serialization language that's both powerful and human readable. Afterwards, I added some initial benchmark methods together with some interesting small and large datasets from the UCI Machine Learning Repository to test the methods. To try out the benchmark system, run the benchmark.py file.
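To give a rough idea of how a script like this can consume such a configuration, here is a minimal sketch. The YAML keys, method names, script paths, and dataset files below are made up for illustration and are not the actual configuration format of the benchmark system:

import subprocess
import timeit
import yaml  # PyYAML

# Hypothetical configuration: each method maps to the script that runs it
# and the datasets it should be benchmarked on.
CONFIG = """
methods:
  PCA:
    script: methods/mlpack/pca.py
    datasets:
      - datasets/iris.csv
  KMEANS:
    script: methods/mlpack/kmeans.py
    datasets:
      - datasets/wine.csv
"""

def run_benchmarks(config_text):
    config = yaml.safe_load(config_text)
    for name, options in config["methods"].items():
        for dataset in options["datasets"]:
            # Time one run of the method's script on the given dataset.
            start = timeit.default_timer()
            subprocess.check_call(["python", options["script"], dataset])
            elapsed = timeit.default_timer() - start
            print("%s on %s: %.3fs" % (name, dataset, elapsed))

if __name__ == "__main__":
    run_benchmarks(CONFIG)

The actual script is organized differently; this only illustrates the general idea of reading the YAML configuration, dispatching to the per-method scripts, and collecting timings.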

A short description of my commits can be seen here [1]. If anyone is interested in the details of what I'm doing, take a look at the notes available here [2].

Please suggest any improvements I can make. I will keep you all informed of my progress approximately every week.

Cheers,

Marcus

[1] http://trac.research.cc.gatech.edu/fastlab/changeset/15254
[2] http://trac.research.cc.gatech.edu/fastlab/wiki/AutomaticBenchmark
