[mlpack] GSoC 2014 : Introduction and Interests

Anand Soni anand.92.soni at gmail.com
Mon Mar 10 12:56:26 EDT 2014


Hi Marcus and Ryan,

I have been reading up on benchmarking and performance analysis of
machine learning algorithms and came across an interesting idea in a
research paper.

Suppose we need to compare 'n' algorithms for performance (I will
need more information about which algorithms will be involved in this
project), and suppose we have 'k' performance metrics. Clearly, we
should not draw conclusions about an algorithm's performance from
just one metric.

For example, in one of my projects, where I did sentiment analysis
using ANNs (artificial neural networks), I got good accuracy while
the precision and recall were poor. So there is no single "best
algorithm"; it all depends on the metrics used.
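
To make this concrete, here is a toy sketch (made-up numbers, not
taken from my project) of how accuracy can look good on an imbalanced
test set while precision and recall do not:

    # Hypothetical confusion-matrix counts: 990 negatives, 10 positives.
    tp, fp, fn, tn = 2, 5, 8, 985

    accuracy  = (tp + tn) / (tp + tn + fp + fn)   # ~0.987, looks great
    precision = tp / (tp + fp)                    # ~0.29
    recall    = tp / (tp + fn)                    # 0.20

    print(accuracy, precision, recall)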

So, one of the things I propose for this project is that we
implement, say, k metrics and perform a bootstrap analysis of the
given algorithms over these k metrics. That would give us a good idea
of how probable it is for an algorithm to perform "well" across the
various metrics.
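
A rough sketch of the kind of bootstrap I have in mind (Python, with
made-up scores; the actual metric set, algorithms, and datasets are
still to be decided):

    import numpy as np

    # Hypothetical scores: rows = datasets, columns = algorithms.
    # One matrix per metric; every value is invented for illustration.
    scores = {
        "accuracy": np.array([[0.91, 0.88, 0.90],
                              [0.76, 0.81, 0.79],
                              [0.84, 0.85, 0.83]]),
        "f1":       np.array([[0.72, 0.80, 0.75],
                              [0.61, 0.69, 0.66],
                              [0.70, 0.74, 0.71]]),
    }

    def bootstrap_best_counts(metric_scores, trials=10000, seed=0):
        """Resample datasets with replacement and count how often each
        algorithm has the best mean score on the resampled set."""
        rng = np.random.default_rng(seed)
        n_datasets, n_algos = metric_scores.shape
        wins = np.zeros(n_algos)
        for _ in range(trials):
            sample = metric_scores[rng.integers(0, n_datasets, n_datasets)]
            wins[np.argmax(sample.mean(axis=0))] += 1
        return wins / trials

    # For each metric, estimate the probability that each algorithm
    # is the best one.
    for metric, table in scores.items():
        print(metric, bootstrap_best_counts(table))

The idea is that an algorithm which "wins" with high probability
under many metrics is more robustly good than one that only tops a
single metric.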

I have not yet decided on the metrics to use, but I am working on
that. I would welcome comments and feedback on the idea. Also, it
would be great if you could tell me which algorithms/tools we will be
comparing in the project. I can give more rigorous details in the
proposal.

Regards.

Anand Soni

On Thu, Mar 6, 2014 at 10:08 PM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:
> On Wed, Mar 05, 2014 at 08:39:10PM +0530, Anand Soni wrote:
>> Thanks a lot Ryan!
>>
>> I, too, would rather submit a single, well-prepared application
>> than many. It was just out of interest that I was reading up on
>> dual trees, and yes, most of the literature I found was from
>> gatech. I also came across your paper on dual trees
>> (http://arxiv.org/pdf/1304.4327.pdf). Can you give me some more
>> pointers for getting a better understanding of dual trees?
>
> There are lots of papers on dual-tree algorithms but the paper you
> linked to is (to my knowledge) the only one that tries to describe
> dual-tree algorithms in an abstract manner.  Here are some links to
> other papers, but keep in mind that they focus on particular algorithms
> and often don't devote very much space to describing exactly what a
> dual-tree algorithm is:
>
> A.G. Gray and A.W. Moore. "N-body problems in statistical learning."
> Advances in Neural Information Processing Systems (2001): 521-527.
>
> A.W. Moore.  "Nonparametric density estimation: toward computational
> tractability."  Proceedings of the Third SIAM International Conference
> on Data Mining (2003).
>
> A. Beygelzimer, S. Kakade, and J.L. Langford.  "Cover trees for nearest
> neighbor."  Proceedings of the 23rd International Conference on Machine
> Learning (2006).
>
> P. Ram, D. Lee, W.B. March, A.G. Gray.  "Linear-time algorithms for
> pairwise statistical problems."  Advances in Neural Information
> Processing Systems (2009).
>
> W.B. March, P. Ram, A.G. Gray.  "Fast Euclidean minimum spanning tree:
> algorithm, analysis, and applications."  Proceedings of the 16th ACM
> SIGKDD International Conference on Knowledge Discovery and Data Mining
> (2010).
>
> R.R. Curtin, P. Ram.  "Dual-tree fast exact max-kernel search." (this
> one hasn't been published yet...
> http://www.ratml.org/pub/pdf/2013fastmks.pdf ).
>
> I know that's a lot of references and probably way more than you want to
> read, so don't feel obligated to read anything, but it will probably
> help explain exactly what a dual-tree algorithm is... I hope!  I can
> link to more papers too, if you want...
>
>> But, of course, I am more willing to work on automatic benchmarking,
>> on which I had a little talk with Marcus and I am brewing ideas.
>
> Ok, sounds good.
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "Somebody dropped a bag on the sidewalk."
> ryan at ratml.org |   - Kit



-- 
Anand Soni | Junior Undergraduate | Department of Computer Science &
Engineering | IIT Bombay | India


