[mlpack] #251

Sun Jan 12 00:05:01 EST 2014

Okay, I got enough info about the Pelleg-Moore k-means algorithm but since
you say its a bit complicated, i would take that up a bit later. And for
now i would start with ticket #235. :)

Can u give more info about ticket #235?
What i can make out from it is that i need to refactor the SplitNode()
function.
But give me some more info as in the BSP tree is already implemented there,
right? And is the BSP tree used by all the other functions/methods of
mlpack or just a few? And why exactly do we need to include SplitNode() in
template class?

Please provide more details if u think i would be needing them. I would
start working on this asap. :)

Regards,
Abhinav Pathak,
NITK Surathkal

On Sun, Jan 12, 2014 at 10:13 AM, Ryan Curtin <gth671b at mail.gatech.edu>wrote:

> On Sat, Jan 11, 2014 at 07:39:55PM +0530, Abhinav Pathak wrote:
> > Hey,
> >
> > I'm Abhinav from NITK Surathkal, India and i have recently started
> looking
> > into the source of mlpack and i'm greatly interested to contribute to it
> in
> > any way i can.
> >
> > Please let me know where and how should i start doing so?
> >
> >
> > And also, can i be given some more info on ticket no #251 ? As in what
> > exactly needs to be done there? I saw it mentioned in the kmeans.hpp code
> > file also. I would like to fix it, if i can be given more info.
>
> Hello Abhinav,
>
> Ticket #251 references the Pelleg-Moore k-means algorithm, detailed in
> this paper:
>
> http://ra.adm.cs.cmu.edu/anon/usr/ftp/2000/CMU-CS-00-105.pdf
>
> It's a neat algorithm that accelerates k-means by considering the points
> in batch using a tree structure.  Anyway, the ticket is a bit in-depth,
> but if you are interested, I am happy to help out...
>
> There are a lot of algorithms in mlpack that use tree-based algorithms;
> nearest neighbor, emst (fast Euclidean minimum spanning tree), FastMKS,
> range search, approximate nearest neighbor.  Basically, these algorithms
> work by building a tree on the data, and then traversing the tree (or
> multiple trees, depending on the problem) and pruning different subtrees
> to accelerate computation.  These concepts are detailed a little bit
> better in this paper:
>
> http://arxiv.org/pdf/1304.4327
>
> Anyway, mlpack implements trees and traversals in a pretty abstract
> form, so to implement a dual-tree algorithm (or a single-tree algorithm,
> which is what Pelleg-Moore k-means is), you simply need to implement the
> Score() and BaseCase() functions.  Unfortunately it's not quite that
> simple, though, because someone needs to take the Pelleg-Moore algorithm
> and figure out what the BaseCase() and Score() functions need to be.
>
> Once the Pelleg-Moore algorithm can be expressed as a BaseCase() and
> Score() function that could be paired with a pruning single-tree
> traversal, then it should be straightforward to implement both that and
> tests for it.
>
> Anyway, like I said, it's a bit complex and may be time-consuming, but
> you will learn lots of interesting things about tree-based algorithms...
> :)
>
> An easier place to start might be ticket #235:
>
> http://www.mlpack.org/trac/ticket/235
>
> Basically, the BinarySpaceTree class in
> src/mlpack/core/tree/binary_space_tree/ needs to be refactored in such a
> way that the SplitNode() function is part of a template class that is an
> argument to BinarySpaceTree.  I can clarify more on that if you like.
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "We need some time for some things to happen!"
> ryan at ratml.org |   - Bells
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cc.gatech.edu/pipermail/mlpack/attachments/20140112/3c2a7d8e/attachment-0003.html>