[mlpack] GSoC 2016: Fast k-centers algorithm and implementation

Abhinav Agarwalla agarwallaabhinav at gmail.com
Wed Mar 9 14:46:08 EST 2016


Hi Ryan

I wanted to clear that the problem we are solving is essentially k-center
one, right ? and not k-means. We already have an implementation of dual
tree k-means.
>From my understanding, the difference is that in k-centers, the center
points or centroids are the points that necessarily belong to the dataset,
and for k-means, it can be any point in the space.
However, in the project description, it's written that we need to find
k-points which may not be in the set. But that is essentially k-means
problem.
This got me a little confused, and I wanted to clarify what we are solving
here.

Modifying the k-means problem to perform k-centers isn't much of a problem.
What do you think ?

Regards
Abhinav

On Tue, Mar 1, 2016 at 8:11 PM, Ryan Curtin <ryan at ratml.org> wrote:

> On Tue, Mar 01, 2016 at 03:08:28PM +0530, Abhinav Agarwalla wrote:
> > Hello
> >
> > I worked really hard on understanding dual tree algorithms, and mlpack's
> > implementation of the same last year, but unfortunately we were not
> > selected. I still wanted to do the project, but got busy. I am really
> glad
> > that this time we are indeed selected.
> > I have gone through the literature, particularly N-body problems by A.
> > Gray, and cover trees one. I am also going through the list, that was
> > shared in an earlier post.
> > I was hoping if either Bill or you could share the idea, and the work
> that
> > needs to be done.
> >
> > Considering that organisation slots are fixed and numerous projects, what
> > is your priority for this particular project ?
> > I also couldn't find any issues surrounding this. Is there any
> > implementation that I can start with ?
>
> Hi Abhinav,
>
> I remember our discussions from last year; probably the best
> documentation on this project at the moment are these emails, which you
> may have already seen:
>
> https://mailman.cc.gatech.edu/pipermail/mlpack/2015-February/000610.html
> https://mailman.cc.gatech.edu/pipermail/mlpack/2014-March/000330.html
>
> A good thing to do, if you are familiar with dual-tree algorithms now,
> would be to study the k-centers problem and think about possibilities
> for an appropriate BaseCase() and Score() function.  This is a research
> project, so there is not yet a known dual-tree algorithm to solve the
> k-centers problem; there will be no reference implementation to consult.
>
> Another idea might be to go read about the Gonzalez algorithm for
> k-centers and then implement it to see how it works.
>
> About the priority of the projects: all of the projects on the ideas
> list are important---otherwise they wouldn't be there. :)  It's not
> possible to say at this point which projects will have slots allocated
> to them and which will not; that depends on the applicants, the mentors,
> and how many slots Google gives us this year.
>
> I hope this is helpful.  Please let me know if I can clarify anything.
>
> Thanks!
>
> Ryan
>
> --
> Ryan Curtin    | "Open the pig!"
> ryan at ratml.org |   - Frank Moses
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cc.gatech.edu/pipermail/mlpack/attachments/20160310/633e1ac8/attachment-0002.html>


More information about the mlpack mailing list