[mlpack] GSoC 2016: Fast k-centers algorithm and implementation

Ryan Curtin ryan at ratml.org
Thu Mar 10 09:35:12 EST 2016


On Thu, Mar 10, 2016 at 01:16:08AM +0530, Abhinav Agarwalla wrote:
> Hi Ryan
> 
> I wanted to clear that the problem we are solving is essentially k-center
> one, right ? and not k-means. We already have an implementation of dual
> tree k-means.
> From my understanding, the difference is that in k-centers, the center
> points or centroids are the points that necessarily belong to the dataset,
> and for k-means, it can be any point in the space.
> However, in the project description, it's written that we need to find
> k-points which may not be in the set. But that is essentially k-means
> problem.
> This got me a little confused, and I wanted to clarify what we are solving
> here.
> 
> Modifying the k-means problem to perform k-centers isn't much of a problem.
> What do you think ?

Hi Abhinav,

You're right, that is a typo in the algorithm description, and I've
fixed the wiki page.

But modifying the k-means problem to perform k-centers is not going to
work that easily... note that it is possible to make a dual-tree
algorithm to solve k-means clustering exactly in the same way as the
original Lloyd algorithm, but k-centers is NP-hard and therefore the
best we can do is an approximation.  So the problem will come down to
using distance-based reasoning to develop a BaseCase() and Score()
function that minimize (at least approximately) the objective function
of k-centers (see section 1.3 of
http://cseweb.ucsd.edu/~dasgupta/291-geom/kcenter.pdf ).

You can take a look at an (extremely complex) algorithm to solve k-means
clustering here:

http://ratml.org/pub/pdf/2016dual.pdf

That might be helpful as a sort of guide to the type of geometric
thinking that's necessary to develop a dual-tree algorithm.  It might
also be good to look at simpler ones, like nearest neighbor search,
kernel density estimation, or others (like in the paper
"Tree-Independent Dual-Tree Algorithms").

I hope this is helpful; please let me know if I can clarify anything.

Thanks,

Ryan

-- 
Ryan Curtin    | "You can think about it... but don't do it."
ryan at ratml.org |   - Sheriff Justice



More information about the mlpack mailing list