[mlpack] Algorithm Optimization-GSoC'18 Idea

Sun Mar 4 03:34:00 EST 2018

Hello,

My name is Prashanth Duvvada and I'm currently a 2nd year Computer Science
student of Amrita University, India.

I'm interested in working on the GSoC 2k18 project "Algorithm Optimization"
and would like to work on improving the K Means algorithm.

I went through the code base and found out that a normal K Means method is
using the Lloyd's Algorithm. Although widely used it has two major
disadvantages

1) Worst case complexity is super polynomial of input size(mainly
exponential)
2) Approximation can be bad in some cases, specially during formation of
clustering.

To overcome the above two problems, we have K Means ++, which is a second
popular choice and implemented in several areas to improve the standard K
Means method.

1) With K-Means++, it's sure to find the solution in O(log k).
2) To address the approximations, a procedure is followed to initialize
right centers before proceeding for the standard iterations.

For reference, I have used link  [1] and [2]  for understanding this
algorithm. I would like to know what  mentors think of this methods and if
it can be used to improve the current methods.

[1]-https://en.wikipedia.org/wiki/K-means%2B%2B
[2]-
https://people.eecs.berkeley.edu/~brecht/cs294docs/week2/07.arthur.kmeans.pdf

-- 
Thanks,
Prashanth
Amrita University <http://amrita.edu>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://knife.lugatgt.org/pipermail/mlpack/attachments/20180304/056a5bae/attachment.html>