[mlpack] GSoC MLPack - Approximate Nearest Neighbor Search

Marcos Pividori marcospividori at gmail.com
Mon Mar 21 07:21:27 EDT 2016


Dear all,

I am Marcos Pividori, a Computer Science student from the National
University of Rosario, Argentina ([1]). I am currently finishing my degree.
(5 year long degree)
I am writing because I am really interested in working this GSoC for mlpack.

After reading the different ideas, I got really interested in the project
"Approximate Nearest Neighbor Search". I have read the discussions about
this topic in the mlpack list, and I analysed the papers recommended for
this project.
I know there is not much time left to the application deadline, I apologize
for this. I thought it would be appropriate to read the main documentation,
and get a general understanding of the project, before writing.

I have participated on some GSoC projects in the past: GSoC 2013 ([2][3])
for Haskell.org and GSoC 2015 ([4][5]) for OpenCog.
During 2014, I applied and was accepted to do an internship for iRobot,
working during 4 months at the offices in Pasadena, US, on improving the
software for the Roomba product.
As final project of the AI subject, I developed a computer Go player based
on Monte Carlo Tree Search algorithm, in C++. In August 2015, I presented
it to the EST 2015, Student Contest - 44 Argentine Conference of
Informatics, where I got the first place. ([6] [7])

I would like to let you know that I am definitely really interested in this
project. I am at the final steps of my degree, and I have to start working
on my graduation thesis, but I haven't found a thesis topic yet. I see
algorithms and data structures applied to ML as a potential topic. So, this
GSoC project would give me the opportunity to go in depth in this area,
working for a popular library, and gaining valuable experience from the
community.

I have been reading the papers recommended for this topic. I think I have a
general idea of the main methods for the Approximate Nearest Neighbor
problem.
Also, I have installed the mlpack library and looked at the main code
implementation.
So, I should split the project with these main goals:
 + Implementation of approximate nearest neighbor search.
 + Implementation of a command line tool (simple task).
 + Comparisons between mlpack implementation and similar libraries, and
writing of proper documentation.

About the implemenation of ann: do you have a preference on which method to
use? Should we implement many of these different methods and compare their
results?
I have found many different options in the literature:
  + "An optimal algorithm for ANN Searching in Fixed dimensions" proposes
to use BBD-Trees for the computation, and it is implemented in the ANN
library. But I couldn't find more information on BBD trees. ANN library
also uses KD-Trees.
  + "An Investigation of Practical Approximate Nearest Neighbor Algorithms"
proposes to use Spill-Trees (maybe Hybrid) with overlapping in points
separation.
  + "Fast approximate nearest neighbors with automatic configuration"
reviews and compares the state of the art in ann. It mentioned the
randomized kd-trees and k-means clustering trees as the most promising
methods according to their results. FLANN library implements many of these
methods.
  + scikit-learn library implements Locality-Sensitive Hashing for ann.

I would be grateful if you could give me your opinion on this. Do you think
I should focus on something in special?

I continue reading about this topic and preparing a proposal.

Thanks,

Marcos

[1] http://www.unr.edu.ar
[2] http://gsoc2013cwithmobiledevices.blogspot.com.ar
[3] http://github.com/MarcosPividori/GSoC-Communicating-with-mobile-devices
[4] http://wiki.opencog.org/wikihome/index.php/Haskell_Bindings_-_GSoC_2015
[5] http://github.com/MarcosPividori/atomspace/tree/master/opencog/haskell
[6] http://44jaiio.sadio.org.ar/sites/default/files/est215-224.pdf
[7] http://github.com/MarcosPividori/Go-player
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cc.gatech.edu/pipermail/mlpack/attachments/20160321/a7fb792b/attachment-0002.html>


More information about the mlpack mailing list