[mlpack] Student looking to contribute (possibly through GSoC 2016)

Shaoxiang Chen forwchen at gmail.com
Tue Mar 1 02:12:07 EST 2016


Hi everyone,

I am Shaoxiang Chen, a junior CS student at Fudan University, China. I'm
currently working with a professor on research in multimedia content
analysis and machine (deep) learning.

I've read a research paper about accelerating approximate nearest neighbour
search when cosine similarity is used as the distance metric. It achieves a
decent speedup by quantizing vectors into binary codes, which makes distance
computation orders of magnitude faster.
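To make the binary-quantization idea concrete, here is a minimal sketch of one well-known scheme, sign random projection (Charikar's SimHash). It is not necessarily the quantizer used in the paper. Each vector is projected onto random hyperplanes, the signs of the projections are packed into an integer, and distances are then computed with XOR plus popcount instead of floating-point dot products. The function names are hypothetical and only illustrate the technique; pure Python will not show the speedup, which comes from packed bit operations in compiled code.

```python
import math
import random

def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def binarize(vec, planes):
    """Pack the sign of each projection (1 if >= 0) into one integer code."""
    code = 0
    for i, p in enumerate(planes):
        if sum(a * b for a, b in zip(vec, p)) >= 0.0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of differing bits: popcount of the XOR."""
    return bin(a ^ b).count("1")

def estimated_cosine(a_code, b_code, n_bits):
    """SimHash estimate: the angle is approximately pi * hamming / n_bits."""
    theta = math.pi * hamming(a_code, b_code) / n_bits
    return math.cos(theta)

# Usage: compare the estimate with the exact cosine for two random vectors.
rng = random.Random(7)
planes = random_hyperplanes(64, 256, seed=1)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
y = [xi + 0.5 * rng.gauss(0.0, 1.0) for xi in x]
exact = sum(a * b for a, b in zip(x, y)) / (
    math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))
print(exact, estimated_cosine(binarize(x, planes), binarize(y, planes), 256))
```

The code depends only on the direction of a vector, which is why this family of methods fits cosine similarity and not arbitrary metrics.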

I see that cosine similarity is included in neither FLANN's nor mlpack's
distance metrics. For one thing, it is not directly applicable to
space-partitioning tree structures (am I right?). Second, if all the vectors
are L2-normalized, ranking neighbours by cosine similarity is equivalent to
ranking them by L2 distance.
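The equivalence follows from expanding the squared distance: for unit vectors a and b, ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b = 2 - 2 cos(a, b). Since the squared distance decreases monotonically as cosine similarity grows, the nearest neighbour under L2 is the most similar one under cosine. A small numerical check, with helper names chosen only for illustration:

```python
import math

def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# For unit vectors, ||a - b||^2 should equal 2 - 2 * cos(a, b).
a = l2_normalize([1.0, 2.0, 3.0])
b = l2_normalize([4.0, -1.0, 0.5])
print(squared_l2(a, b), 2.0 - 2.0 * cosine(a, b))
```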

I've implemented the proposed method myself and got a decent speedup: the
higher the dimension, the larger the speedup. Although the method is limited
to high-dimensional data (otherwise precision drops), I think it could
complement tree structures when dealing with high-dimensional data.
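One common way binary codes are used for search (again a sketch under my own assumptions, not necessarily the paper's exact procedure) is a two-stage query: rank every database code by Hamming distance to the query code, keep a short list, and re-rank only that short list with exact cosine similarity. `BinaryCosineIndex` and its `shortlist` parameter are hypothetical names; a larger shortlist trades speed for recall, and setting it to the dataset size reduces to exact brute-force search.

```python
import math
import random

def random_hyperplanes(dim, n_bits, seed=0):
    """Random Gaussian hyperplane normals used for sign quantization."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def binarize(vec, planes):
    """Pack projection signs into an integer bit code."""
    code = 0
    for i, p in enumerate(planes):
        if sum(a * b for a, b in zip(vec, p)) >= 0.0:
            code |= 1 << i
    return code

def cosine(a, b):
    """Exact cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class BinaryCosineIndex:
    """Hypothetical two-stage index: Hamming shortlist, exact cosine re-rank."""

    def __init__(self, data, n_bits=64, seed=0):
        self.data = data
        self.planes = random_hyperplanes(len(data[0]), n_bits, seed)
        self.codes = [binarize(v, self.planes) for v in data]

    def search(self, query, k=1, shortlist=10):
        q_code = binarize(query, self.planes)
        # Stage 1: cheap XOR + popcount against every stored code.
        order = sorted(range(len(self.data)),
                       key=lambda i: bin(self.codes[i] ^ q_code).count("1"))
        candidates = order[:max(k, shortlist)]
        # Stage 2: exact cosine similarity, computed only on the shortlist.
        candidates.sort(key=lambda i: -cosine(query, self.data[i]))
        return candidates[:k]

# Usage: 200 random 32-dimensional vectors, 128-bit codes.
rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(200)]
index = BinaryCosineIndex(data, n_bits=128)
print(index.search(data[17], k=3))
```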

I haven't benchmarked my code against other ANN search algorithms, but the
paper includes some benchmarks.

If, after reading the paper, you think it's OK to include this algorithm in
mlpack's neighbour search, I'd like to contribute and discuss it :)

Link to the paper:
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40572.pdf

Regards
Shaoxiang