[mlpack] CMakeList adjustments for compiling Mlpack with Armadillo using openblas

Sat Dec 28 15:21:45 EST 2013

On Sat, Dec 28, 2013 at 08:06:32PM +0000, Steenwijk, Martijn wrote:
> Thanks again for your response :)
> 
> @benchmarks: thanks, that looks pretty impressive. Marcus did a pretty
> damn cool job on the benchmarking system. :-) Would be really helpful
> to have widely used libraries such as ANN and FLANN in there, but I'm
> not sure whether he has still time after this... The problem (or
> stated differently: "challenge") with my data is always the amount of
> points. There is no comparable standard dataset of this size... 
>
> Oh before I forget, I use ANN with "exact" precision. 

Ah, ok.  Well that's less exciting, but still good to hear mlpack is at
least keeping up.

Comparing against ANN or FLANN is somewhat difficult because what we're
trying to compare is specific implementations and not necessarily
different algorithms (Marcus, feel free to correct me if you have
different ideas).  Because we don't implement the BBD-tree, a comparison
with ANN isn't purely an implementation comparison.  At the same time,
no other library (to my knowledge) implements dual-tree nearest-neighbor
search, so any comparison already covers more than implementation.  If I
have some time I'll see if I can add tests for ANN and FLANN to the
existing benchmarks.

> @allkrann: that's another possibility, although my application
> normally requires exact (to very low error) accuracy. Anyway, thanks
> again, I'll try some things and let you know how they worked out. 

The idea behind rank approximation is interesting; instead of returning
a neighbor whose distance is within, say, 5% of the true nearest
neighbor's distance, rank approximation guarantees (probabilistically)
that the returned neighbors are in the top N% of results.  So for
instance, if you have a dataset with 10000 points and set k = 5 (return
5 neighbors), a desired success probability of 0.95, and a rank error of
0.1%, then with probability 0.95, each of your 5 returned neighbors will
be one of the top 10 neighbors (the 0.1 percentile of 10000 points is 10
points).
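
To make the arithmetic above concrete, here is a minimal Python sketch.
It is not mlpack code or mlpack's API; the helper names rank_threshold
and true_rank are made up for illustration.  The first computes the
worst rank a returned neighbor is allowed to have, and the second
brute-forces the actual rank of a returned point so the guarantee can be
checked on a small dataset.

```python
import math
from fractions import Fraction


def rank_threshold(n_points, rank_error_percent):
    """Worst allowed true rank for a returned neighbor (hypothetical helper).

    The percentage is converted via its string form so that values such
    as 0.1 are treated as exact decimals rather than binary floats.
    """
    fraction = Fraction(str(rank_error_percent)) / 100
    return max(1, math.ceil(n_points * fraction))


def true_rank(query, candidate, points):
    """1-based rank of `candidate` among `points` by distance to `query`.

    Brute force: counts how many points are strictly closer than the
    candidate.  Only meant for checking results on small datasets.
    """
    d = math.dist(query, candidate)
    return 1 + sum(1 for p in points if math.dist(query, p) < d)


# The example from the text: 10000 points with a rank error of 0.1%.
threshold = rank_threshold(10000, 0.1)

# Toy 1-D dataset: the points 1.0, 2.0, ..., 100.0 with the query at 0.0.
points = [(float(i),) for i in range(1, 101)]
rank_of_three = true_rank((0.0,), (3.0,), points)
```

A returned neighbor satisfies the rank guarantee when its true_rank is
at most rank_threshold for the dataset; the 0.95 success probability
refers to how often that check passes, which this sketch does not
model.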

Here's a link to the paper:

  http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2009_0435.pdf

It's not a widely used idea, so as far as I know mlpack has the only
implementation of it.

-- 
Ryan Curtin    | "Hungry."
ryan at ratml.org |   - Sphinx