[mlpack] [mlpack/mlpack] EMST non-optimal for d = 2 sets of forests (#823)

Ryan Curtin notifications at github.com
Tue Dec 13 16:41:42 EST 2016


Hey there,

I'm sorry for the lag in the response.  When I review what you've written and take a look, I think that the problem that you are trying to solve is actually different.  mlpack's minimum spanning tree code calculates the minimum spanning tree of the input points---it doesn't take in a dissimilarity (or distance) matrix as input.  Instead the reason that the mlpack implementation is fast is that it can avoid calculating the entire dissimilarity matrix by pruning away parts of work.

However, that strategy, which uses a dual-tree algorithm, only works if the distance used is a valid metric.  In your case, the metric appears to be

```
d(p_i, p_j) = max(d(p_i, p_j), d(p_i, N_5(p_i)), d(N_5(p_j), p_j))
```

where `N_5()` is the fifth nearest neighbor of the input point.  I am not sure if that's a valid metric.

However, if it is (i.e. if you can prove it), then what you can do is implement a new distance metric that follows the API given in this tutorial: http://www.mlpack.org/docs/mlpack-2.1.0/doxygen.php?doc=metrics.html

and then you can use a custom `TreeType` for the `DualTreeBoruvka` class (note: because this is not a Euclidean metric, you can't use typical trees like KD trees or anything.  You'll have to use cover trees or vantage point trees).  Like if your metric was called `MyMetric` you could do

```
DualTreeBoruvka<MyMetric, arma::mat, tree::StandardCoverTree>();
```

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/issues/823#issuecomment-266870675
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://knife.lugatgt.org/pipermail/mlpack/attachments/20161213/5ac8a811/attachment.html>


More information about the mlpack mailing list