[mlpack] Update or insert points to an existing tree?

Mon Jul 18 10:27:51 EDT 2016

On Sat, Jul 16, 2016 at 01:50:44PM +0800, 董理 wrote:
> Dear all,
> 
> I need to maintain a tree throughout a fluid simulation to get the
> neighbors of points, and the coordinates of points will change slowly
> according to computational fluid dynamics. The tree can be
> reconstructed each time step, but would it be more efficient to update
> the existing tree? I would like to know if MLPACK provides such
> function (I didn’t find any method to do this), or should I just
> reconstruct the tree? Any suggestion is appreciated!

Hi Li,

There has been a lot of discussion on this point in academic communities
over the years and it has basically led to two schools of thought:
"create the tree and use it only once", and "update the tree so you can
use it many times".  As far as mlpack is concerned, the kd-tree and
cover tree are from the former school of thought, whereas the R tree and
its variants are from the latter.

If you are using the kd-tree, it is generally not worth
it to try and update the tree, because build time is generally quite
small.  Plus, the simplest way to update the tree would be to just
update the positions of all points in the tree, then expand the bounding
box of each node to contain all of the descendant points.  But this will
give you a tree with overlapping nodes, and depending on the level of
overlap, search performance will degrade.

The other option would be a tree created to specifically handle
insertions and deletions of points, which would be the R tree and its
variants (R* tree, R+ tree, R++ tree, X tree, Hilbert R tree... thanks
to Andrew and Mikhail's hard work over GSoC, we now have many variants
implemented!).  However, to use that in your case, as the code is now,
you would have to delete every point and then insert every modified
point, so maybe it would not be very useful to do that...

Note that you could do what was suggested in the first paragraph, which
is to modify the bounding boxes of kd-trees at each time step.  You
might be able to do that for a few steps of your simulation without
suffering from too much degradation.

So, supposing that you built a tree on a dataset, you could modify each
point in that dataset (but make sure you modify the dataset held by the
tree), then loop over each node in the tree, and call Bound() to get the
hyper-rectangle bound, and modify it such that it encloses each
descendant point of the node (which you can get with NumDescendants()
and Descendant()).

You could actually use that strategy for any type of tree, like the
cover tree and R tree too, and it should not affect the correctness of
whatever tree-based algorithm you are using.

Hopefully this is helpful... let me know if I can clarify anything. :)

Thanks,

Ryan

-- 
Ryan Curtin    | "Hi Doggy."
ryan at ratml.org |   - Johnny