Implementation of tree types : Week 11

Last week I have been working on fixing various tree types.

Random projection trees

I made some minor fixes of random projection trees and wrote a simple test that checks if there is a hyperplane that splits a node. I find the hyperplane as a solution of a linear system of inequalities. Curiously, but it seems the easiest iterative method solves the system fine.

Universal B-tree

I fixed a problem in the address-to-point conversion method. The method had led to wrong results if some addresses correspond to negative numbers. Moreover, since floating point data types have some excess information, some addresses correspond to infinite points. Thus, I have to add some restrictions in the conversion algorithm.

Vantage point tree

I implemented a method that tries to get a tighter bound by investigating the parent bound. The method calculates the distance to the "corner" (the intersection of two (n-1)-dimensional spheres that is an (n-2)-dimensional sphere) using properties of orthogonal transforms. The results appear to be worse since this calculation requires too many arithmetic operations and the number of base cases has decreased slightly.

Hollow hyperrectangle vantage point tree

I implemented a bound that consists of an outer rectangle and a number of rectangular hollows. I tried to test the vantage point tree with this bound. Right now, the original VP-tree bound outperforms this one, but I think the hollow-hrect bound should work faster. So, I'll continue working on the optimization of the bound.

more...

Implementation of tree types : Week 8

Last week I continued working on the vantage point tree. Since the first point of each left subtree of the vantage point tree is the centroid of its bound and the right subtree does not contain the centroid at all, I added a new method (IsFirstPointCentroid()) to the TreeType API. This check should allow dual and single tree algorithms to use some optimizations.

Moreover, I implemented two types of the random projection tree: RPTreeMax and RPTreeMean (these tree types are based on BinarySpaceTree). Right now, these tree types use HRectBound, I implemented only the split algorithm.

The RPTreeMax split divides a node by a random hyperplane. The position of the hyperplane may differ from the median by a random deviation. The RPTreeMean split divides a node by a random hyperplane if the diameter is less or equal to the average distance between points multiplied by a constant.

My implementation is sligthly different from the original algorithms. I use a fixed number of random dataset points in order to calculate the median and the average distance between points. And since the algorithm that the paper introduces in order to find the deviation from the median is weird (one of two resulting nodes may be empty), I shrank the bounds of this deviation.

more...