Implementation of tree types : Week 11

Last week I have been working on fixing various tree types.

Random projection trees

I made some minor fixes of random projection trees and wrote a simple test that checks if there is a hyperplane that splits a node. I find the hyperplane as a solution of a linear system of inequalities. Curiously, but it seems the easiest iterative method solves the system fine.

Universal B-tree

I fixed a problem in the address-to-point conversion method. The method had led to wrong results if some addresses correspond to negative numbers. Moreover, since floating point data types have some excess information, some addresses correspond to infinite points. Thus, I have to add some restrictions in the conversion algorithm.

Vantage point tree

I implemented a method that tries to get a tighter bound by investigating the parent bound. The method calculates the distance to the "corner" (the intersection of two (n-1)-dimensional spheres that is an (n-2)-dimensional sphere) using properties of orthogonal transforms. The results appear to be worse since this calculation requires too many arithmetic operations and the number of base cases has decreased slightly.

Hollow hyperrectangle vantage point tree

I implemented a bound that consists of an outer rectangle and a number of rectangular hollows. I tried to test the vantage point tree with this bound. Right now, the original VP-tree bound outperforms this one, but I think the hollow-hrect bound should work faster. So, I'll continue working on the optimization of the bound.

more...

Implementation of tree types : Week 9

Last week I have been working on the vantage point tree and the universal B-tree.

Vantage point tree

I tried various fixes of the vantage point tree. Namely, I moved vantage points to separate nodes. That should considerably simplify pruning rules (we needn't implement the score algorithm for a query node and a reference point). Thus, each intermediate node of the vantage point tree contains three children. The first child contains only the vantage point. In such a way, the first point of the first sibling of a node is the centroid of the node.

This property allows to use some optimizations in dual-tree algorithms and I added this check to NeighborSearchRules and RangeSearchRules. There are still some errors, some base cases are duplicated.

Universal B-tree

The UB-tree is a variant of the B-tree. It introduces an ordering in multidimensional space i.e. the space is mapped to a linear ordered space. Notably, this mapping preserves the multidimensional clustering. Thus, all points are inserted into the UB-tree according to this ordering. I implemented the split algorithm. Right now, I am working on the implementation of the UB-tree bound. This bound is slightly more complicated than HRectBound since it consists of a number of hyperrectangles. In contrast to VantagaPointTree, this bound preserves a crutial property of BinarySpaceTree. Namely, UB-tree's children do not overlap each other.

more...

Implementation of tree types : Week 8

Last week I continued working on the vantage point tree. Since the first point of each left subtree of the vantage point tree is the centroid of its bound and the right subtree does not contain the centroid at all, I added a new method (IsFirstPointCentroid()) to the TreeType API. This check should allow dual and single tree algorithms to use some optimizations.

Moreover, I implemented two types of the random projection tree: RPTreeMax and RPTreeMean (these tree types are based on BinarySpaceTree). Right now, these tree types use HRectBound, I implemented only the split algorithm.

The RPTreeMax split divides a node by a random hyperplane. The position of the hyperplane may differ from the median by a random deviation. The RPTreeMean split divides a node by a random hyperplane if the diameter is less or equal to the average distance between points multiplied by a constant.

My implementation is sligthly different from the original algorithms. I use a fixed number of random dataset points in order to calculate the median and the average distance between points. And since the algorithm that the paper introduces in order to find the deviation from the median is weird (one of two resulting nodes may be empty), I shrank the bounds of this deviation.

more...

Implementation of tree types : Week 7

Last week, I have been working on the implementation of HollowBallBound. This class represents the area included between two concentric ball bounds. The radius of the inner ball may be equal to 0.

The HollowBallBound class is intended to replace the original piece-wise ball bound of the vantage point tree. This bound allows to calculate the minimum distance to a point much faster than the original bound. One of the disadvantages is the loss the crucial property of the binary space tree: the use of HollowBallBound may lead to overlapping children.

Since the vantage point tree is slightly different from the binary space tree, namely each intermediate node contains a point, I had to move the vantage point tree to a separate class. Unfortunately, that led to the duplication of the code.

Moreover, I added some tests for the vantage point tree.

more...

Implementation of tree types : Week 6

Last week I had been working on some fixes of the RectangleTree class, including the removal of RectangleTree::Points(), the removal of RectangleTree::Children() and the optimization of RectangleTree::NumDescendants().

The removal of the Children() method requires to make SplitType, AuxiliaryInformationType and DescentType friends of the RectangleTree class. The optimization of NumDescendants() method allows Descendant() to take $O(\log N)$ operations instead of $O(N)$.

Moreover, I finished the R+/R++ tree implementation. Namely, I replaced SortStruct by std::pair in MinimalCoverageSweep and MinimalSplitsNumberSweep, I simplified the code of MinimalCoverageSweep and I fixed the error with addition of a huge amount of equal points.

Unfortunately, I failed in the implementation of the PiecewiseBallBund class since I am not able to think out any cost-efficient method of finding the distance between a point and a piece-wise ball bound. The problem is not easy even in 2 dimensions.

more...

Implementation of tree types : Week 5

As I had planned in the previous post I did the refactoring of the R+/R++ tree. I added a template parameter SweepType to the RPlusTreeSplit class and implemented a sweep method that tries to partition an intermediate node with the minimum number of splits. All sweep techniques are designed in a tree-independent manner. In order to do that I introduced a template parameter SplitPolicy. This parameter helps to determine the subtree in which we should insert any particular child of an intermediate node that is being split. Also I fixed some errors.

Moreover, I began the vantage point tree implementation. Right now the split method (VantagePointSplit class) is implemented. I have to implement a piecewise ball boundary type since the BallBound is not suitable for the vantage point tree. Also I did not think about vantage point tree-specific tests.

more...