Implementation of tree types : Week 11

Last week I have been working on fixing various tree types.

Random projection trees

I made some minor fixes of random projection trees and wrote a simple test that checks if there is a hyperplane that splits a node. I find the hyperplane as a solution of a linear system of inequalities. Curiously, but it seems the easiest iterative method solves the system fine.

Universal B-tree

I fixed a problem in the address-to-point conversion method. The method had led to wrong results if some addresses correspond to negative numbers. Moreover, since floating point data types have some excess information, some addresses correspond to infinite points. Thus, I have to add some restrictions in the conversion algorithm.

Vantage point tree

I implemented a method that tries to get a tighter bound by investigating the parent bound. The method calculates the distance to the "corner" (the intersection of two (n-1)-dimensional spheres that is an (n-2)-dimensional sphere) using properties of orthogonal transforms. The results appear to be worse since this calculation requires too many arithmetic operations and the number of base cases has decreased slightly.

Hollow hyperrectangle vantage point tree

I implemented a bound that consists of an outer rectangle and a number of rectangular hollows. I tried to test the vantage point tree with this bound. Right now, the original VP-tree bound outperforms this one, but I think the hollow-hrect bound should work faster. So, I'll continue working on the optimization of the bound.

more...

Implementation of tree types : Week 10

Last week I have been working on the universal B tree. I finished the implementation of the bound and wrote a series of tests.

In order to explain what the bound is I have to introduce the notions $\textbf{area}$ and $\textbf{address}$.

An $\textbf{address}\ \alpha$ for an $\textbf{area}$ in an $n$-dimensional cube is a sequence $i_1.i_2.\ldots.i_l$, where $i_j\in {1,2,\ldots,2^n}$.

I'll define the notion $\textbf{area}$ recursively. Let $\alpha$ be an $\textbf{address}$ for an $\textbf{area}$ in an $n$-dimensional cube C ($\textbf{area}(C)$). Partition C into $2^n$ subcubes $sc(i),\ i=1,2,\ldots,2^n$ by dividing C in the middle of each dimension. If $\alpha=k$ then $\textbf{area}(C)[k]=\bigcup_{i=1}^{k}sc(i),\ k=0,1,\ldots,2^n-1$. If $\alpha=k.\beta$ then $\textbf{area}(C)[\alpha]=\textbf{area}(C)[k]\cup \textbf{area}(sc(k+1))[\beta]$, where $\textbf{area}(sc(k+1))[\beta]$ is an area in $sc(k+1)$.

Each child node of the universal B tree is bounded by two addresses i.e. the bound of a node is the difference of two areas. Since I deal with floating point numbers I use fixed-sized addresses in the whole space. The calculation of the distance between two nodes requires a lot of arithmetic operations. So, I decided to restrict the number of steps in the bound. Thus, children may overlap each other.

more...

Implementation of tree types : Week 9

Last week I have been working on the vantage point tree and the universal B-tree.

Vantage point tree

I tried various fixes of the vantage point tree. Namely, I moved vantage points to separate nodes. That should considerably simplify pruning rules (we needn't implement the score algorithm for a query node and a reference point). Thus, each intermediate node of the vantage point tree contains three children. The first child contains only the vantage point. In such a way, the first point of the first sibling of a node is the centroid of the node.

This property allows to use some optimizations in dual-tree algorithms and I added this check to NeighborSearchRules and RangeSearchRules. There are still some errors, some base cases are duplicated.

Universal B-tree

The UB-tree is a variant of the B-tree. It introduces an ordering in multidimensional space i.e. the space is mapped to a linear ordered space. Notably, this mapping preserves the multidimensional clustering. Thus, all points are inserted into the UB-tree according to this ordering. I implemented the split algorithm. Right now, I am working on the implementation of the UB-tree bound. This bound is slightly more complicated than HRectBound since it consists of a number of hyperrectangles. In contrast to VantagaPointTree, this bound preserves a crutial property of BinarySpaceTree. Namely, UB-tree's children do not overlap each other.

more...