The RASearch class: This class provides a generic manner to perform rankapproximate search via randomsampling. More...
Public Types  
typedef TreeType< MetricType, RAQueryStat< SortPolicy >, MatType >  Tree 
Convenience typedef. More...  
Public Member Functions  
RASearch (MatType referenceSet, const bool naive=false, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType())  
Initialize the RASearch object, passing both a reference dataset (this is the dataset that will be searched). More...  
RASearch (Tree *referenceTree, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType())  
Initialize the RASearch object with the given preconstructed reference tree. More...  
RASearch (const bool naive=false, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType())  
Create an RASearch object with no reference data. More...  
~RASearch ()  
Delete the RASearch object. More...  
double  Alpha () const 
Get the desired success probability. More...  
double &  Alpha () 
Modify the desired success probability. More...  
bool  FirstLeafExact () const 
Get whether or not we traverse to the first leaf without approximation. More...  
bool &  FirstLeafExact () 
Modify whether or not we traverse to the first leaf without approximation. More...  
bool  Naive () const 
Get whether or not naive (bruteforce) search is used. More...  
bool &  Naive () 
Modify whether or not naive (bruteforce) search is used. More...  
const MatType &  ReferenceSet () const 
Access the reference set. More...  
void  ResetQueryTree (Tree *queryTree) const 
This function recursively resets the RAQueryStat of the given query tree to set 'bound' to SortPolicy::WorstDistance and 'numSamplesMade' to 0. More...  
bool  SampleAtLeaves () const 
Get whether or not sampling is done at the leaves. More...  
bool &  SampleAtLeaves () 
Modify whether or not sampling is done at the leaves. More...  
void  Search (const MatType &querySet, const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances) 
Compute the rank approximate nearest neighbors of each query point in the query set and store the output in the given matrices. More...  
void  Search (Tree *queryTree, const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances) 
Compute the rank approximate nearest neighbors of each point in the prebuilt query tree and store the output in the given matrices. More...  
void  Search (const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances) 
Compute the rank approximate nearest neighbors of each point in the reference set (that is, the query set is taken to be the reference set), and store the output in the given matrices. More...  
template < typename Archive >  
void  serialize (Archive &ar, const uint32_t) 
Serialize the object. More...  
bool  SingleMode () const 
Get whether or not singletree search is used. More...  
bool &  SingleMode () 
Modify whether or not singletree search is used. More...  
size_t  SingleSampleLimit () const 
Get the limit on the size of a node that can be approximated. More...  
size_t &  SingleSampleLimit () 
Modify the limit on the size of a node that can be approximation. More...  
double  Tau () const 
Get the rankapproximation in percentile of the data. More...  
double &  Tau () 
Modify the rankapproximation in percentile of the data. More...  
void  Train (MatType referenceSet) 
"Train" the model on the given reference set. More...  
void  Train (Tree *referenceTree) 
Set the reference tree to a new reference tree. More...  
The RASearch class: This class provides a generic manner to perform rankapproximate search via randomsampling.
If the 'naive' option is chosen, this rankapproximate search will be done by randomly sampling from the whole set. If the 'naive' option is not chosen, the sampling is done in a stratified manner in the tree as mentioned in the algorithms in Figure 2 of the following paper:
RASearch is currently known to not work with ball trees (#356).
SortPolicy  The sort policy for distances; see NearestNeighborSort. 
MetricType  The metric to use for computation. 
TreeType  The tree type to use. 
Definition at line 77 of file ra_search.hpp.
typedef TreeType<MetricType, RAQueryStat<SortPolicy>, MatType> Tree 
Convenience typedef.
Definition at line 81 of file ra_search.hpp.
RASearch  (  MatType  referenceSet, 
const bool  naive = false , 

const bool  singleMode = false , 

const double  tau = 5 , 

const double  alpha = 0.95 , 

const bool  sampleAtLeaves = false , 

const bool  firstLeafExact = false , 

const size_t  singleSampleLimit = 20 , 

const MetricType  metric = MetricType() 

) 
Initialize the RASearch object, passing both a reference dataset (this is the dataset that will be searched).
Optionally, perform the computation in naive mode or singletree mode. An initialized distance metric can be given, for cases where the metric has internal data (i.e. the distance::MahalanobisDistance class).
This method will copy the matrices to internal copies, which are rearranged during treebuilding. If you don't need to keep the reference dataset, you can use std::move() to remove the overhead of making copies. Using std::move() transfers the ownership of the dataset.
tau, the rankapproximation parameter, specifies that we are looking for k neighbors with probability alpha of being in the top tau percent of nearest neighbors. So, as an example, if our dataset has 1000 points, and we want 5 nearest neighbors with 95% probability of being in the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set k = 5, tau = 5, and alpha = 0.95.
The method will fail (and throw a std::invalid_argument exception) if the value of tau is too low: tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting to choose 5 nearest neighbors out of the closest 1 point – this is invalid.
referenceSet  Set of reference points. 
naive  If true, the rankapproximate search will be performed by directly sampling the whole set instead of using the stratified sampling on the tree. 
singleMode  If true, singletree search will be used (as opposed to dualtree search). This is useful when Search() will be called with few query points. 
metric  An optional instance of the MetricType class. 
tau  The rankapproximation in percentile of the data. The default value is 5%. 
alpha  The desired success probability. The default value is 0.95. 
sampleAtLeaves  Sample at leaves for faster but less accurate computation. This defaults to 'false'. 
firstLeafExact  Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now. 
singleSampleLimit  The limit on the largest node that can be approximated by sampling. This defaults to 20. 
RASearch  (  Tree *  referenceTree, 
const bool  singleMode = false , 

const double  tau = 5 , 

const double  alpha = 0.95 , 

const bool  sampleAtLeaves = false , 

const bool  firstLeafExact = false , 

const size_t  singleSampleLimit = 20 , 

const MetricType  metric = MetricType() 

) 
Initialize the RASearch object with the given preconstructed reference tree.
It is assumed that the points in the tree's dataset correspond to the reference set. Optionally, choose to use singletree mode. Naive mode is not available as an option for this constructor; instead, to run naive computation, use a different constructor. Additionally, an instantiated distance metric can be given, for cases where the distance metric holds data.
There is no copying of the data matrices in this constructor (because treebuilding is not necessary), so this is the constructor to use when copies absolutely must be avoided.
tau, the rankapproximation parameter, specifies that we are looking for k neighbors with probability alpha of being in the top tau percent of nearest neighbors. So, as an example, if our dataset has 1000 points, and we want 5 nearest neighbors with 95% probability of being in the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set k = 5, tau = 5, and alpha = 0.95.
The method will fail (and throw a std::invalid_argument exception) if the value of tau is too low: tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting to choose 5 nearest neighbors out of the closest 1 point – this is invalid.
referenceTree  Prebuilt tree for reference points. 
singleMode  Whether singletree computation should be used (as opposed to dualtree computation). 
tau  The rankapproximation in percentile of the data. The default value is 5%. 
alpha  The desired success probability. The default value is 0.95. 
sampleAtLeaves  Sample at leaves for faster but less accurate computation. This defaults to 'false'. 
firstLeafExact  Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now. 
singleSampleLimit  The limit on the largest node that can be approximated by sampling. This defaults to 20. 
metric  Instantiated distance metric. 
RASearch  (  const bool  naive = false , 
const bool  singleMode = false , 

const double  tau = 5 , 

const double  alpha = 0.95 , 

const bool  sampleAtLeaves = false , 

const bool  firstLeafExact = false , 

const size_t  singleSampleLimit = 20 , 

const MetricType  metric = MetricType() 

) 
Create an RASearch object with no reference data.
If Search() is called before a reference set is set with Train(), an exception will be thrown.
naive  Whether naive (bruteforce) search should be used. 
singleMode  Whether singletree computation should be used (as opposed to dualtree computation). 
tau  The rankapproximation in percentile of the data. The default value is 5%. 
alpha  The desired success probability. The default value is 0.95. 
sampleAtLeaves  Sample at leaves for faster but less accurate computation. This defaults to 'false'. 
firstLeafExact  Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now. 
singleSampleLimit  The limit on the largest node that can be approximated by sampling. This defaults to 20. 
metric  Instantiated distance metric. 
~RASearch  (  ) 
Delete the RASearch object.
The tree is the only member we are responsible for deleting. The others will take care of themselves.

inline 
Get the desired success probability.
Definition at line 342 of file ra_search.hpp.

inline 
Modify the desired success probability.
Definition at line 344 of file ra_search.hpp.

inline 
Get whether or not we traverse to the first leaf without approximation.
Definition at line 352 of file ra_search.hpp.

inline 
Modify whether or not we traverse to the first leaf without approximation.
Definition at line 354 of file ra_search.hpp.

inline 
Get whether or not naive (bruteforce) search is used.
Definition at line 327 of file ra_search.hpp.

inline 
Modify whether or not naive (bruteforce) search is used.
Definition at line 329 of file ra_search.hpp.

inline 
Access the reference set.
Definition at line 324 of file ra_search.hpp.
void ResetQueryTree  (  Tree *  queryTree  )  const 
This function recursively resets the RAQueryStat of the given query tree to set 'bound' to SortPolicy::WorstDistance and 'numSamplesMade' to 0.
This allows a user to perform multiple searches with the same query tree, possibly with different levels of approximation without requiring to build a new pair of trees for every new (approximate) search.
If Search() is called multiple times with the same query tree without calling ResetQueryTree(), the results may not satisfy the theoretical guarantees provided by the rankapproximate neighbor search algorithm.
queryTree  Tree whose statistics should be reset. 

inline 
Get whether or not sampling is done at the leaves.
Definition at line 347 of file ra_search.hpp.

inline 
Modify whether or not sampling is done at the leaves.
Definition at line 349 of file ra_search.hpp.
void Search  (  const MatType &  querySet, 
const size_t  k,  
arma::Mat< size_t > &  neighbors,  
arma::mat &  distances  
) 
Compute the rank approximate nearest neighbors of each query point in the query set and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
If querySet is small or only contains one point, it can be faster to do singletree search; singletree search can be set with the SingleMode() function or in the constructor.
querySet  Set of query points (can be a single point). 
k  Number of neighbors to search for. 
neighbors  Matrix storing lists of neighbors for each query point. 
distances  Matrix storing distances of neighbors for each query point. 
void Search  (  Tree *  queryTree, 
const size_t  k,  
arma::Mat< size_t > &  neighbors,  
arma::mat &  distances  
) 
Compute the rank approximate nearest neighbors of each point in the prebuilt query tree and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
If singleMode or naive is enabled, then this method will throw a std::invalid_argument exception; calling this function implies a dualtree algorithm.
queryTree  Tree built on query points. 
k  Number of neighbors to search for. 
neighbors  Matrix storing lists of neighbors for each query point. 
distances  Matrix storing distances of neighbors for each query point. 
void Search  (  const size_t  k, 
arma::Mat< size_t > &  neighbors,  
arma::mat &  distances  
) 
Compute the rank approximate nearest neighbors of each point in the reference set (that is, the query set is taken to be the reference set), and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
k  Number of neighbors to search for. 
neighbors  Matrix storing lists of neighbors for each point. 
distances  Matrix storing distances of neighbors for each query point. 
void serialize  (  Archive &  ar, 
const uint32_t  
) 
Serialize the object.
Referenced by RASearch< NearestNeighborSort, metric::EuclideanDistance, arma::mat, TreeType >::SingleSampleLimit().

inline 
Get whether or not singletree search is used.
Definition at line 332 of file ra_search.hpp.

inline 
Modify whether or not singletree search is used.
Definition at line 334 of file ra_search.hpp.

inline 
Get the limit on the size of a node that can be approximated.
Definition at line 357 of file ra_search.hpp.

inline 
Modify the limit on the size of a node that can be approximation.
Definition at line 359 of file ra_search.hpp.

inline 
Get the rankapproximation in percentile of the data.
Definition at line 337 of file ra_search.hpp.

inline 
Modify the rankapproximation in percentile of the data.
Definition at line 339 of file ra_search.hpp.
void Train  (  MatType  referenceSet  ) 
"Train" the model on the given reference set.
If treebased search is being used (if Naive() is false), the reference tree is rebuilt. Thus, a copy of the reference dataset is made. If you don't need to keep the dataset, you can avoid copying by using std::move(). This transfers the ownership of the dataset.
referenceSet  New reference set to use. 
void Train  (  Tree *  referenceTree  ) 
Set the reference tree to a new reference tree.