The LSHSearch class; this class builds a hash on the reference set and uses this hash to compute the distanceapproximate nearestneighbors of the given queries. More...
Public Member Functions  
LSHSearch (MatType referenceSet, const arma::cube &projections, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500)  
This function initializes the LSH class. More...  
LSHSearch (MatType referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500)  
This function initializes the LSH class. More...  
LSHSearch ()  
Create an untrained LSH model. More...  
LSHSearch (const LSHSearch &other)  
Copy the given LSH model. More...  
LSHSearch (LSHSearch &&other)  
Take ownership of the given LSH model. More...  
size_t  BucketSize () const 
Get the bucket size of the second hash. More...  
size_t  DistanceEvaluations () const 
Return the number of distance evaluations performed. More...  
size_t &  DistanceEvaluations () 
Modify the number of distance evaluations performed. More...  
size_t  NumProjections () const 
Get the number of projections. More...  
const arma::mat &  Offsets () const 
Get the offsets 'b' for each of the projections. (One 'b' per column.) More...  
LSHSearch &  operator= (const LSHSearch &other) 
Copy the given LSH model. More...  
LSHSearch &  operator= (LSHSearch &&other) 
Take ownership of the given LSH model. More...  
const arma::cube &  Projections () 
Get the projection tables. More...  
void  Projections (const arma::cube &projTables) 
Change the projection tables (this retrains the LSH model). More...  
const MatType &  ReferenceSet () const 
Return the reference dataset. More...  
void  Search (const MatType &querySet, const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0, const size_t T=0) 
Compute the nearest neighbors of the points in the given query set and store the output in the given matrices. More...  
void  Search (const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0, size_t T=0) 
Compute the nearest neighbors and store the output in the given matrices. More...  
const std::vector< arma::Col< size_t > > &  SecondHashTable () const 
Get the second hash table. More...  
const arma::vec &  SecondHashWeights () const 
Get the weights of the second hash. More...  
template < typename Archive >  
void  serialize (Archive &ar, const uint32_t version) 
Serialize the LSH model. More...  
void  Train (MatType referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500, const arma::cube &projection=arma::cube()) 
Train the LSH model on the given dataset. More...  
Static Public Member Functions  
static double  ComputeRecall (const arma::Mat< size_t > &foundNeighbors, const arma::Mat< size_t > &realNeighbors) 
Compute the recall (% of neighbors found) given the neighbors returned by LSHSearch::Search and a "ground truth" set of neighbors. More...  
The LSHSearch class; this class builds a hash on the reference set and uses this hash to compute the distanceapproximate nearestneighbors of the given queries.
SortPolicy  The sort policy for distances; see NearestNeighborSort. 
MatType  Type of matrix to use to store the data. 
Definition at line 72 of file lsh_search.hpp.
LSHSearch  (  MatType  referenceSet, 
const arma::cube &  projections,  
const double  hashWidth = 0.0 , 

const size_t  secondHashSize = 99901 , 

const size_t  bucketSize = 500 

) 
This function initializes the LSH class.
It builds the hash on the reference set with 2stable distributions. See the individual functions performing the hashing for details on how the hashing is done. In order to avoid copying the reference set, it is suggested to pass that parameter with std::move().
referenceSet  Set of reference points and the set of queries. 
projections  Cube of projection tables. For a cube of size (a, b, c) we set numProj = a, numTables = c. b is the reference set dimensionality. 
hashWidth  The width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearestneighbor distance in general. 
secondHashSize  The size of the second hash table. This should be a large prime number. 
bucketSize  The size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!). 
LSHSearch  (  MatType  referenceSet, 
const size_t  numProj,  
const size_t  numTables,  
const double  hashWidth = 0.0 , 

const size_t  secondHashSize = 99901 , 

const size_t  bucketSize = 500 

) 
This function initializes the LSH class.
It builds the hash one the reference set using the provided projections. See the individual functions performing the hashing for details on how the hashing is done. In order to avoid copying the reference set, consider passing the set with std::move().
referenceSet  Set of reference points and the set of queries. 
numProj  Number of projections in each hash table (anything between 1050 might be a decent choice). 
numTables  Total number of hash tables (anything between 1020 should suffice). 
hashWidth  The width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearestneighbor distance in general. 
secondHashSize  The size of the second hash table. This should be a large prime number. 
bucketSize  The size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!). 
LSHSearch  (  ) 
Copy the given LSH model.
other  Other LSH model to copy. 
Take ownership of the given LSH model.
other  Other LSH model to take ownership of. 

inline 
Get the bucket size of the second hash.
Definition at line 291 of file lsh_search.hpp.

static 
Compute the recall (% of neighbors found) given the neighbors returned by LSHSearch::Search and a "ground truth" set of neighbors.
The recall returned will be in the range [0, 1].
foundNeighbors  Set of neighbors to compute recall of. 
realNeighbors  Set of "ground truth" neighbors to compute recall against. 

inline 
Return the number of distance evaluations performed.
Definition at line 274 of file lsh_search.hpp.

inline 
Modify the number of distance evaluations performed.
Definition at line 276 of file lsh_search.hpp.

inline 
Get the number of projections.
Definition at line 282 of file lsh_search.hpp.

inline 
Get the offsets 'b' for each of the projections. (One 'b' per column.)
Definition at line 285 of file lsh_search.hpp.
Copy the given LSH model.
other  Other LSH model to copy. 
Take ownership of the given LSH model.
other  Other LSH model to take ownership of. 

inline 
Get the projection tables.
Definition at line 298 of file lsh_search.hpp.

inline 
Change the projection tables (this retrains the LSH model).
Definition at line 301 of file lsh_search.hpp.
References LSHSearch< SortPolicy, MatType >::Train().

inline 
Return the reference dataset.
Definition at line 279 of file lsh_search.hpp.
void Search  (  const MatType &  querySet, 
const size_t  k,  
arma::Mat< size_t > &  resultingNeighbors,  
arma::mat &  distances,  
const size_t  numTablesToSearch = 0 , 

const size_t  T = 0 

) 
Compute the nearest neighbors of the points in the given query set and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
querySet  Set of query points. 
k  Number of neighbors to search for. 
resultingNeighbors  Matrix storing lists of neighbors for each query point. 
distances  Matrix storing distances of neighbors for each query point. 
numTablesToSearch  This parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered. 
T  The number of additional probing bins to examine with multiprobe LSH. If T = 0, classic singleprobe LSH is run (default). 
void Search  (  const size_t  k, 
arma::Mat< size_t > &  resultingNeighbors,  
arma::mat &  distances,  
const size_t  numTablesToSearch = 0 , 

size_t  T = 0 

) 
Compute the nearest neighbors and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
k  Number of neighbors to search for. 
resultingNeighbors  Matrix storing lists of neighbors for each query point. 
distances  Matrix storing distances of neighbors for each query point. 
numTablesToSearch  This parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered. 
T  Number of probing bins. 

inline 
Get the second hash table.
Definition at line 294 of file lsh_search.hpp.

inline 
Get the weights of the second hash.
Definition at line 288 of file lsh_search.hpp.
void serialize  (  Archive &  ar, 
const uint32_t  version  
) 
Serialize the LSH model.
ar  Archive to serialize to. 
version  serialize class version to provide backward compatibility 
void Train  (  MatType  referenceSet, 
const size_t  numProj,  
const size_t  numTables,  
const double  hashWidth = 0.0 , 

const size_t  secondHashSize = 99901 , 

const size_t  bucketSize = 500 , 

const arma::cube &  projection = arma::cube() 

) 
Train the LSH model on the given dataset.
If a correctlysized projection cube is not provided, this means building new hash tables. Otherwise, we use the projections provided by the user. In order to avoid copying the reference set, consider passing that parameter with std::move().
referenceSet  Set of reference points and the set of queries. 
numProj  Number of projections in each hash table (anything between 1050 might be a decent choice). 
numTables  Total number of hash tables (anything between 1020 should suffice). 
hashWidth  The width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearestneighbor distance in general. 
secondHashSize  The size of the second hash table. This should be a large prime number. 
bucketSize  The size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!). 
projection  Cube of projection tables. For a cube of size (a, b, c) we set numProj = a, numTables = c. b is the reference set dimensionality. 
Referenced by LSHSearch< SortPolicy, MatType >::Projections().