The HoeffdingTree object represents all of the necessary information for a Hoeffdingboundbased decision tree. More...
Public Types  
typedef CategoricalSplitType< FitnessFunction >  CategoricalSplit 
Allow access to the categorical split type. More...  
typedef NumericSplitType< FitnessFunction >  NumericSplit 
Allow access to the numeric split type. More...  
Public Member Functions  
template < typename MatType >  
HoeffdingTree (const MatType &data, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const bool batchTraining=true, const double successProbability=0.95, const size_t maxSamples=0, const size_t checkInterval=100, const size_t minSamples=100, const CategoricalSplitType< FitnessFunction > &categoricalSplitIn=CategoricalSplitType< FitnessFunction >(0, 0), const NumericSplitType< FitnessFunction > &numericSplitIn=NumericSplitType< FitnessFunction >(0))  
Construct the Hoeffding tree with the given parameters and given training data. More...  
HoeffdingTree (const data::DatasetInfo &datasetInfo, const size_t numClasses, const double successProbability=0.95, const size_t maxSamples=0, const size_t checkInterval=100, const size_t minSamples=100, const CategoricalSplitType< FitnessFunction > &categoricalSplitIn=CategoricalSplitType< FitnessFunction >(0, 0), const NumericSplitType< FitnessFunction > &numericSplitIn=NumericSplitType< FitnessFunction >(0), std::unordered_map< size_t, std::pair< size_t, size_t >> *dimensionMappings=NULL, const bool copyDatasetInfo=true)  
Construct the Hoeffding tree with the given parameters, but training on no data. More...  
HoeffdingTree ()  
Construct a Hoeffding tree with no data and no information. More...  
HoeffdingTree (const HoeffdingTree &other)  
Copy another tree (warning: this will duplicate the tree entirely, and may use a lot of memory. More...  
HoeffdingTree (HoeffdingTree &&other)  
Move another tree. More...  
~HoeffdingTree ()  
Clean up memory. More...  
template < typename VecType >  
size_t  CalculateDirection (const VecType &point) const 
Given a point and that this node is not a leaf, calculate the index of the child node this point would go towards. More...  
size_t  CheckInterval () const 
Get the number of samples before a split check is performed. More...  
void  CheckInterval (const size_t checkInterval) 
Modify the number of samples before a split check is performed. More...  
const HoeffdingTree &  Child (const size_t i) const 
Get a child. More...  
HoeffdingTree &  Child (const size_t i) 
Modify a child. More...  
template < typename VecType >  
size_t  Classify (const VecType &point) const 
Classify the given point, using this node and the entire (sub)tree beneath it. More...  
template < typename VecType >  
void  Classify (const VecType &point, size_t &prediction, double &probability) const 
Classify the given point and also return an estimate of the probability that the prediction is correct. More...  
template < typename MatType >  
void  Classify (const MatType &data, arma::Row< size_t > &predictions) const 
Classify the given points, using this node and the entire (sub)tree beneath it. More...  
template < typename MatType >  
void  Classify (const MatType &data, arma::Row< size_t > &predictions, arma::rowvec &probabilities) const 
Classify the given points, using this node and the entire (sub)tree beneath it. More...  
void  CreateChildren () 
Given that this node should split, create the children. More...  
size_t  MajorityClass () const 
Get the majority class. More...  
size_t &  MajorityClass () 
Modify the majority class. More...  
double  MajorityProbability () const 
Get the probability of the majority class (based on training samples). More...  
double &  MajorityProbability () 
Modify the probability of the majority class. More...  
size_t  MaxSamples () const 
Get the maximum number of samples before a split is forced. More...  
void  MaxSamples (const size_t maxSamples) 
Modify the maximum number of samples before a split is forced. More...  
size_t  MinSamples () const 
Get the minimum number of samples for a split. More...  
void  MinSamples (const size_t minSamples) 
Modify the minimum number of samples for a split. More...  
size_t  NumChildren () const 
Get the number of children. More...  
size_t  NumDescendants () const 
Get the size of the Hoeffding Tree. More...  
HoeffdingTree &  operator= (const HoeffdingTree &other) 
Copy assignment operator. More...  
HoeffdingTree &  operator= (HoeffdingTree &&other) 
Move assignment operator. More...  
template < typename Archive >  
void  serialize (Archive &ar, const uint32_t) 
Serialize the split. More...  
size_t  SplitCheck () 
Check if a split would satisfy the conditions of the Hoeffding bound with the node's specified success probability. More...  
size_t  SplitDimension () const 
Get the splitting dimension (size_t(1) if no split). More...  
double  SuccessProbability () const 
Get the confidence required for a split. More...  
void  SuccessProbability (const double successProbability) 
Modify the confidence required for a split. More...  
template < typename MatType >  
void  Train (const MatType &data, const arma::Row< size_t > &labels, const bool batchTraining=true, const bool resetTree=false, const size_t numClasses=0) 
Train on a set of points, either in streaming mode or in batch mode, with the given labels. More...  
template < typename MatType >  
void  Train (const MatType &data, const data::DatasetInfo &info, const arma::Row< size_t > &labels, const bool batchTraining=true, const size_t numClasses=0) 
Train on a set of points, either in streaming mode or in batch mode, with the given labels and the given DatasetInfo . More...  
template < typename VecType >  
void  Train (const VecType &point, const size_t label) 
Train on a single point in streaming mode, with the given label. More...  
The HoeffdingTree object represents all of the necessary information for a Hoeffdingboundbased decision tree.
This class is able to train on samples in streaming settings and batch settings, and perform splits based on the Hoeffding bound. The Hoeffding tree (also known as the "very fast decision tree" – VFDT) is described in the following paper:
The class is modular, and takes three template parameters. The first, FitnessFunction, is the fitness function that should be used to determine whether a split is beneficial; examples might be GiniImpurity or HoeffdingInformationGain. The NumericSplitType determines how numeric attributes are handled, and the CategoricalSplitType determines how categorical attributes are handled. As far as the actual splitting goes, the meat of the splitting procedure will be contained in those two classes.
FitnessFunction  Fitness function to use. 
NumericSplitType  Technique for splitting numeric features. 
CategoricalSplitType  Technique for splitting categorical features. 
Definition at line 61 of file hoeffding_tree.hpp.
typedef CategoricalSplitType<FitnessFunction> CategoricalSplit 
Allow access to the categorical split type.
Definition at line 67 of file hoeffding_tree.hpp.
typedef NumericSplitType<FitnessFunction> NumericSplit 
Allow access to the numeric split type.
Definition at line 65 of file hoeffding_tree.hpp.
HoeffdingTree  (  const MatType &  data, 
const data::DatasetInfo &  datasetInfo,  
const arma::Row< size_t > &  labels,  
const size_t  numClasses,  
const bool  batchTraining = true , 

const double  successProbability = 0.95 , 

const size_t  maxSamples = 0 , 

const size_t  checkInterval = 100 , 

const size_t  minSamples = 100 , 

const CategoricalSplitType< FitnessFunction > &  categoricalSplitIn = CategoricalSplitType< FitnessFunction >(0, 0) , 

const NumericSplitType< FitnessFunction > &  numericSplitIn = NumericSplitType< FitnessFunction >(0) 

) 
Construct the Hoeffding tree with the given parameters and given training data.
The tree may be trained either in batch mode (which looks at all points before splitting, and propagates these points to the created children for further training), or in streaming mode, where each point is only considered once. (In general, batch mode will give betterperforming trees, but will have higher memory and runtime costs for the same dataset.)
data  Dataset to train on. 
datasetInfo  Information on the dataset (types of each feature). 
labels  Labels of each point in the dataset. 
numClasses  Number of classes in the dataset. 
batchTraining  Whether or not to train in batch. 
successProbability  Probability of success required in Hoeffding bounds before a split can happen. 
maxSamples  Maximum number of samples before a split is forced (0 never forces a split); ignored in batch training mode. 
checkInterval  Number of samples required before each split; ignored in batch training mode. 
minSamples  If the node has seen this many points or fewer, no split will be allowed. 
categoricalSplitIn  Optional instantiated categorical split object. 
numericSplitIn  Optional instantiated numeric split object. 
HoeffdingTree  (  const data::DatasetInfo &  datasetInfo, 
const size_t  numClasses,  
const double  successProbability = 0.95 , 

const size_t  maxSamples = 0 , 

const size_t  checkInterval = 100 , 

const size_t  minSamples = 100 , 

const CategoricalSplitType< FitnessFunction > &  categoricalSplitIn = CategoricalSplitType< FitnessFunction >(0, 0) , 

const NumericSplitType< FitnessFunction > &  numericSplitIn = NumericSplitType< FitnessFunction >(0) , 

std::unordered_map< size_t, std::pair< size_t, size_t >> *  dimensionMappings = NULL , 

const bool  copyDatasetInfo = true 

) 
Construct the Hoeffding tree with the given parameters, but training on no data.
The dimensionMappings parameter is only used if it is desired that this node does not create its own dimensionMappings object (for instance, if this is a child of another node in the tree).
numClasses  Number of classes in the dataset. 
datasetInfo  Information on the dataset (types of each feature). 
successProbability  Probability of success required in Hoeffding bound before a split can happen. 
maxSamples  Maximum number of samples before a split is forced. 
checkInterval  Number of samples required before each split check. 
minSamples  If the node has seen this many points or fewer, no split will be allowed. 
dimensionMappings  Mappings from dimension indices to positions in numeric and categorical split vectors. If left NULL, a new one will be created. 
copyDatasetInfo  If true, then a copy of the datasetInfo will be made. 
categoricalSplitIn  Optional instantiated categorical split object. 
numericSplitIn  Optional instantiated numeric split object. 
HoeffdingTree  (  ) 
Construct a Hoeffding tree with no data and no information.
Be sure to call Train() before trying to use the tree.
HoeffdingTree  (  const HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType > &  other  ) 
Copy another tree (warning: this will duplicate the tree entirely, and may use a lot of memory.
Make sure it's what you want before you do it).
other  Tree to copy. 
HoeffdingTree  (  HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType > &&  other  ) 
Move another tree.
other  Tree to move. 
~HoeffdingTree  (  ) 
Clean up memory.
size_t CalculateDirection  (  const VecType &  point  )  const 
Given a point and that this node is not a leaf, calculate the index of the child node this point would go towards.
This method is primarily used by the Classify() function, but it can be used in a standalone sense too.
point  Point to classify. 
Referenced by HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().

inline 
Get the number of samples before a split check is performed.
Definition at line 284 of file hoeffding_tree.hpp.
References HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CalculateDirection(), HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Classify(), HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CreateChildren(), HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::NumDescendants(), and HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::serialize().
void CheckInterval  (  const size_t  checkInterval  ) 
Modify the number of samples before a split check is performed.

inline 
Get a child.
Definition at line 264 of file hoeffding_tree.hpp.

inline 
Modify a child.
Definition at line 266 of file hoeffding_tree.hpp.
size_t Classify  (  const VecType &  point  )  const 
Classify the given point, using this node and the entire (sub)tree beneath it.
The predicted label is returned.
point  Point to classify. 
Referenced by HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().
void Classify  (  const VecType &  point, 
size_t &  prediction,  
double &  probability  
)  const 
Classify the given point and also return an estimate of the probability that the prediction is correct.
(This estimate is simply the probability that a training point was from the majority class in the leaf that this point binned to.)
point  Point to classify. 
prediction  Predicted label of point. 
probability  An estimate of the probability that the prediction is correct. 
void Classify  (  const MatType &  data, 
arma::Row< size_t > &  predictions  
)  const 
Classify the given points, using this node and the entire (sub)tree beneath it.
The predicted labels for each point are returned.
data  Points to classify. 
predictions  Predicted labels for each point. 
void Classify  (  const MatType &  data, 
arma::Row< size_t > &  predictions,  
arma::rowvec &  probabilities  
)  const 
Classify the given points, using this node and the entire (sub)tree beneath it.
The predicted labels for each point are returned, as well as an estimate of the probability that the prediction is correct for each point. This estimate is simply the MajorityProbability() for the leaf that each point bins to.
data  Points to classify. 
predictions  Predicted labels for each point. 
probabilities  Probability estimates for each predicted label. 
void CreateChildren  (  ) 
Given that this node should split, create the children.
Referenced by HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().

inline 
Get the majority class.
Definition at line 251 of file hoeffding_tree.hpp.

inline 
Modify the majority class.
Definition at line 253 of file hoeffding_tree.hpp.

inline 
Get the probability of the majority class (based on training samples).
Definition at line 256 of file hoeffding_tree.hpp.

inline 
Modify the probability of the majority class.
Definition at line 258 of file hoeffding_tree.hpp.

inline 
Get the maximum number of samples before a split is forced.
Definition at line 279 of file hoeffding_tree.hpp.
void MaxSamples  (  const size_t  maxSamples  ) 
Modify the maximum number of samples before a split is forced.

inline 
Get the minimum number of samples for a split.
Definition at line 274 of file hoeffding_tree.hpp.
void MinSamples  (  const size_t  minSamples  ) 
Modify the minimum number of samples for a split.

inline 
Get the number of children.
Definition at line 261 of file hoeffding_tree.hpp.
size_t NumDescendants  (  )  const 
Get the size of the Hoeffding Tree.
Referenced by HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().
HoeffdingTree& operator=  (  const HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType > &  other  ) 
Copy assignment operator.
other  Tree to copy. 
HoeffdingTree& operator=  (  HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType > &&  other  ) 
Move assignment operator.
other  Tree to move. 
void serialize  (  Archive &  ar, 
const uint32_t  
) 
Serialize the split.
Referenced by HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().
size_t SplitCheck  (  ) 
Check if a split would satisfy the conditions of the Hoeffding bound with the node's specified success probability.
If so, the number of children that would be created is returned. If not, 0 is returned.

inline 
Get the splitting dimension (size_t(1) if no split).
Definition at line 248 of file hoeffding_tree.hpp.

inline 
Get the confidence required for a split.
Definition at line 269 of file hoeffding_tree.hpp.
void SuccessProbability  (  const double  successProbability  ) 
Modify the confidence required for a split.
void Train  (  const MatType &  data, 
const arma::Row< size_t > &  labels,  
const bool  batchTraining = true , 

const bool  resetTree = false , 

const size_t  numClasses = 0 

) 
Train on a set of points, either in streaming mode or in batch mode, with the given labels.
If resetTree
is set to true
, then reset the state of the tree to an empty tree before training.
Note that the tree will be automatically reset if the dimensionality of data
does not match the dimensionality that the tree was currently trained with. The tree will also be reset if numClasses
is passed.
data  Data points to train on. 
labels  Labels of data points. 
batchTraining  If true, perform training in batch. 
resetTree  If true, reset the tree to an empty tree before training. 
numClasses  The number of classes in labels . Passing this will reset the tree. If not given and resetTree is true , then the number of classes will be computed from labels . 
void Train  (  const MatType &  data, 
const data::DatasetInfo &  info,  
const arma::Row< size_t > &  labels,  
const bool  batchTraining = true , 

const size_t  numClasses = 0 

) 
Train on a set of points, either in streaming mode or in batch mode, with the given labels and the given DatasetInfo
.
This will reset the tree. This only needs to be called when the DatasetInfo
has changed—if you are training incrementally but have already passed the DatasetInfo once, use the overload of Train()
that does not take a DatasetInfo
and make sure resetTree
is set to false
.
data  Data points to train on. 
info  DatasetInfo object with information about each dimension. 
labels  Labels of data points. 
batchTraining  If true, perform training in batch. 
numClasses  Number of classes in labels . If not specified, it is computed from labels . 
void Train  (  const VecType &  point, 
const size_t  label  
) 
Train on a single point in streaming mode, with the given label.
The tree will not be reset before training.
point  Point to train on. 
label  Label of point to train on. 