The standard information gain criterion, used for calculating gain in decision trees.
Static Public Member Functions
static double | Evaluate (const arma::Mat< size_t > &counts)
Given the sufficient statistics of a proposed split, calculate the information gain if that split were to be used.
template<bool UseWeights>
static double | Evaluate (const arma::Row< size_t > &labels, const size_t numClasses, const arma::Row< double > &weights)
Given a set of labels, calculate the information gain of those labels.
template<bool UseWeights, typename CountType >
static double | EvaluatePtr (const CountType *counts, const size_t countLength, const CountType totalCount)
Evaluate the information gain given a vector of class weight counts.
static double | Range (const size_t numClasses)
Return the range of the information gain for the given number of classes.
The standard information gain criterion, used for calculating gain in decision trees.
Definition at line 25 of file information_gain.hpp.
inline static
Given the sufficient statistics of a proposed split, calculate the information gain if that split were to be used.
The 'counts' matrix should contain the number of points in each class in each column, so the size of 'counts' is children x classes, where 'children' is the number of child nodes in the proposed split.
counts | Matrix of sufficient statistics. |
Definition at line 31 of file information_gain.hpp.
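As an illustration of how a gain can be derived from per-child class counts, here is a minimal standalone sketch. It does not use mlpack or Armadillo. The names `NegativeEntropy` and `SplitGain` are hypothetical, the counts are indexed as `counts[child][class]` purely for readability, and the formula (size-weighted gain of the children minus the gain of the unsplit node, in bits) is one common definition assumed here, not necessarily the exact computation in information_gain.hpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: negative entropy (in bits) of one node, given its
// per-class counts. Returns 0 for an empty or pure node.
inline double NegativeEntropy(const std::vector<std::size_t>& classCounts)
{
  std::size_t total = 0;
  for (const std::size_t c : classCounts)
    total += c;
  if (total == 0)
    return 0.0;

  double gain = 0.0;
  for (const std::size_t c : classCounts)
  {
    if (c == 0)
      continue; // The term 0 * log2(0) is taken to be 0.
    const double p = static_cast<double>(c) / static_cast<double>(total);
    gain += p * std::log2(p);
  }
  return gain;
}

// Hypothetical helper: gain of a proposed split, assuming
// counts[child][class] holds the number of points of each class that fall
// into each child node.
inline double SplitGain(const std::vector<std::vector<std::size_t>>& counts)
{
  assert(!counts.empty());
  const std::size_t numClasses = counts[0].size();

  // Accumulate the class counts of the unsplit (parent) node.
  std::vector<std::size_t> parent(numClasses, 0);
  std::size_t total = 0;
  for (const auto& child : counts)
  {
    assert(child.size() == numClasses);
    for (std::size_t k = 0; k < numClasses; ++k)
    {
      parent[k] += child[k];
      total += child[k];
    }
  }
  if (total == 0)
    return 0.0;

  // Weight each child's gain by the fraction of points it receives.
  double childGain = 0.0;
  for (const auto& child : counts)
  {
    std::size_t n = 0;
    for (const std::size_t c : child)
      n += c;
    childGain += (static_cast<double>(n) / static_cast<double>(total)) *
        NegativeEntropy(child);
  }
  return childGain - NegativeEntropy(parent);
}
```

Under these assumptions, a split that separates two balanced classes perfectly yields a gain of 1 bit, while a split whose children have the same class mix as the parent yields 0.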
inline static
Given a set of labels, calculate the information gain of those labels.
Note that, due to floating-point representation issues, the gain returned can be very slightly greater than 0. Thus, if you are checking for a perfect fit, be sure to use 'gain >= 0.0', not 'gain == 0.0'.
labels | Labels of the dataset. |
numClasses | Number of classes in the dataset. |
Definition at line 59 of file information_gain.hpp.
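The perfect-fit check described above can be sketched without mlpack as follows. `LabelGain` and `IsPerfectFit` are hypothetical names, `std::vector` stands in for the Armadillo row types, and the gain is assumed to be the negative entropy of the (optionally weighted) label distribution in bits, so that 0 is the best attainable value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a label-based gain: negative entropy (in bits)
// of the label distribution. When UseWeights is false, 'weights' is ignored
// and every point counts as 1.
template<bool UseWeights>
double LabelGain(const std::vector<std::size_t>& labels,
                 const std::size_t numClasses,
                 const std::vector<double>& weights)
{
  std::vector<double> classWeight(numClasses, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    assert(labels[i] < numClasses);
    const double w = UseWeights ? weights[i] : 1.0;
    classWeight[labels[i]] += w;
    total += w;
  }
  if (total <= 0.0)
    return 0.0;

  double gain = 0.0;
  for (const double cw : classWeight)
  {
    if (cw <= 0.0)
      continue; // Empty classes contribute nothing.
    const double p = cw / total;
    gain += p * std::log2(p);
  }
  return gain;
}

// Check for a perfect fit with '>=' rather than '==', so that a result that
// rounds to slightly above 0 is still accepted.
inline bool IsPerfectFit(const double gain)
{
  return gain >= 0.0;
}
```

A call such as `IsPerfectFit(LabelGain<false>(labels, numClasses, {}))` then reports whether all labels belong to a single class.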
inline static
Evaluate the information gain given a vector of class weight counts.
Definition at line 32 of file information_gain.hpp.
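For a pointer-based interface of this shape, a minimal sketch might look like the following. `GainFromCounts` is a hypothetical name that mirrors only the `counts`, `countLength`, and `totalCount` parameters. The `UseWeights` template parameter is omitted, and the gain is again assumed to be negative entropy in bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: negative entropy (in bits) from a raw array of
// per-class counts or weights. 'totalCount' is expected to be the sum of
// the 'countLength' entries, precomputed by the caller.
template<typename CountType>
double GainFromCounts(const CountType* counts,
                      const std::size_t countLength,
                      const CountType totalCount)
{
  assert(counts != nullptr || countLength == 0);
  if (totalCount == 0)
    return 0.0; // Nothing to measure for an empty node.

  double gain = 0.0;
  for (std::size_t i = 0; i < countLength; ++i)
  {
    if (counts[i] > 0)
    {
      const double f = static_cast<double>(counts[i]) /
          static_cast<double>(totalCount);
      gain += f * std::log2(f);
    }
  }
  return gain;
}
```

With a `std::vector<std::size_t> c`, the sketch can be called as `GainFromCounts(c.data(), c.size(), total)`, where `total` is a `std::size_t` holding the sum of the entries.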
inline static
Return the range of the information gain for the given number of classes.
(That is, the difference between the maximum possible value and the minimum possible value.)
Definition at line 84 of file information_gain.hpp.
inline static
Return the range of the information gain for the given number of classes.
(That is, the difference between the maximum possible value and the minimum possible value.)
numClasses | Number of classes in the dataset. |
Definition at line 202 of file information_gain.hpp.
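To make the range concrete: if the gain is taken to be negative entropy in bits (an assumption of this sketch), the maximum is 0 for a pure node and the minimum is -log2(numClasses) for a uniform class distribution, so the range is log2(numClasses). The function name `GainRange` below is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical sketch: width of the interval [-log2(numClasses), 0] that a
// negative-entropy gain can take for the given number of classes.
inline double GainRange(const std::size_t numClasses)
{
  assert(numClasses > 0);
  return std::log2(static_cast<double>(numClasses));
}
```

For two classes this gives a range of 1 bit, and for a single class the range collapses to 0.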