InformationGain Class Reference

The standard information gain criterion, used for calculating gain in decision trees.

Static Public Member Functions

static double Evaluate (const arma::Mat< size_t > &counts)
 Given the sufficient statistics of a proposed split, calculate the information gain if that split were to be used.

 
template<bool UseWeights>
static double Evaluate (const arma::Row< size_t > &labels, const size_t numClasses, const arma::Row< double > &weights)
 Given a set of labels, calculate the information gain of those labels.

 
template<bool UseWeights, typename CountType >
static double EvaluatePtr (const CountType *counts, const size_t countLength, const CountType totalCount)
 Evaluate the information gain given a vector of class weight counts.

 
static double Range (const size_t numClasses)
 Return the range of the information gain for the given number of classes.

 

Detailed Description

The standard information gain criterion, used for calculating gain in decision trees.

Definition at line 25 of file information_gain.hpp.

Member Function Documentation

◆ Evaluate() [1/2]

static double Evaluate ( const arma::Mat< size_t > &  counts)
inline static

Given the sufficient statistics of a proposed split, calculate the information gain if that split were to be used.

The 'counts' matrix should contain the number of points in each class in each column, so the size of 'counts' is children x classes, where 'children' is the number of child nodes in the proposed split.

Parameters
counts  Matrix of sufficient statistics.

Definition at line 31 of file information_gain.hpp.
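
As an illustration of scoring a split from its sufficient statistics, the sketch below computes the textbook information gain: the entropy of the unsplit node minus the size-weighted entropy of each child. It is not taken from mlpack. The helper names Entropy and SplitGain are hypothetical, and the table is assumed to be indexed as counts[child][class], following the "children x classes" description above. It uses only the standard library rather than Armadillo.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shannon entropy, in bits, of one vector of per-class counts.
double Entropy(const std::vector<std::size_t>& classCounts)
{
  std::size_t total = 0;
  for (const std::size_t c : classCounts)
    total += c;
  if (total == 0)
    return 0.0; // An empty node carries no uncertainty.

  double h = 0.0;
  for (const std::size_t c : classCounts)
  {
    if (c == 0)
      continue; // 0 * log2(0) is taken to be 0.
    const double f = static_cast<double>(c) / static_cast<double>(total);
    h -= f * std::log2(f);
  }
  return h;
}

// Hypothetical helper (assumption: counts[child][class] layout).
// Returns H(parent) - sum_i (n_i / n) * H(child_i).
double SplitGain(const std::vector<std::vector<std::size_t>>& counts)
{
  if (counts.empty())
    return 0.0;

  const std::size_t numClasses = counts[0].size();
  std::vector<std::size_t> parent(numClasses, 0);
  std::size_t numPoints = 0;
  for (const auto& child : counts)
  {
    assert(child.size() == numClasses);
    for (std::size_t j = 0; j < numClasses; ++j)
    {
      parent[j] += child[j];
      numPoints += child[j];
    }
  }
  if (numPoints == 0)
    return 0.0; // No points: treat the gain as zero.

  double gain = Entropy(parent);
  for (const auto& child : counts)
  {
    std::size_t childPoints = 0;
    for (const std::size_t c : child)
      childPoints += c;
    gain -= (static_cast<double>(childPoints) / static_cast<double>(numPoints))
        * Entropy(child);
  }
  return gain;
}
```

Under these assumptions, a split that separates two equally sized classes perfectly scores 1 bit, and a split whose children have the same class mix as the parent scores 0.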

◆ Evaluate() [2/2]

static double Evaluate ( const arma::Row< size_t > &  labels,
const size_t  numClasses,
const arma::Row< double > &  weights 
)
inline static

Given a set of labels, calculate the information gain of those labels.

Note that, due to floating-point representation issues, the gain returned can be very slightly greater than 0. Thus, if you are checking for a perfect fit, be sure to use 'gain >= 0.0', not 'gain == 0.0'.

Parameters
labels  Labels of the dataset.
numClasses  Number of classes in the dataset.
weights  Weights of the labels (used when UseWeights is true).

Definition at line 59 of file information_gain.hpp.
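
The note about 'gain >= 0.0' suggests that the gain of a label set is at most zero, with zero meaning a pure node. A minimal sketch under that assumption follows, taking the gain to be the negated Shannon entropy in bits. LabelGain and IsPerfectFit are hypothetical names, not mlpack functions, and std::vector stands in for arma::Row.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: negated entropy (in bits) of a labeled set.
// When UseWeights is false, every point counts with weight 1 and the
// 'weights' argument is ignored.
template<bool UseWeights>
double LabelGain(const std::vector<std::size_t>& labels,
                 const std::size_t numClasses,
                 const std::vector<double>& weights)
{
  if (labels.empty())
    return 0.0; // Assumption: an empty set is treated as pure.
  if (UseWeights)
    assert(weights.size() == labels.size());

  // Accumulate the (possibly weighted) mass of each class.
  std::vector<double> mass(numClasses, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    assert(labels[i] < numClasses);
    const double w = UseWeights ? weights[i] : 1.0;
    mass[labels[i]] += w;
    total += w;
  }
  if (total <= 0.0)
    return 0.0;

  double gain = 0.0;
  for (const double m : mass)
  {
    const double f = m / total;
    if (f > 0.0)
      gain += f * std::log2(f); // Each term is <= 0.
  }
  return gain;
}

// Perfect-fit check written as the note recommends: '>=' rather than '=='.
bool IsPerfectFit(const double gain)
{
  return gain >= 0.0;
}
```

In this sketch, two equally weighted classes give a gain of -1, while a set with a single class gives 0 and passes IsPerfectFit.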

◆ EvaluatePtr()

static double EvaluatePtr ( const CountType *  counts,
const size_t  countLength,
const CountType  totalCount 
)
inline static

Evaluate the information gain given a vector of class weight counts.

Definition at line 32 of file information_gain.hpp.
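
To make the pointer-based interface concrete, here is a hedged sketch of a function with the same parameter list that works on a raw array of per-class counts. GainFromCounts is a hypothetical stand-in, not the mlpack implementation, and it assumes the gain is the sum of f * log2(f) over the class fractions f = counts[i] / totalCount.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a pointer-based evaluation.
// CountType may be an integer type (plain counts) or a floating-point
// type (accumulated weights); totalCount is the sum of all entries.
template<typename CountType>
double GainFromCounts(const CountType* counts,
                      const std::size_t countLength,
                      const CountType totalCount)
{
  assert(totalCount > 0);

  double gain = 0.0;
  for (std::size_t i = 0; i < countLength; ++i)
  {
    const double f = static_cast<double>(counts[i]) /
        static_cast<double>(totalCount);
    if (f > 0.0)
      gain += f * std::log2(f); // Empty classes contribute nothing.
  }
  return gain;
}
```

Passing the total in separately avoids a second pass over the array when the caller already tracks it, which is presumably why the signature carries totalCount.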

◆ Range() [1/2]

static double Range ( const size_t  numClasses)
inline static

Return the range of the information gain for the given number of classes.

(That is, the difference between the maximum possible value and the minimum possible value.)

Definition at line 84 of file information_gain.hpp.

◆ Range() [2/2]

static double Range ( const size_t  numClasses)
inline static

Return the range of the information gain for the given number of classes.

(That is, the difference between the maximum possible value and the minimum possible value.)

Parameters
numClasses  Number of classes in the dataset.

Definition at line 202 of file information_gain.hpp.
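
The range can be worked out by hand if the gain is assumed to be the negated entropy in bits. The maximum is 0, reached when all points share one class. The minimum is reached when the n classes are equally likely, giving n * (1/n) * log2(1/n) = -log2(n). The difference is log2(n). The sketch below encodes that result; GainRange is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: width of the interval [-log2(n), 0] that the
// negated entropy can occupy for n classes.
double GainRange(const std::size_t numClasses)
{
  assert(numClasses > 0);
  return std::log2(static_cast<double>(numClasses));
}
```

For two classes the range is 1 bit, and for eight classes it is 3 bits. With a single class there is nothing to separate, so the range is 0.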


The documentation for this class was generated from the following file:
  • /home/jenkins-mlpack/mlpack.org/_src/mlpack-3.2.1/src/mlpack/methods/decision_tree/information_gain.hpp