mlpack: a scalable c++ machine learning library
mlpack  2.2.2

A Gaussian Mixture Model (GMM).

Public Member Functions

 GMM ()
 Create an empty Gaussian Mixture Model, with zero Gaussians.

 GMM (const size_t gaussians, const size_t dimensionality)
 Create a GMM with the given number of Gaussians, each of which has the specified dimensionality.

 GMM (const std::vector< distribution::GaussianDistribution > &dists, const arma::vec &weights)
 Create a GMM with the given dists and weights.

 GMM (const GMM &other)
 Copy constructor for GMMs.

void Classify (const arma::mat &observations, arma::Row< size_t > &labels) const
 Classify the given observations as being from an individual component in this GMM.

 
const distribution::GaussianDistribution & Component (size_t i) const
 Return a const reference to a component distribution.

distribution::GaussianDistribution & Component (size_t i)
 Return a reference to a component distribution.

 
size_t Dimensionality () const
 Return the dimensionality of the model.

size_t Gaussians () const
 Return the number of Gaussians in the model.

 
GMM & operator= (const GMM &other)
 Copy operator for GMMs.

 
double Probability (const arma::vec &observation) const
 Return the probability that the given observation came from this distribution.

double Probability (const arma::vec &observation, const size_t component) const
 Return the probability that the given observation came from the given Gaussian component in this distribution.

arma::vec Random () const
 Return a randomly generated observation according to the probability distribution defined by this object.

 
template<typename Archive>
void Serialize (Archive &ar, const unsigned int)
 Serialize the GMM.

 
template<typename FittingType = EMFit<>>
double Train (const arma::mat &observations, const size_t trials=1, const bool useExistingModel=false, FittingType fitter=FittingType())
 Estimate the probability distribution directly from the given observations, using the given algorithm in the FittingType class to fit the data.

template<typename FittingType = EMFit<>>
double Train (const arma::mat &observations, const arma::vec &probabilities, const size_t trials=1, const bool useExistingModel=false, FittingType fitter=FittingType())
 Estimate the probability distribution directly from the given observations, taking into account the probability of each observation actually being from this distribution, and using the given algorithm in the FittingType class to fit the data.

const arma::vec & Weights () const
 Return a const reference to the a priori weights of each Gaussian.

arma::vec & Weights ()
 Return a reference to the a priori weights of each Gaussian.

Private Member Functions

double LogLikelihood (const arma::mat &dataPoints, const std::vector< distribution::GaussianDistribution > &distsL, const arma::vec &weights) const
 This function computes the log-likelihood of the given model.

 

Private Attributes

size_t dimensionality
 The dimensionality of the model.

std::vector< distribution::GaussianDistribution > dists
 Vector of Gaussians.

size_t gaussians
 The number of Gaussians in the model.

arma::vec weights
 Vector of a priori weights for each Gaussian.
 

Detailed Description

A Gaussian Mixture Model (GMM).

This class uses maximum likelihood loss functions to estimate the parameters of the GMM on a given dataset via the given fitting mechanism, defined by the FittingType template parameter. The GMM can be trained using normal data, or data with probabilities of being from this GMM (see GMM::Train() for more information).

The Train() method uses a template type 'FittingType'. The FittingType template class must provide a way for the GMM to train on data. It must provide the following two functions:

void Estimate(const arma::mat& observations,
              std::vector<distribution::GaussianDistribution>& dists,
              arma::vec& weights);

void Estimate(const arma::mat& observations,
              const arma::vec& probabilities,
              std::vector<distribution::GaussianDistribution>& dists,
              arma::vec& weights);

These functions should produce a trained GMM from the given observations and probabilities. They may modify the size of the model (by increasing the size of the dists vector as well as the weights vector), but they should expect that these vectors are already set to the size of the GMM as specified in the constructor.

For a sample implementation, see the EMFit class; this class uses the EM algorithm to train a GMM, and is the default fitting type for the Train() method.
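To make the shape of a fitting type concrete, here is a minimal sketch of a fitter with an Estimate() method that updates the distributions and weights in place. It is not mlpack's EMFit: Gaussian1D, SimpleEMFit1D, and the use of std::vector<double> in place of arma::mat and arma::vec are hypothetical one-dimensional stand-ins, chosen so that the example compiles with the standard library alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1-D stand-in for distribution::GaussianDistribution.
struct Gaussian1D
{
  double mean;
  double variance;

  // Density of the Gaussian at x.
  double Probability(const double x) const
  {
    const double pi = 3.14159265358979323846;
    const double diff = x - mean;
    return std::exp(-0.5 * diff * diff / variance) /
        std::sqrt(2.0 * pi * variance);
  }
};

// Hypothetical fitter shaped like the first Estimate() overload.
class SimpleEMFit1D
{
 public:
  explicit SimpleEMFit1D(const std::size_t maxIterations = 50) :
      maxIterations(maxIterations) { }

  // Refine dists and weights in place, starting from their current values.
  void Estimate(const std::vector<double>& observations,
                std::vector<Gaussian1D>& dists,
                std::vector<double>& weights) const
  {
    assert(!observations.empty() && dists.size() == weights.size());
    const std::size_t n = observations.size();
    const std::size_t k = dists.size();
    std::vector<double> resp(n * k);

    for (std::size_t iter = 0; iter < maxIterations; ++iter)
    {
      // E-step: responsibility of component j for observation i.
      for (std::size_t i = 0; i < n; ++i)
      {
        double total = 0.0;
        for (std::size_t j = 0; j < k; ++j)
        {
          resp[i * k + j] = weights[j] * dists[j].Probability(observations[i]);
          total += resp[i * k + j];
        }
        // Fall back to uniform responsibilities if every density underflowed.
        for (std::size_t j = 0; j < k; ++j)
          resp[i * k + j] = (total > 0.0) ? resp[i * k + j] / total : 1.0 / k;
      }

      // M-step: re-estimate the weight, mean, and variance of each component.
      for (std::size_t j = 0; j < k; ++j)
      {
        double mass = 0.0, mean = 0.0, var = 0.0;
        for (std::size_t i = 0; i < n; ++i)
        {
          mass += resp[i * k + j];
          mean += resp[i * k + j] * observations[i];
        }
        if (mass <= 0.0)
          continue;  // Component owns no data; leave it unchanged.
        mean /= mass;
        for (std::size_t i = 0; i < n; ++i)
        {
          const double diff = observations[i] - mean;
          var += resp[i * k + j] * diff * diff;
        }
        weights[j] = mass / static_cast<double>(n);
        dists[j].mean = mean;
        // Floor the variance so the density stays finite.
        dists[j].variance = std::max(var / mass, 1e-6);
      }
    }
  }

 private:
  std::size_t maxIterations;
};
```

A real FittingType would take arma::mat observations and distribution::GaussianDistribution objects, and would also supply the second overload that accepts per-observation probabilities.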

The GMM, once trained, can be used to generate random points from the distribution and estimate the probability of points being from the distribution. The parameters of the GMM can be obtained through the accessors and mutators.

Example use:

// 'data' (arma::mat) and 'observation' (arma::vec) are assumed to exist.
// Set up a mixture of 5 Gaussians in a 4-dimensional space.
GMM g(5, 4);
// Train the GMM given the data observations, using the default EM fitting
// mechanism.
g.Train(data);
// Get the probability of 'observation' being observed from this GMM.
double probability = g.Probability(observation);
// Get a random observation from the GMM.
arma::vec sample = g.Random();

Definition at line 79 of file gmm.hpp.

Constructor & Destructor Documentation

◆ GMM() [1/4]

mlpack::gmm::GMM::GMM ( )
inline

Create an empty Gaussian Mixture Model, with zero Gaussians.

Definition at line 97 of file gmm.hpp.

References mlpack::Log::Debug.

Referenced by GMM().

◆ GMM() [2/4]

mlpack::gmm::GMM::GMM ( const size_t  gaussians,
const size_t  dimensionality 
)

Create a GMM with the given number of Gaussians, each of which has the specified dimensionality.

The means and covariances will be set to 0.

Parameters
gaussians  Number of Gaussians in this GMM.
dimensionality  Dimensionality of each Gaussian.

◆ GMM() [3/4]

mlpack::gmm::GMM::GMM ( const std::vector< distribution::GaussianDistribution > &  dists,
const arma::vec &  weights 
)
inline

Create a GMM with the given dists and weights.

Parameters
dists  Distributions of the model.
weights  Weights of the model.

Definition at line 123 of file gmm.hpp.

References GMM(), and operator=().

◆ GMM() [4/4]

mlpack::gmm::GMM::GMM ( const GMM &  other)

Copy constructor for GMMs.

Member Function Documentation

◆ Classify()

void mlpack::gmm::GMM::Classify ( const arma::mat &  observations,
arma::Row< size_t > &  labels 
) const

Classify the given observations as being from an individual component in this GMM.

The resultant classifications are stored in the 'labels' object, and each label will be between 0 and (Gaussians() - 1). Supposing that a point was classified with label 2, and that our GMM object was called 'gmm', one could access the relevant Gaussian distribution as follows:

arma::vec mean = gmm.Component(2).Mean();
arma::mat covariance = gmm.Component(2).Covariance();
double priorWeight = gmm.Weights()[2];
Parameters
observations  List of observations to classify.
labels  Object which will be filled with labels.

Referenced by Weights().
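As an illustration of what such a classification can look like, the sketch below labels each observation with the component whose weighted density is largest. This is an assumption about the decision rule, not a copy of mlpack's implementation; Gaussian1D and ClassifySketch are hypothetical one-dimensional stand-ins that use std::vector instead of Armadillo types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1-D stand-in for distribution::GaussianDistribution.
struct Gaussian1D
{
  double mean;
  double variance;

  // Density of the Gaussian at x.
  double Probability(const double x) const
  {
    const double pi = 3.14159265358979323846;
    const double diff = x - mean;
    return std::exp(-0.5 * diff * diff / variance) /
        std::sqrt(2.0 * pi * variance);
  }
};

// Label each observation with the index of the component that maximizes
// weights[j] * dists[j].Probability(observation).
std::vector<std::size_t> ClassifySketch(
    const std::vector<double>& observations,
    const std::vector<Gaussian1D>& dists,
    const std::vector<double>& weights)
{
  assert(!dists.empty() && dists.size() == weights.size());
  std::vector<std::size_t> labels(observations.size(), 0);
  for (std::size_t i = 0; i < observations.size(); ++i)
  {
    double best = -1.0;  // Densities are non-negative, so any value wins.
    for (std::size_t j = 0; j < dists.size(); ++j)
    {
      const double p = weights[j] * dists[j].Probability(observations[i]);
      if (p > best)
      {
        best = p;
        labels[i] = j;
      }
    }
  }
  return labels;
}
```

Every label produced this way lies between 0 and dists.size() - 1, matching the range described for 'labels'.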

◆ Component() [1/2]

const distribution::GaussianDistribution& mlpack::gmm::GMM::Component ( size_t  i) const
inline

Return a const reference to a component distribution.

Parameters
i  Index of component.

Definition at line 146 of file gmm.hpp.

◆ Component() [2/2]

distribution::GaussianDistribution& mlpack::gmm::GMM::Component ( size_t  i)
inline

Return a reference to a component distribution.

Parameters
i  Index of component.

Definition at line 153 of file gmm.hpp.

◆ Dimensionality()

size_t mlpack::gmm::GMM::Dimensionality ( ) const
inline

Return the dimensionality of the model.

Definition at line 139 of file gmm.hpp.

References dimensionality.

◆ Gaussians()

size_t mlpack::gmm::GMM::Gaussians ( ) const
inline

Return the number of Gaussians in the model.

Definition at line 137 of file gmm.hpp.

References gaussians.

◆ LogLikelihood()

double mlpack::gmm::GMM::LogLikelihood ( const arma::mat &  dataPoints,
const std::vector< distribution::GaussianDistribution > &  distsL,
const arma::vec &  weights 
) const
private

This function computes the log-likelihood of the given model.

This function is used by GMM::Train().

Parameters
dataPoints  Observations to calculate the likelihood for.
distsL  Distributions of the given mixture model.
weights  Weights of the given mixture model.

Referenced by Weights().
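For reference, the log-likelihood of a Gaussian mixture on a dataset is conventionally defined as below. This page does not spell out the formula, so treat it as the standard definition that LogLikelihood() is assumed to follow. The symbols are not from the original: N is the number of observations in dataPoints, K is the number of distributions in distsL, w_k is the k-th entry of weights, and mu_k and Sigma_k are the mean and covariance of the k-th distribution.

```latex
\log L = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(x_i \mid \mu_k, \Sigma_k\right) \right)
```

Larger values indicate a better fit, which is why Train() keeps the trial with the greatest log-likelihood.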

◆ operator=()

GMM& mlpack::gmm::GMM::operator= ( const GMM &  other)

Copy operator for GMMs.

Referenced by GMM().

◆ Probability() [1/2]

double mlpack::gmm::GMM::Probability ( const arma::vec &  observation) const

Return the probability that the given observation came from this distribution.

Parameters
observation  Observation to evaluate the probability of.

Referenced by Weights().

◆ Probability() [2/2]

double mlpack::gmm::GMM::Probability ( const arma::vec &  observation,
const size_t  component 
) const

Return the probability that the given observation came from the given Gaussian component in this distribution.

Parameters
observation  Observation to evaluate the probability of.
component  Index of the component of the GMM to be considered.
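The relationship between the two Probability() overloads can be sketched as follows, assuming that the per-component value includes the component's prior weight, so that the mixture density is the sum of the per-component values. That weighting is an assumption rather than something this page states. Gaussian1D, ComponentProbability, and MixtureProbability are hypothetical one-dimensional stand-ins.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1-D stand-in for distribution::GaussianDistribution.
struct Gaussian1D
{
  double mean;
  double variance;

  // Density of the Gaussian at x.
  double Probability(const double x) const
  {
    const double pi = 3.14159265358979323846;
    const double diff = x - mean;
    return std::exp(-0.5 * diff * diff / variance) /
        std::sqrt(2.0 * pi * variance);
  }
};

// Weighted density of one component: w_k * N(x | mean_k, variance_k).
double ComponentProbability(const double x,
                            const std::vector<Gaussian1D>& dists,
                            const std::vector<double>& weights,
                            const std::size_t component)
{
  assert(dists.size() == weights.size() && component < dists.size());
  return weights[component] * dists[component].Probability(x);
}

// Mixture density: the sum of the weighted component densities.
double MixtureProbability(const double x,
                          const std::vector<Gaussian1D>& dists,
                          const std::vector<double>& weights)
{
  double total = 0.0;
  for (std::size_t k = 0; k < dists.size(); ++k)
    total += ComponentProbability(x, dists, weights, k);
  return total;
}
```

Note that both values are densities, so they can exceed 1 for narrow Gaussians.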

◆ Random()

arma::vec mlpack::gmm::GMM::Random ( ) const

Return a randomly generated observation according to the probability distribution defined by this object.

Returns
Random observation from this GMM.

Referenced by Weights().
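Drawing from a mixture is usually done in two steps: pick a component according to the a priori weights, then draw from that component's Gaussian. The sketch below shows that standard procedure in one dimension with the standard library's random facilities. It is an illustration under those assumptions, not mlpack's code, and Gaussian1D and RandomSketch are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical 1-D stand-in for distribution::GaussianDistribution.
struct Gaussian1D
{
  double mean;
  double variance;
};

// Draw one observation from the mixture described by dists and weights.
template<typename Generator>
double RandomSketch(const std::vector<Gaussian1D>& dists,
                    const std::vector<double>& weights,
                    Generator& rng)
{
  assert(!dists.empty() && dists.size() == weights.size());

  // Step 1: choose a component with probability proportional to its weight.
  std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
  const Gaussian1D& chosen = dists[pick(rng)];

  // Step 2: draw from the chosen Gaussian (the standard library expects the
  // standard deviation, not the variance).
  std::normal_distribution<double> gaussian(chosen.mean,
                                            std::sqrt(chosen.variance));
  return gaussian(rng);
}
```

Passing the generator by reference lets the caller control seeding, for example with a std::mt19937 constructed from a fixed seed for reproducible draws.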

◆ Serialize()

template<typename Archive>
void mlpack::gmm::GMM::Serialize ( Archive &  ar,
const unsigned int
)

Serialize the GMM.

Referenced by Weights().

◆ Train() [1/2]

template<typename FittingType = EMFit<>>
double mlpack::gmm::GMM::Train ( const arma::mat &  observations,
const size_t  trials = 1,
const bool  useExistingModel = false,
FittingType  fitter = FittingType() 
)

Estimate the probability distribution directly from the given observations, using the given algorithm in the FittingType class to fit the data.

The fitting will be performed 'trials' times; from these trials, the model with the greatest log-likelihood will be selected. By default, only one trial is performed. The log-likelihood of the best fitting is returned.

Optionally, the existing model can be used as an initial model for the estimation by setting 'useExistingModel' to true. If the fitting procedure is deterministic after the initial position is given, then 'trials' should be set to 1.

Template Parameters
FittingType  The type of fitting method which should be used (EMFit<> is suggested).
Parameters
observations  Observations of the model.
trials  Number of trials to perform; the model in these trials with the greatest log-likelihood will be selected.
useExistingModel  If true, the existing model is used as an initial model for the estimation.
Returns
The log-likelihood of the best fit.

Referenced by Weights().
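The trial-selection behaviour described above can be sketched generically: run the fit several times from the same starting model, and keep the result with the greatest log-likelihood. TrainWithTrials and its fitOnce callback are hypothetical names introduced for illustration; the real Train() works on the GMM's own distributions and weights.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Run fitOnce 'trials' times, each on a fresh copy of the initial model, and
// keep the candidate with the greatest log-likelihood. fitOnce must modify
// the model it is given and return that model's log-likelihood.
template<typename Model, typename FitOnce>
double TrainWithTrials(Model& model, const std::size_t trials, FitOnce fitOnce)
{
  assert(trials >= 1);
  double bestLikelihood = -std::numeric_limits<double>::infinity();
  Model bestModel = model;

  for (std::size_t t = 0; t < trials; ++t)
  {
    Model candidate = model;  // Every trial starts from the same initial model.
    const double likelihood = fitOnce(candidate);
    if (likelihood > bestLikelihood)
    {
      bestLikelihood = likelihood;
      bestModel = candidate;
    }
  }

  model = bestModel;
  return bestLikelihood;
}
```

If fitOnce is deterministic given its starting model, every trial yields the same candidate, which is why a single trial suffices in that case.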

◆ Train() [2/2]

template<typename FittingType = EMFit<>>
double mlpack::gmm::GMM::Train ( const arma::mat &  observations,
const arma::vec &  probabilities,
const size_t  trials = 1,
const bool  useExistingModel = false,
FittingType  fitter = FittingType() 
)

Estimate the probability distribution directly from the given observations, taking into account the probability of each observation actually being from this distribution, and using the given algorithm in the FittingType class to fit the data.

The fitting will be performed 'trials' times; from these trials, the model with the greatest log-likelihood will be selected. By default, only one trial is performed. The log-likelihood of the best fitting is returned.

Optionally, the existing model can be used as an initial model for the estimation by setting 'useExistingModel' to true. If the fitting procedure is deterministic after the initial position is given, then 'trials' should be set to 1.

Parameters
observations  Observations of the model.
probabilities  Probability of each observation being from this distribution.
trials  Number of trials to perform; the model in these trials with the greatest log-likelihood will be selected.
useExistingModel  If true, the existing model is used as an initial model for the estimation.
Returns
The log-likelihood of the best fit.

◆ Weights() [1/2]

const arma::vec& mlpack::gmm::GMM::Weights ( ) const
inline

Return a const reference to the a priori weights of each Gaussian.

Definition at line 156 of file gmm.hpp.

References weights.

◆ Weights() [2/2]

arma::vec& mlpack::gmm::GMM::Weights ( )
inline

Return a reference to the a priori weights of each Gaussian.

Definition at line 158 of file gmm.hpp.

References Classify(), LogLikelihood(), Probability(), Random(), Serialize(), Train(), and weights.

Member Data Documentation

◆ dimensionality

size_t mlpack::gmm::GMM::dimensionality
private

The dimensionality of the model.

Definition at line 85 of file gmm.hpp.

Referenced by Dimensionality().

◆ dists

std::vector<distribution::GaussianDistribution> mlpack::gmm::GMM::dists
private

Vector of Gaussians.

Definition at line 88 of file gmm.hpp.

◆ gaussians

size_t mlpack::gmm::GMM::gaussians
private

The number of Gaussians in the model.

Definition at line 83 of file gmm.hpp.

Referenced by Gaussians().

◆ weights

arma::vec mlpack::gmm::GMM::weights
private

Vector of a priori weights for each Gaussian.

Definition at line 91 of file gmm.hpp.

Referenced by Weights().


The documentation for this class was generated from the following file: